support for CUDA and NVDEC (NVIDIA Decode) as an alternative to VDPAU

ahjolinna commented 8 years ago

It seems MPV added this to theirs. I'm wondering if you could also add NVIDIA decode support using CUDA; it was added to make up for VDPAU's current lack of HEVC Main 10 profile support. https://developer.nvidia.com/nvidia-video-codec-sdk

..unfortunately, to install the nvidia-video-codec-sdk you need to have/make an NVIDIA dev account to even download the damn pkgs -_- ...and I think you also need CUDA 8 for it to work

More info about it from MPV's point of view; the only thing in there that is false: it does support 10-bit

zaps166 commented 8 years ago

FFmpeg already supports it, so it shouldn't be so difficult :smile:

zaps166 commented 8 years ago

Btw. maybe this will be sufficient (instead of using full SDK): https://github.com/FFmpeg/FFmpeg/tree/master/compat/cuda

I don't know when I'll implement it (it depends...). Also DXVA2 and VDPAU/VAAPI improvements are in TODO :smile:

ahjolinna commented 8 years ago

It's possible that VDPAU is 100% dead now, at least on Nvidia's side. I just wonder how this new replacement will work on older hardware, and how is the AMD/Radeon support? (I personally couldn't care less about AMD)...if it works just fine, at least in theory, then I would prepare to drop VDPAU support altogether once CUDA 8 has become more mainstream on distros.

I really need to check how this new system works compared to VDPAU and what it requires (hardware & software)... it would be nice if everyone (AMD & Intel) could use this, then at least Intel could drop their awful VAAPI...and AMD, well, who cares :D

zaps166 commented 8 years ago

VDPAU and VAAPI are only APIs designed for hardware video decoding (or encoding, in the case of VAAPI). They are MIT-licensed. The implementation can differ depending on the hardware: it can use the CPU (if someone writes a driver for it), the GPU, or a hardware video decoder (e.g. PureVideo, UVD or Allwinner CedarX). Notice that the NVIDIA control panel on Linux has "GPU Utilization" and "Video Engine Utilization". The second one is for video (IIRC also for NVENC).

Playing BBB (bbb_sunflower_native_60fps_normal.mp4, 4000x2250) via VDPAU puts "Video Engine Utilization" at ~60% and the GPU at 0% when QMPlay2 is minimized. When it displays the video it also uses ~10% of the GPU, so the GPU is not used for decoding the video, but rather for filtering/scaling/displaying - it is a good feature :)

I don't know CUVID, I must compile FFmpeg with it and try :smile:, but if it uses CUDA, it will probably use the GPU like games (GLSL) or OpenCL do (I don't have any experience with CUDA), so I assume that it can consume more power. Also I don't know if their license allows redistributing software binaries with CUVID support... Maybe opening libnvcuvid.so and resolving the symbols dynamically :smile: would be OK? But FFmpeg won't be redistributed with CUVID support, so the user must compile everything manually...
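A minimal sketch of the "open the library and resolve symbols dynamically" idea, so that nothing has to be linked against libcuda/libnvcuvid at build time. It is written in Python with `ctypes` purely for illustration (QMPlay2 is C++ and this is not its code); the helper names `candidate_names`, `try_load`, `resolve` and `cuvid_available` are hypothetical. The library file names and the `cuInit` / `cuvidCreateDecoder` entry points are the ones the NVIDIA driver normally ships, but treat them as assumptions for your system.

```python
import ctypes
import sys


def candidate_names(kind):
    """File names to try for the CUDA driver ("cuda") or CUVID ("nvcuvid") library."""
    if sys.platform.startswith("win"):
        # Assumption: the Windows driver installs nvcuda.dll / nvcuvid.dll.
        return ["nvcuda.dll"] if kind == "cuda" else [kind + ".dll"]
    # Assumption: Linux drivers install lib<kind>.so.1 (the plain .so is often dev-only).
    return ["lib%s.so.1" % kind, "lib%s.so" % kind]


def try_load(kind):
    """Open the library at run time; return None when it is not installed."""
    for name in candidate_names(kind):
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None


def resolve(lib, symbol):
    """Return the exported symbol, or None if the library does not provide it."""
    try:
        return getattr(lib, symbol)
    except AttributeError:
        return None


def cuvid_available():
    """True only if both libraries load and the entry points we need are exported."""
    cuda = try_load("cuda")
    cuvid = try_load("nvcuvid")
    if cuda is None or cuvid is None:
        return False
    return (resolve(cuda, "cuInit") is not None
            and resolve(cuvid, "cuvidCreateDecoder") is not None)
```

Calling `cuvid_available()` on a machine without the NVIDIA driver simply returns `False`, so a player built this way could still start and fall back to another decoder. The sketch only looks the symbols up; actually calling them would also need the right argument types and calling convention.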

QMPlay2 currently doesn't fully support 10-bit videos - it converts them to 8-bit (I don't know how to test it, I don't have the hardware (monitor?), etc., and I know that Linux/X11 has problems with 10-bit images), but QMPlay2 supports YUV444. AFAIK using 10-bit in the codec can achieve better quality even if it is converted to 8-bit before displaying.
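A small sketch of what "converts to 8-bit" means for a single sample. The thread does not say how the reduction is actually done, so both functions are assumptions: plain truncation (drop the two low bits) and a rounded variant. The function names are hypothetical.

```python
def to_8bit_truncate(sample10):
    """Drop the two least significant bits of a 10-bit sample (0..1023)."""
    if not 0 <= sample10 <= 1023:
        raise ValueError("expected a 10-bit sample")
    return sample10 >> 2


def to_8bit_round(sample10):
    """Round to the nearest 8-bit value instead of truncating, clamped to 255."""
    if not 0 <= sample10 <= 1023:
        raise ValueError("expected a 10-bit sample")
    return min((sample10 + 2) >> 2, 255)


# Four neighbouring 10-bit levels collapse into a single 8-bit level.
levels = [to_8bit_truncate(s) for s in (512, 513, 514, 515)]
```

Here `levels` ends up as four copies of 128, which is the precision that gets lost on output; the encoder still worked with the finer 10-bit steps internally.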

ahjolinna commented 8 years ago

@zaps166 : have you seen Nvidia's XDC2016 presentation from Andy Ritger - about Linux and High Dynamic Range Display?

zaps166 commented 8 years ago

@ahjolinna Thanks, I'll look at the presentation later :)

> but if it uses CUDA, it will probably use the GPU like games (GLSL) or OpenCL do (I don't have any experience with CUDA), so I assume that it can consume more power.

CUVID also uses "Video Engine" whatever it is (ASIC or maybe GPU) :smile:

I think that using FFmpeg for CUVID is not a good idea, because it links to libcuda and libnvcuvid libraries...

zaps166 commented 8 years ago

@ahjolinna The CUVID module is in the temporary cuvid branch :)

ahjolinna commented 8 years ago

nice, that was fast...I will try it out

zaps166 commented 8 years ago

@ahjolinna The CUVID module is not finished; it is too slow due to copying data from GPU RAM to system RAM and back to GPU RAM for OpenGL, but it already works on Linux and Windows (including Windows XP 32-bit and GF 8400GS :smile:).

ahjolinna commented 8 years ago

well I won't notice that slow part so easily with my "super overclocked" version of GTX970 and (only)16gb of DDR4 ram.

anyway, it looks a little bit stupid when "settings->module->cuvid" has only an enable option; many modules have this "problem" (more or less). It would be nice if you could combine some as "main module" and have "sub-modules", ofc this is not a priority right now..maybe later

zaps166 commented 8 years ago

@ahjolinna

> well I won't notice that slow part so easily with my "super overclocked" version of GTX970 and (only)16gb of DDR4 ram.

Try bbb 4000x2250 and compare VDPAU & current CUVID implementation :)

> anyway, it looks a little bit stupid when "settings->module->cuvid" has only an enable option

I can add the possibility to choose which GPU will be used.

> It would be nice if you could combine some as "main module" and have "sub-modules"

I don't understand... :smile:

ahjolinna commented 8 years ago

I tried that 'bbb 4000x2250' video clip with CUVID and it worked just fine on my hardware, even while I was doing A LOT of shit in the background...I have Firefox open with 500+ tabs, plus the Orion Twitch client, Discord, Franz, Steam, mpv...and I'm also doing a btrfs balance on one of my 3TB HDDs (haven't done it in ½ year)...and compared to VDPAU I didn't notice any difference in performance, maybe a little bit worse quality in VDPAU.

PS. if you are interested, I have ffmpeg 3.1.x compiled against CUDA 8 and the SDK

anyway, about the module thing in settings, I mean that you could make it more compact. For example, instead of having 'Alsa', 'input', 'pulseaudio' etc. at the top level, there would be 'Audio' as the "main module", and inside it would be 'Alsa', 'input', 'pulseaudio' etc. as "sub-modules". How you would implement it is another thing

zaps166 commented 8 years ago

CUVID - done (multi-platform, not linked to CUDA/CUVID libraries), please test :)

> and compared to VDPAU I didn't notice any difference in performance, maybe a little bit worse quality in VDPAU.

Now it should have the same performance as VDPAU. The old behavior (from the first temporary commit, with the data copy from GPU to CPU memory) is available in the CUVID settings.

> anyway, about the module thing in settings, I mean that you could make it more compact.

Yes, it will be better, but maybe later :)

ahjolinna commented 8 years ago

I just remembered: is there any need to update the readme file about CUVID support? For example about the requirements/recommendations.

I still don't know what the oldest Nvidia hardware (drivers?) it works with is...and does it work with Intel and/or Radeon? ...oh and what about the nouveau drivers? I'm not a "multimedia expert", but I do help to maintain ChakraOS, so it would be nice to know stuff like this. Nvidia has been really unclear about this...in every way.

btw. did I understand correctly that you have an ancient GF 8400GS* ?! and it worked just fine on it ?! if so, how well compared to VDPAU?

*hmm just remembered that one of my first self-built PCs had a GF 8600...damn that was a long time ago...nostalgia

zaps166 commented 8 years ago

According to http://docs.nvidia.com/cuda/video-decoder/, the lowest driver version for Linux is 260.xx. The oldest supported video cards are the GF 8xxx series - AFAIK these are the first graphics cards with CUDA and H264 decoders.

CUVID in QMPlay2 16.11.01 uses OpenGL2 for video output, VDPAU does not (I want to do it in the future). CUVID works (tested by me) on Windows (XP, 7, 10) and Linux with the NVIDIA proprietary drivers.

Limitations/problems:

- 10-bit HEVC - 2 bits lost (CUVID and QMPlay2 limitation),
- CUVID plays only yuv420p,
- if e.g. the video size changes during playback, CUVID will show garbage (stop/play is required; the FFmpeg CPU decoder can handle this) - tested on an nginx rtmp stream,
- HW decoding can be slower than the CPU on very old and slow GPUs (e.g. H264 60FPS 1080p on GF8400GS is slower than on a Core2Duo E8400, the video is a slideshow (the same on Windows XP using CUVID); VDPAU in QMPlay2 performs worse in this case),
- Linux CUDA bug - putting the computer to sleep while a CUDA application is open causes CUDA to stop working. The workaround is to restart the computer, or to close every CUDA application (e.g. QMPlay2), run `sudo rmmod nvidia_uvm && sudo modprobe nvidia_uvm` and start QMPlay2 again.

I'll test GF 8600GT later :D

zaps166 commented 8 years ago

Also, for HEVC (H265) and AVC (H264), FFmpeg 3.1.x or higher is required for compilation, because QMPlay2 uses the new bsf API.

ahjolinna commented 8 years ago

so VDPAU does work a little bit better in some areas for now... I hope 10-bit and 444p support arrives soon. The good news is that it's cross-platform ...so there seems to be a better chance of progress than VDPAU had. I just wonder if it will also work on Radeon GPUs/drivers; if not, what will they do now? Because they have been using VDPAU....the best thing would be if even Intel could use CUVID, then there would be just 1 API for Linux...and we could get rid of the horrible VAAPI.

btw. is there any "status/stats info" display to see some basic info about the video/audio? It would be nice if it also showed which decoder is used...and whether there are any dropped/delayed frames..for example I have something like that for MPV: and bomi-player's:

zaps166 commented 8 years ago

[screenshot: shiki]

Screen shows CUVID HEVC 10-bit.

Basic info is displayed in "Information" widget, it lacks dropped/repeated frames information, but you can see the decoder, video output, audio output, format, codec, video size, fps, etc.

10-bit is not fully supported by CUVID (HEVC 10-bit works, but displays as 8-bit - IMHO it nevertheless has better quality than encoding in 8-bit only); also YUV444 and YUV422 don't work at all. I've disabled them, because they show only garbage and it is also impossible to show them properly due to the NV12 output (it is YUV420, but with two planes - a different chroma pixel layout in memory).
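To illustrate the "YUV420, but with two planes" remark: NV12 stores a full-resolution Y plane followed by one half-resolution plane in which U and V bytes are interleaved, while planar yuv420p keeps U and V in separate planes. The sketch below splits one into the other. It is only an illustration under loud assumptions (even dimensions, no row padding, i.e. pitch equal to width), and `nv12_to_yuv420p` is a hypothetical helper, not QMPlay2 code.

```python
def nv12_to_yuv420p(frame, width, height):
    """Split a tightly packed NV12 frame into separate Y, U and V planes.

    Assumes even dimensions and no row padding (pitch == width); real decoder
    output usually has a pitch, which this sketch ignores.
    """
    if width % 2 or height % 2:
        raise ValueError("NV12 needs even width and height")
    y_size = width * height
    uv_size = y_size // 2
    if len(frame) != y_size + uv_size:
        raise ValueError("unexpected frame size")
    y_plane = frame[:y_size]
    uv_plane = frame[y_size:]
    u_plane = uv_plane[0::2]  # even bytes of the second plane: U (Cb)
    v_plane = uv_plane[1::2]  # odd bytes of the second plane: V (Cr)
    return y_plane, u_plane, v_plane


# 4x2 test frame: 8 luma bytes, then chroma interleaved as U0 V0 U1 V1.
frame = bytes(range(8)) + bytes([100, 200, 101, 201])
y, u, v = nv12_to_yuv420p(frame, 4, 2)
```

Because the chroma plane is subsampled in both directions, there is simply no room in this layout for the extra chroma samples that 4:2:2 or 4:4:4 video carries.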

Also using OSD for information is a good idea, I'll think about it :D

ahjolinna commented 8 years ago

well, 10-12bit/HDR support needs to be supported by the driver to work 100% (and by xorg/wayland etc.)...and Nvidia is "working on it"...I don't know if you checked Nvidia's XDC2016 presentation from Andy Ritger - about Linux and High Dynamic Range Display?...and for now the GTX 950/960 and the GTX 10x0 series support 10-bit (HEVC) with VDPAU...for some reason the GTX 970-980(ti) didn't get the support...so it's at least good to know that with CUVID it works ..somewhat

ahjolinna commented 8 years ago

I still wonder what AMD will use, because they have been relying solely on VDPAU, and Nvidia was the only one developing VDPAU. So now that Nvidia has abandoned it, AMD is fucked. They have to make some kind of decision fast...will they continue with VDPAU, or move to CUVID (if possible), or will they have to make their own new one....ofc there is VAAPI...but oh god no...I'd rather hope for anything else.

ahjolinna commented 8 years ago

@zaps166 : btw. what is the deal with Video Codec SDK (v7.0)? .."The SDK consists of two hardware acceleration interfaces: NVENCODE API for video encode acceleration and NVDECODE API (formerly called NVCUVID API) for video decode acceleration."

I had it installed for nvenc support for OBS-studio

zaps166 commented 8 years ago

Hmm, I assume that NVENC (encoder) SDK is in "Video Codec SDK (v7.0)" (nvEncodeAPI.h), but CUVIDDEC SDK is already in CUDA-8.0 SDK. Both: NVENC and CUVIDDEC SDKs are in FFmpeg repository, so it should be enough for compiling this (but I don't know OBS-studio) :) The only missing thing might be cuda.h or cudaGL.h.

Another CUVID/VDPAU limitation: it doesn't work on NVIDIA Optimus - only Intel hw accel is available (I assume that the NVIDIA GPU doesn't have PureVideo, because hwaccel is done via Intel).

ahjolinna commented 7 years ago

so only CUDA 8 is needed (with ffmpeg 3.x), that's nice...ofc for me, if I want NVENC I still need that SDK. It's sad to hear about the Optimus support, I wonder if that could change someday...do you know if it also affects the NVENC encoder?

anyway, about OBS: you should check it out, it's a great cross-platform "screen recorder" / streaming app written in Qt5, and NVENC is needed to get Nvidia's (hardware accelerated) "shadowplay"-like feature

PS. do you also know the libvdpau-va-gl project? What do you think about it? Would it be possible for Intel users to use it instead (at least in some areas)?

zaps166 commented 7 years ago

CUVID is unavailable on my NVIDIA Optimus laptop on both Windows and Linux - probably a missing chip (PureVideo?) - CUVID reports that there is no device, but CUDA for GPGPU is available on both OSes. For video encode/decode use Intel (on Linux via VA-API; FFmpeg 3.1 can encode via VA-API). NVENC - I didn't test it, but it is probably also unavailable.

I'll look at OBS; currently if I want to stream something I use FFmpeg with NVENC from the command line :D

> libvdpau-va-gl

I never used this, but I can see it in repositories. Currently QMPlay2 (master branch) can use VA-API (new feature in upcoming release), CUVID and DXVA2 (new video decoder in upcoming release) via OpenGL. I want to use OpenGL also for VDPAU.

zaps166 commented 7 years ago

Maybe PureVideo availability depends on the notebook; I also have another old Asus netbook with Intel and NVIDIA GPUs which can decode video using NVIDIA.

ahjolinna commented 7 years ago

yeah I know this is an old/closed topic...I don't know where to put small related stuff/questions*...anyway:

I think PureVideo availability does depend on the manufacturer...anyway it seems OBS has a pull request to "Use dedicated GPU on Hybrid GPU systems" ..ofc this is just a UI thing and for OBS encoding...but at least it could mean that this is possible to do, to some extent at least.

*Do u use IRC or something like Discord? where I could ask/tell a few (small) things..instead of spamming email or github.

zaps166 commented 7 years ago

> *Do u use IRC or something like Discord? where I could ask/tell a few (small) things..instead of spamming email or github.

No :smile: I use other IM's.

Hmm, so is it a variable which tells the driver whether to enable the NVIDIA GPU, just like choosing it from the Windows context menu?

ahjolinna commented 7 years ago

ofc you don't use IRC/Discord..sigh...what do u use? For conversations like this I would like to use something else.

about your question, I haven't checked it thoroughly yet, so I don't know... there is also the option to ask the dev who made the pull request and/or ask in the OBS chat/IRC

btw. why I use Discord: it's cross-platform, even mobile, and it's really popular with gamers ..well it was designed for them, for group calls/chat..now it has expanded to other communities as well, even companies

zaps166 commented 7 years ago

Hmm, Discord isn't available for Linux (I like native C/C++ apps, especially QtWidgets). For WebRTC I sometimes use http://opentokrtc.com/ (simple and works properly, no account needed). I also use Telegram, GG (mainly a Polish IM), Linphone, qTox, E-Mail, TeamViewer, VNC, etc... I don't have experience with IRC. I'm not a pro gamer, sometimes I play one or two simple games.

Now this is off-topic, so let's switch to e.g. mail or anything else :)

Edit: I'll try Discord, I didn't know about this before :smile:

ahjolinna commented 7 years ago

Discord has an official Linux client, it's in alpha stage...soon beta ..

there is also an unofficial Qt version

edit: here are 2 sites to find Discord servers: https://discord.me/ & https://www.discordservers.com/

there is also the better discord project that gives more features, like custom themes; the Linux version is still in progress

ahjolinna commented 7 years ago

my DiscordTag "name" is #4563 (and always the same nick)

zaps166 / QMPlay2

support for CUDA and NVDEC (NVIDIA Decode) as an alternative to VDPAU #60