keylase / nvidia-patch

This patch removes the restriction on the maximum number of simultaneous NVENC video encoding sessions imposed by Nvidia on consumer-grade GPUs.

[Question] Support for Windows #9

justinglock40 closed this issue 5 years ago

justinglock40 commented 6 years ago

Any chance this patch can make its way to Windows? If so, that would be greatly appreciated and well received. The main reason: on Linux, Plex only does hardware encoding, not hardware decoding, but on Windows Plex can both encode and decode in hardware. So unlocked transcodes there would be awesome!

jaylex32 commented 5 years ago

Yep, it's not working properly with Plex. I get the same glitch with the dashboard, but in reality it is not using the graphics card for the third stream.

Svendsen18 commented 5 years ago

I THINK mine works. I'm sitting with 10 Emby streams open, transcoding from 60/10 Mbps down to 2 Mbps, and I don't think I could do this before with my i7-4790K alone: https://imgur.com/a/XlJ8CJh

Snawoot commented 5 years ago

@niXta1 dxva2 is a generic hardware decoder. NVENC is the only option for hardware encoding with Nvidia on Windows. In ffmpeg it is selected with the codec option (for example -c:v h264_nvenc), not with -hwaccel, which is a little counterintuitive.
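A minimal Python sketch of that distinction. It only assembles an ffmpeg argument list and runs nothing. Hardware decoding is requested with -hwaccel, which is an input option and so has to come before -i, while NVENC encoding is chosen through the video codec option -c:v h264_nvenc. The helper name build_transcode_cmd, the file names and the 2M bitrate are illustrative assumptions; dxva2 is just one possible -hwaccel value on Windows.

```python
from typing import List, Optional


def build_transcode_cmd(src: str, dst: str,
                        hwaccel: Optional[str] = "dxva2",
                        encoder: str = "h264_nvenc",
                        bitrate: str = "2M") -> List[str]:
    """Assemble an ffmpeg argument list (hypothetical helper, not an ffmpeg API)."""
    cmd = ["ffmpeg", "-y"]
    if hwaccel:
        # -hwaccel selects the hardware *decoder*; it is an input option,
        # so it has to appear before the -i it applies to.
        cmd += ["-hwaccel", hwaccel]
    cmd += ["-i", src]
    # NVENC is an *encoder*, chosen through the video codec option.
    cmd += ["-c:v", encoder, "-b:v", bitrate]
    # Leave audio untouched to keep the example focused on video.
    cmd += ["-c:a", "copy", dst]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_transcode_cmd("input.mkv", "output.mp4")))
```

For input.mkv and output.mp4 this prints ffmpeg -y -hwaccel dxva2 -i input.mkv -c:v h264_nvenc -b:v 2M -c:a copy output.mp4. Passing hwaccel=None leaves decoding on the CPU while still encoding with NVENC.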

Snawoot commented 5 years ago

@Svendsen18 Maybe it is running encoding on Intel QSV?

Svendsen18 commented 5 years ago

Don't think so. To my knowledge, the GPU inside the CPU shuts down when you plug your monitor into the graphics card, and it says it is my GTX 1070 that is under load. Nvidia NVENC is also enabled in my Emby settings.

Svendsen18 commented 5 years ago

All this is seen in Task Manager, by the way. Take a look at the screenshot linked in my earlier reply: my CPU seems to be somewhat "chilling" compared to the GPU. I think the CPU is handling the subtitles while the GPU handles the transcoding.

niXta1 commented 5 years ago

@Svendsen18 Yes, that should work, it's using nvdec + nvenc. Plex uses an old build of ffmpeg and does not support nvdec, that's why they use dxva + mf.

@Snawoot yes, dxva2 decode and mf encode. It says so in the debug log and Plex themselves: https://support.plex.tv/articles/115002178853-using-hardware-accelerated-streaming/

niXta1 commented 5 years ago

@Snawoot do you know the cmd line option for encoding with mf?

Snawoot commented 5 years ago

@niXta1

Plex uses an old build of ffmpeg and does not support nvdec, that's why they use dxva + mf.

I guess no one uses the NVDEC interface due to the single decoding session limit, even on top-level GPUs. So it's likely most decoders use dxva2/CUVID for Nvidia hardware decoding. The Task Manager screenshot posted by Svendsen18 clearly shows the OS is aware of a process using the decoding API. I bet Emby uses native Windows decoding facilities like dxva2 too; that's why such activity is clearly recognized.

do you know the cmd line option for encoding with mf?

Does MF stand for Media Foundation? It seems it is not supported by ffmpeg.

Snawoot commented 5 years ago

@niXta1 sorry, I probably confused number of chips with sessions limit.

niXta1 commented 5 years ago

Nope, Plex doesn't use nvdec because its ffmpeg is too old; nvdec isn't supported there, so they can't. They are working on an update to support it, and when they do, we will get better picture quality on Windows (nvdec + nvenc gives better picture quality) and hardware decoding (nvdec) on Linux (nvenc is already there). I believe the same stream limitations are enforced regardless of the encoder.

I'm not sure how Plex does it, but yes, they use dxva2 + mf. Maybe some special sauce? It says so on their support site, and they have confirmed it in their forums.

Svendsen18 commented 5 years ago

An update: Tried this with a mate of mine, his spec: i5-8500 / gtx1080ti

First we tried without the GPU in the system. All videos were transcoded from 720p/10 Mbps down to 480p/0.7kb something something (because his upload is shit). Results:

  1. Pure CPU: 7 active streams, choked on #8
  2. CPU/iGPU: 7 active streams, choked on #8
  3. CPU/1080 Ti without fix: 9 active streams, choked on #10
  4. CPU/1080 Ti WITH fix: 13 active streams, choked on #14

Both the CPU and GPU were at 100% usage, if Task Manager was correct, and from my end it "felt" like the CPU was the one getting crushed, but it was BETTER with the fix than without.

Emby transcode settings: Nvidia NVENC enabled

Saentist commented 5 years ago

An update: Tried this with a mate of mine, his spec: i5-8500 / gtx1080ti ...

The i5-4440's built-in HD 4600 GPU transcodes 12 HD streams without problems using QSV: https://github.com/rigaya/QSVEnc https://github.com/rigaya/NVEnc

niXta1 commented 5 years ago

i5-4440 build in GPU HD4600 transcode without problem 12 HD streams with QSV https://github.com/rigaya/QSVEnc https://github.com/rigaya/NVEnc

@Saentist that greatly depends on the video, codec, bitrate etc. ”12 HD streams” does not really help. With some videos, QS chokes after 2 HD transcodes on 7th gen Intel CPU.

niXta1 commented 5 years ago

An update: ...

@Svendsen18 Please redo the test with 1080p high bitrate and state the codec h264/h265. Transcode to high bitrate, and do the testing on LAN. Web browsers/tabs can be used.

You can use GPU-Z to confirm that it is using the Nvidia video processor. Doesn't Emby show stream details in its stats too?

Saentist commented 5 years ago

i5-4440 build in GPU HD4600 transcode without problem 12 HD streams with QSV https://github.com/rigaya/QSVEnc https://github.com/rigaya/NVEnc

@Saentist that greatly depends on the video, codec, bitrate etc. ”12 HD streams” does not really help. With some videos, QS chokes after 2 HD transcodes on 7th gen Intel CPU.

HD 4600 supports up to H.264, so that was 12 streams of H.264 1080p at 6-12 Mbps -> 720p at 3 Mbps H.264.

roderrooder commented 5 years ago

Hey guys,

I originally commented on this thread (pre Windows support) but eventually removed my comment as I didn't think it added anything to the conversation.

I also replied to a comment on a Reddit post: https://www.reddit.com/r/PleX/comments/9dim68/override_nvidas_2_stream_limitation_on_gtx_gpus/e5hs6kw

And eventually user "yarmak" replied to me pointing me back to this issue comment.

Not sure how constructive this is to the chain, but this patch is working perfectly for my use case. I use FFmpeg to capture 5 live video sources (2 displays, a camera, and 2 audio streams synchronized with a black screen) simultaneously while keeping everything synchronized. Previously I had to encode 3 of the streams with my CPU, which brought my CPU usage up quite a bit, and that was after greatly lowering the bitrate of one of the 3 recordings so as not to peg my CPU. But post-patch I can encode all 5 video streams with just my GTX 1080 and no compromise on bitrates.

Legend: Res = resolution, FPS = frame rate, VB = video bitrate, AC = audio channels, ASF = audio sample frequency, AB = audio bitrate

Stream 1 - Res=256x120 | VB=16K | FPS=25 | AC=2 | ASF=44100 | AB=384K
Stream 2 - Res=256x120 | VB=16K | FPS=25 | AC=2 | ASF=44100 | AB=384K
Stream 3 - Res=1920x1080 | VB=288M | FPS=60 | AC=2 | ASF=44100 | AB=384K
Stream 4 - Res=3440x1440 | VB=288M | FPS=100 | AC=2 | ASF=44100 | AB=384K
Stream 5 - Res=3840x2160 | VB=288M | FPS=60 | AC=1 | ASF=44100 | AB=192K

All handled by the GPU, which only ever gets up to around 85% encoder usage and 4GB of VRAM. This is pretty amazing and makes the GTX 1080 an insane value for encoding, as an equivalent Quadro would be around $1200 while the GTX 1080 is around $350. The GTX 1080 has 2 NVENC chips, making it 2x as powerful for encoding compared to its smaller siblings in the same series.

Anyways just thought I'd chime in to let you guys know it's working perfectly for me!

wazerstar commented 5 years ago

This is looking great! Can we get an offset for 417.58?

Or, even better, could someone create/provide a tool that finds the offset automatically when pointed at nvcuvid.dll?

https://nvidia.custhelp.com/app/answers/detail/a_id/4758/~/geforce-hotfix-driver-version-417.58

Snawoot commented 5 years ago

@wazerstar

This is looking great!, can we get an offset for 417.58?

Please open new issue to track progress on this.

or even better someone could create/provide a tool to find the offset automatic when pointing at nvcuvid.dll ?

"Option 2" is a step towards it, but driver code changes frequently, no one can be sure same byte-string will be valid for upcoming versions. Linux version of this patch uses such technique, but new versions still added manually as they confirmed to work.

niXta1 commented 5 years ago

i5-4440 build in GPU HD4600 transcode without problem 12 HD streams with QSV ...

@Saentist GTX 10xx can probably do up to ~30 of those with much better picture quality. The big benefit here is 4k and h265 transcoding.

mekk55 commented 5 years ago

I tried the latest patch (v1.9), but when I selected "C:\Windows\System32\nvcuvid.dll" it returned "The .1337 File is not valid", and when I then clicked the "Patch" button it returned "Files are no longer present".

My OS is Windows 10 64-bit, and I downloaded the driver from the link.

So I tried the second method (copy nvcuvid.dll to Ubuntu and run the command). Which one is the patched file, the original "nvcuvid.dll" or "nvcuvid.dll.bk"? Their sizes and modification dates are the same.
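Same size and same date do not imply same contents. A byte-by-byte comparison shows whether, and where, the two copies differ. A minimal Python sketch follows (the script and function names are hypothetical). It only reports offsets and the byte value in each file, so by itself it cannot say which copy is the patched one; that needs the expected original bytes, for example from the patch file.

```python
import sys
from pathlib import Path
from typing import List, Tuple


def diff_offsets(a: bytes, b: bytes) -> List[Tuple[int, int, int]]:
    """Return (offset, byte_in_a, byte_in_b) for every position where a and b differ."""
    if len(a) != len(b):
        raise ValueError(f"sizes differ: {len(a)} vs {len(b)} bytes")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python diffbytes.py nvcuvid.dll nvcuvid.dll.bk
    first, second = (Path(p).read_bytes() for p in sys.argv[1:3])
    diffs = diff_offsets(first, second)
    if not diffs:
        print("files are byte-for-byte identical")
    for offset, x, y in diffs:
        print(f"{offset:016X}: {x:02X} -> {y:02X}")
```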

wazerstar commented 5 years ago

Use Notepad++ to create the text file, as it makes sure it's in UTF-8, then copy and paste the RAW data from nvcuvid.

Or else take this one I created: nvcuvid.zip

mekk55 commented 5 years ago

use notepad++ to create the text file as it makes sure its in utf-8 then copy paste the RAW data from nvcuvid

Else take this one I created nvcuvid.zip

What's your driver version ?

roderrooder commented 5 years ago

I was also getting "The .1337 File is not valid" until I downloaded the .1337 file from earlier in this chain (for version 417.35). When I right-click and "Save as" the file from this page - https://github.com/keylase/nvidia-patch/tree/master/win/win10_x64/417.58 - it saves the web page's HTML or something instead.

Right now, as wazerstar suggests, I have to copy and paste the contents from here https://github.com/keylase/nvidia-patch/blob/master/win/win10_x64/417.58/nvcuvid.1337 and save them myself as a .1337 file.

What is the proper method of downloading the file directly?
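Both failure modes described above (a file saved with the wrong encoding, or the GitHub web page saved instead of the file) can be caught before running the patcher. The sketch below assumes the .1337 layout is an x64dbg-style patch export: a ">module.dll" header line followed by one "OFFSET:OLD->NEW" line per byte, all hexadecimal. That layout is an assumption, so check it against an actual file from the repository before relying on this.

```python
import re
import sys
from pathlib import Path
from typing import List, Tuple

# ASSUMED layout (x64dbg-style patch export), not confirmed by this thread:
#   >nvcuvid.dll
#   <hex offset>:<old byte>-><new byte>
PATCH_LINE = re.compile(r"^([0-9A-Fa-f]+):([0-9A-Fa-f]{2})->([0-9A-Fa-f]{2})$")


def parse_1337(text: str) -> Tuple[str, List[Tuple[int, int, int]]]:
    """Return (module name, [(offset, old, new), ...]) or raise ValueError."""
    head = text[:2048].lower()
    if "<html" in head or "<!doctype" in head:
        # Typical result of "Save as" on the github.com page instead of Raw.
        raise ValueError("this is a saved web page, not the raw .1337 file")
    # Drop a UTF-8 byte-order mark and blank lines; splitlines handles CRLF.
    lines = [ln.strip() for ln in text.lstrip("\ufeff").splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("missing '>module.dll' header line")
    patches = []
    for entry, line in enumerate(lines[1:], start=1):
        match = PATCH_LINE.match(line)
        if match is None:
            raise ValueError(f"entry {entry} is not OFFSET:OLD->NEW: {line!r}")
        offset, old, new = (int(group, 16) for group in match.groups())
        patches.append((offset, old, new))
    return lines[0][1:], patches


if __name__ == "__main__" and len(sys.argv) == 2:
    raw = Path(sys.argv[1]).read_text(encoding="utf-8", errors="replace")
    module, patches = parse_1337(raw)
    print(f"{module}: {len(patches)} byte patches, layout looks valid")
```

Reading the file as UTF-8 with errors="replace" also means a file saved as UTF-16 fails the header check instead of being silently accepted.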

jaylex32 commented 5 years ago

https://github.com/keylase/nvidia-patch/blob/36622bf7119094233039e857b69d890cee2c7e95/win/win10_x64/417.58/nvcuvid.1337 Go to that link, right-click "Raw" and save the file.
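The Raw button just points at a different host: a link of the form github.com/OWNER/REPO/blob/REF/PATH is served as a plain file at raw.githubusercontent.com/OWNER/REPO/REF/PATH. A small Python sketch of that rewrite plus a byte-exact download (the function and script names are hypothetical), so no browser "Save as" or text editor is involved:

```python
import sys
import urllib.request
from urllib.parse import urlparse


def blob_to_raw(url: str) -> str:
    """Map github.com/OWNER/REPO/blob/REF/PATH to its raw.githubusercontent.com URL."""
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    if parsed.netloc != "github.com" or len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    owner, repo, _blob, ref, *path = parts
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{'/'.join(path)}"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python fetch_raw.py <blob URL> nvcuvid.1337
    with urllib.request.urlopen(blob_to_raw(sys.argv[1])) as resp:
        data = resp.read()
    with open(sys.argv[2], "wb") as out:
        out.write(data)  # written as-is: no HTML, no re-encoding
```

Branch names containing a slash would be split incorrectly by this sketch; a commit hash like the one in the link above, or a simple branch name such as master, works.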

mekk55 commented 5 years ago

https://github.com/keylase/nvidia-patch/blob/36622bf7119094233039e857b69d890cee2c7e95/win/win10_x64/417.58/nvcuvid.1337 Go to that link and right click in raw and save file

Thank you to everyone who helped create the patch. Finally I can use ffmpeg with CUDA on Windows without the session limitation. ffmpeg on Windows uses a lot less CPU than on Ubuntu.

BushySushiPanda commented 5 years ago

Hold your horses! It's working. I thought about it and why would you test the capability every time you run? You'd test/check at first start and if no hw changes after that you'd just assume it's the same. Makes sense. So I had a look around and found a few interesting reg entries and flag files.

So after you apply the patch do the following:

  1. Uninstall Plex Media Server.
  2. Delete the "Plex Media Server" folder in %UserProfile%\AppData\Local.
  3. Delete the registry entry "Plex Media Server" in Computer\HKEY_CURRENT_USER\Software\Plex, Inc.
  4. Reboot for good measure (I didn't).
  5. Install Plex Media Server.
  6. Profit.


I was able to get Plex to recognize the unlimited transcodes by patching the NVIDIA driver (417.35) and downloading my same Plex version and reinstalling by doing a Repair. Afterwards, I was no longer limited to the two sessions. I did not have to uninstall Plex or lose any configuration.

niXta1 commented 5 years ago

Hold your horses! It's working. ...

@christhompson1972 sweet! can you run 4-5 transcodes and take a screendump?

BushySushiPanda commented 5 years ago

Hold your horses! It's working. ...

@christhompson1972 sweet! can you run 4-5 transcodes and take a screendump?

[Screenshot of 5 transcodes]

jaylex32 commented 5 years ago

Hold your horses! It's working. ...

@christhompson1972 sweet! can you run 4-5 transcodes and take a screendump?

5

You might want to check your Task Manager and see what is actually doing the transcoding; it looks like your CPU is the one doing the work!

niXta1 commented 5 years ago

Hold your horses! It's working. ... ...

@christhompson1972 it looks like the glitch I had too... The first three look like they are hardware-encoded (most likely the first 2 are), and the last two aren't.

BushySushiPanda commented 5 years ago

I'll test again, but CPU was at less than 20% with all 5

jaylex32 commented 5 years ago

I'll test again, but CPU was at less than 20% with all 5

Man, I really want you to succeed, because then we would know it works now. Please let us know!

niXta1 commented 5 years ago

I'll test again, but CPU was at less than 20% with all 5 ...

@christhompson1972 yep. Low-quality encoding like that doesn't really tax the CPU much. Looks reasonable. :( Please hustle Plex about nvdec in the forums; there's a big thread about it.

jaylex32 commented 5 years ago

I'll test again, but CPU was at less than 20% with all 5 ...

@christhompson1972 yep. Low quality encoding like that doesn’t really tax the CPU much. Looks resonable. :( Please hustle plex about nvdec in the forums, there‘s a big thread about it.

You know I never found that thread!

niXta1 commented 5 years ago

I'll test again, but CPU was at less than 20% with all 5 ... ...

@jaylex32 https://forums.plex.tv/t/hardware-accelerated-decode-nvidia-for-linux/233510 Also vote.

mcrommert commented 5 years ago

So what is the status here for Plex? I tried to set it up with my 750 Ti, but no dice on any re-encodes over 2. I saw the suggestion to remove the Plex folder, but in my case that wouldn't work.

jaylex32 commented 5 years ago

So what is the status here for plex? - tried to setup with my 750ti - but no dice on any reencodes over 2. I saw the help to remove the plex folder, but in my case that wouldn't work.

It still doesn't work with Plex because the issue is with Plex, not this mod! They have to update the Plex server program to be able to use the mod.

jaylex32 commented 5 years ago

An update: Tried this with a mate of mine, his spec: i5-8500 / gtx1080ti ...

What version of Emby are you using, and how did you set it up? I tried it, but it uses my CPU a lot with just 3 transcodes.

imayo commented 5 years ago

I tried this for 418.81 on Windows 10 64-bit and it is not working. Our software uses NVENC, and after testing I was not able to run more than 2 NVENC instances. My concern is why nvcuvid.dll is being patched: as far as I know, this DLL was the old one with the CUDA enc/dec implementations. The new NVENC encoder implementation seems not to rely on nvcuvid.dll; it is not even loaded into our server process. Instead, when using NVENC, the one loaded is:

NVIDIA Video Encoder API, Version 8.0 C:\Windows\System32\nvEncodeAPI64.dll

Could this patch be made for this DLL?

niXta1 commented 5 years ago

I tried this for 418.81 on Windows 10 64 bits and is not working. ...

@imayo This thread is regarding the old version. Here is my post regarding the new version: https://github.com/keylase/nvidia-patch/issues/51

imayo commented 5 years ago

Oops, thanks :)

Snawoot commented 5 years ago

@imayo Hi!

The nvEncodeAPI trace leads to nvcuvid.dll, which is used by nvEncodeAPI64.dll even for NVENC.

Please make sure that:

  1. You applied patch successfully
  2. Both the OS and the app are 64-bit

Also try to test with FFmpeg.
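One way to run such an FFmpeg test is to start several NVENC encodes at the same time and count how many finish cleanly. This is a sketch, not necessarily the exact command Snawoot had in mind. Each process encodes a synthetic testsrc pattern through lavfi and discards the result with -f null, so no input files or decoders are involved. On a driver that still enforces the limit, the sessions beyond it would be expected to fail when opening the encoder, making ffmpeg exit with a non-zero status. The 20-second duration and 1080p30 pattern are arbitrary; the duration only needs to be long enough for all sessions to overlap.

```python
import subprocess
import sys
from typing import List


def nvenc_test_cmd(seconds: int = 20) -> List[str]:
    """ffmpeg argv: encode a synthetic 1080p30 test pattern with NVENC and discard it."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        # lavfi's testsrc generates frames itself: no input file, no decoder.
        "-f", "lavfi", "-i", f"testsrc=duration={seconds}:size=1920x1080:rate=30",
        "-c:v", "h264_nvenc",
        # The null muxer throws the encoded frames away.
        "-f", "null", "-",
    ]


def count_successful_sessions(n: int, seconds: int = 20) -> int:
    """Start n encoders at once, wait for all, return how many exited with status 0."""
    procs = [subprocess.Popen(nvenc_test_cmd(seconds)) for _ in range(n)]
    return sum(1 for p in procs if p.wait() == 0)


if __name__ == "__main__" and len(sys.argv) == 2:
    wanted = int(sys.argv[1])
    ok = count_successful_sessions(wanted)
    print(f"{ok}/{wanted} NVENC sessions finished without error")
```

For example, running it with an argument of 4 would be expected to report 4/4 on a patched driver and fewer on an unpatched consumer card.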

jaylex32 commented 5 years ago

@imayo Hi! ...

Is the new version going to be applied the same way as the old version?

jaylex32 commented 5 years ago

.

I tried this for 418.81 on Windows 10 64 bits and is not working. ...

@imayo This thread is regarding the old version. Here is my post regarding the new version: #51

What's the difference between this new version and the old version?

niXta1 commented 5 years ago

. ...

@jaylex32 the old version was v1.14 of PMS, which used an older version of ffmpeg not compatible with nvdec. The patch hasn't changed; it's been working all along. Plex changed: they updated ffmpeg (at least for Linux, and I guess for Windows too). I just have a hunch they still use dxva2+mf, and that's why it doesn't work. Take a look at my reply in #51.

imayo commented 5 years ago

nvEncodeAPI trace leads to nvcuvid.dll which is used by nvEncodeAPI64.dll even for NVENC ...

But our software does use NVENC: it initializes everything and streams properly, yet I can't see nvcuvid.dll. It uses nvEncodeAPI, and that DLL is loaded into the process. I was just curious whether this patch could be used for other software. We have developed game streaming software, and if you think it would help, we can assist, as I have used both NVCUVID and NVENC in the past.

niXta1 commented 5 years ago

@imayo Hi! ...

@imayo Oh! I’m sorry! You’re not talking about Plex! You should probably open a new issue and state your tools and arguments. Sorry about the misunderstanding.

jaylex32 commented 5 years ago

. ...

@jaylex32 the old version was v1.14 of PMS, using an older version of ffmpeg not compatible with nvdec. ...

I was about to say, because it's working like a charm in Emby. Thanks!

keplenk commented 5 years ago

Hi!

I just bought a Quadro P400 hoping I can use it with unlimited transcoding with Plex.

I have Windows 10 1809 ENT - with Plex version 1.15.0.659-9311f93fd

Fresh Windows 10 install; the patch was successful. I used the Quadro 418.81 driver.

However, Plex doesn't transcode more than 2 sessions concurrently.

What other methods can I use to test whether the limit is actually above 2? I saw the FFMPEG method by @Snawoot, but how can I find out if it's actually using the video card when running FFMPEG in cmd? When I tried it (ran 4 instances), the CPU was at 100% and the GPU at 55%. What should I look for in the results or during encoding?

EDIT:

So I ran the FFMPEG test, and here are my results. I don't understand them. Can someone verify whether it's actually getting past the 2-session limit?

https://imgur.com/a/BCo2ZHh
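To check whether the encodes are really running on NVENC, rather than reading CPU/GPU percentages, you can ask the driver how many encoder sessions are open while the transcodes run. The sketch below assumes nvidia-smi is on PATH and that your driver exposes the encoder.stats.sessionCount query field (run nvidia-smi --help-query-gpu to confirm). On GPUs where the field is not supported the value is not a number, and parsing raises ValueError on purpose.

```python
import subprocess
from typing import List


def parse_session_counts(output: str) -> List[int]:
    """Parse 'csv,noheader,nounits' output: one integer per GPU, one GPU per line."""
    # A non-numeric value such as "[N/A]" raises ValueError on purpose.
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def encoder_session_counts() -> List[int]:
    """Ask nvidia-smi how many NVENC sessions are open right now, per GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=encoder.stats.sessionCount",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_session_counts(result.stdout)
```

Calling encoder_session_counts() while 4 transcodes are active should show 4 for that GPU if all of them are on NVENC; a lower number means some of them are not using the hardware encoder.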