psieg / Lightpack

Lightpack and Prismatik open repository
GNU General Public License v3.0

Add Nvidia Frame Buffer Capture (NVFBC and NVIFR) source please for hookless high performance capture in games! #235

Open v00d00m4n opened 5 years ago

v00d00m4n commented 5 years ago

Nvidia uses hardware FBC for its own streaming and recording software, GeForce Experience. Steam also uses it for streaming to Steam Link.

They work independently of any API used by the game or software. FBC is effective for fullscreen framebuffer capture and IFR for windowed capture. Here are some details:

NVFBC

Captures the framebuffer (front buffer) without any involvement from OpenGL or Direct3D.

Effectively a direct copy of the framebuffer irrespective of which application(s) drew it.

It generally only works sensibly in fullscreen mode. If you render in windowed mode and use NVFBC, it is going to capture the entire screen including your desktop and other unrelated windows.

NVIFR

Slightly more complicated and less performant than NVFBC, this can capture a single application.

In my experience this used to be how Steam would stream windowed-mode applications. I have not seen this capture path in a very long time and I am glad because performance was awful whenever it was used.

Here is some documentation: https://developer.nvidia.com/sites/default/files/akamai/designworks/docs/NVIDIA%20Capture%20SDK%20Programming%20Guide.pdf

http://on-demand.gputechconf.com/gtc/2016/presentation/s6307-shounak-deshpande-get-to-know-the-nvidia-grid-sdk.pdf

Here is the capture SDK itself: https://developer.nvidia.com/capture-sdk

ATI also has something similar, but I'm not very familiar with it; maybe google it.

Anyway, implementing this should greatly reduce CPU load and the performance impact. Please do this as soon as you can.

v00d00m4n commented 5 years ago

Anyone?

zomfg commented 5 years ago

Try this:

Get the last build of Prismatik, set it to whatever framerate you realistically want to use with your LEDs (or something like 30fps to make it simple), and run TimeSpy, for example.

Quit Prismatik, set up ShadowPlay with the same framerate and a low-quality preset (I don't know if all this is possible), start capture, and run the same benchmark.

Do one run of the benchmark alone.

Report the results.

v00d00m4n commented 5 years ago

Well, it's not so much about the GPU as about unloading the CPU, which doesn't always affect games, since not all of them are sensitive to CPU performance. It's also about compatibility and hardware-accelerated capture. NVFBC can capture any fullscreen app no matter what API it uses, since the capture is taken directly from the video card's framebuffer, so no hooks and no CPU processing are required at all. In a similar fashion, NVIFR can capture the desktop and anything windowed, even some windows that do not work with Desktop Duplication. You can also capture games that use an unsupported API like Vulkan, OpenGL or old DirectDraw, or even some UWP apps with a video stream (try the IVI UWP app from the Windows Store: its video stream does not get captured by Prismatik, only the UI affects it). And most importantly, it's very low latency. Basically it's low-level direct hardware access to the framebuffer, so you could even capture hand-written asm demos that don't use DX, OGL or VK and instead talk to the GPU directly.

Here is an example of open source software that uses NVFBC: https://github.com/gnif/LookingGlass/tree/master/host/Capture. Just examine this code and repeat it, download the SDK for the headers, and that's it. (As a bonus, this one also has nice DXGI capture code; implementing it in Prismatik as well would be a nice alternative for finding the sweet spot between performance and compatibility.)

Another, even better example of NVFBC usage in open source software is here: https://github.com/bloodelves88/CloudyNvCapture

Please, take a look.

zomfg commented 5 years ago

I'm not arguing, I just wanted to have a rough idea of the performance difference, which I cannot test myself (nor the other benefits, since I'm on the red team right now). Also, unless we are talking crazy capture framerates (and even then..), Prismatik's CPU load comes mainly from averaging the colors of widgets, especially with full-sized frames (or more accurately, depending on the resolution and widget size). To help with this, the last release includes downscaling for ddupl (width/8 and height/8), which is done by the GPU. So unless color averaging is ported to CUDA or equivalent, I doubt we'll see a significant difference in performance (if any; I don't have any experience with GPGPU, I'm just speculating). But even then, it'll mainly benefit (IF it benefits at all) the old capture methods that only get full-sized frames. I'm saying this just to lower your expectations in the performance department, but I'm still curious to see some numbers, so if you get a chance to bench...
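To make the cost argument concrete, here is a minimal sketch of per-zone color averaging on the CPU. It is not Prismatik's actual code: the `Rgb` struct, the `averageZone` name and the 32-bit BGRA layout are assumptions for illustration only. The work is proportional to the number of pixels in the zone, which is why shrinking the frame before averaging matters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical result type for this sketch.
struct Rgb {
    std::uint8_t r, g, b;
};

// Average one rectangular zone of a frame stored as 32-bit BGRA pixels
// (assumed layout). bytesPerRow is the row stride and may include padding.
// The zone (x, y, w, h) must be non-empty and lie inside the frame.
Rgb averageZone(const std::uint8_t* frame, std::size_t bytesPerRow,
                int x, int y, int w, int h)
{
    assert(frame != nullptr && w > 0 && h > 0 && x >= 0 && y >= 0);

    std::uint64_t sumR = 0, sumG = 0, sumB = 0;
    for (int row = y; row < y + h; ++row) {
        const std::uint8_t* p = frame
            + static_cast<std::size_t>(row) * bytesPerRow
            + static_cast<std::size_t>(x) * 4;
        for (int col = 0; col < w; ++col, p += 4) {
            sumB += p[0];  // blue
            sumG += p[1];  // green
            sumR += p[2];  // red (p[3] is alpha, ignored)
        }
    }

    const std::uint64_t n = static_cast<std::uint64_t>(w) * static_cast<std::uint64_t>(h);
    return Rgb{static_cast<std::uint8_t>(sumR / n),
               static_cast<std::uint8_t>(sumG / n),
               static_cast<std::uint8_t>(sumB / n)};
}
```

Downscaling by 8 in each dimension leaves 1/64 of the pixels, so a loop like this does roughly 64 times less work per zone.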

psieg commented 5 years ago

I also don't expect too much of an improvement. Desktop Duplication is also largely API independent (no injection) and hardware optimized. It no doubt has better compatibility than anything Nvidia specific. Anyone interested is welcome to give it a shot, but since my time is limited these days, don't get your hopes up for me doing it.

sblantipodi commented 5 years ago

@psieg now that Desktop Duplication is broken (it disables variable refresh rate), it could be worth a try. Is it a lot of work to add NVFBC and NVIFR support?

Benik3 commented 5 years ago

Is it Nvidia-exclusive, or will it also work on AMD?

sblantipodi commented 5 years ago

It's Nvidia-only, but AMD has its own.

Benik3 commented 5 years ago

AMD has ReLive and I didn't find any API. So it means adding two new capture sources to Prismatik: one for Nvidia and a second for AMD... But how does e.g. Twitch connect to these streaming features? Isn't it possible to get the picture the same way, so it would be universal for AMD and Nvidia? I don't know much about this, but maybe it will help someone find a way :)

zomfg commented 5 years ago

@psieg now that Desktop Duplication is broken (it disables variable refresh rate), it could be worth a try. Is it a lot of work to add NVFBC and NVIFR support?

Find something that uses one of those and compare to ddupl

v00d00m4n commented 5 years ago

I also don't expect too much of an improvement. Desktop Duplication is also largely API independent (no injection) and hardware optimized. It no doubt has better compatibility than anything Nvidia specific. Anyone interested is welcome to give it a shot, but since my time is limited these days, don't get your hopes up for me doing it.

The improvement is huge. DD is still a software capture that requires translating many API calls into other API calls and driver calls, which adds noticeable latency and uses CPU resources. IT'S HIGH LEVEL. NVFBC and NVIFR, again, are PURE HARDWARE LOW-LEVEL capture that skips all the middleman APIs and talks directly to the hardware, skipping a lot of unnecessary stuff.

DD can't be as fast as NVFBC because in a battle of SOFTWARE vs HARDWARE ACCELERATED, anything HARDWARE ACCELERATED always wins. The same goes for HIGH vs LOW level: low level wins.

The difference in resource usage and performance is the same as the difference between OBS and Nvidia Experience streaming. OBS is a software capture tool that wastes CPU and slows down games, so you lose around 2-10 fps depending on the game. Nvidia Experience, on the other hand, does not load the CPU at all and does not cost any FPS during streams. OBS usually adds 10-15% CPU load and NVE adds 0%. Just to compare: right now with DD, Prismatik eats about the same 15% of CPU, and I guess that implementing NV or ATI direct hardware framebuffer capture will reduce the load to less than 3-5%, which is quite a lot for heavy games.

Another reason why it needs to be done is compatibility. Like I said before, DD does not always work. For example, it does not capture video streams from UWP apps (just search for IVI in the Windows Store and test with any video from it; it has plenty of free movies). Maybe that's done for the sake of DRM, to prevent anyone from recording video streams from paid services via DD; I'm not sure why it happens. But with NVFBC and NVIFR I can easily record the same UWP application and anything else, even DRM-protected web streams, because NVFBC, yet again, does not go through other APIs. It directly takes whatever is in the video card's framebuffer right now, and since the Windows UI is composited by the video card, no matter how hard they try with DRM, they have to put the video stream layer into the overall desktop composition, so it exists there as is.

It also works with old games without any dependency on the API used for rendering: anything that renders via the hardware video card buffer gets captured, no matter the API it uses. Again, that is not so with DD. You can find many reports on the internet that DD does not capture everything. And since you don't need DX hooking, it's better for both compatibility and performance, and it is also safer to use in anti-cheat-protected games and in games with mods that already do DX hooking.

As far as I know, Steam has all 3 APIs implemented for streaming: DD for general purpose, NVFBC for GeForces and the ATI thing for Radeons. Plus it uses NVENC for hardware lagless encoding of the video stream, which also improves performance a lot compared to DD and the software encoding Steam does when no NV or ATI option is selected.

So DD is good for general purposes as a fail-safe method, while the NV and ATI APIs are better for gaming and even for DRM-protected streamed videos.

As for the implementation, please read all the PDFs and source examples I linked in my posts above. It's very easy: all you have to do is copy-paste the general code "shell" you use for DD, replace the DD calls with NVFBC and ATI API calls, and do some minor tweaks. I think it's even possible to use a direct framebuffer transfer to video card memory to reduce its size and sample the zone colors, but I'm not sure about this part.

P.S. Oh, by the way, the recent SDK may be limited to use with pro cards only, so if it does not work, just look for an older version of the SDK that did not have that restriction.

v00d00m4n commented 5 years ago

For the AMD capture API, I think you need to look at this: https://gpuopen.com/gaming-product/advanced-media-framework/ but I'm not an AMD user and don't know much about them, so I'm not sure.

sblantipodi commented 5 years ago

@v00d00m4n if it's simple why don't you try to do it? It seems that you are able to do it. Please do it if you can :)

psieg commented 5 years ago

Sorry to say this, but DDupl works perfectly fine for most users. VRR isn't widespread enough yet for this to be a problem. We're volunteer contributors working on this in our free time; we don't have the capacity to build two new capture sources for 20 users. If you really want it, find and pay a developer to do it. Sorry.

Anyone is welcome to try this and send a pull request, I'll happily include it for everyone to benefit.

sblantipodi commented 5 years ago

Sure, it's completely legit, thanks anyway for the great job done so far.

In any case, I think that everyone who uses Prismatik for gaming today, or at least the vast majority, uses a VRR monitor. Very few gamers today don't use some form of Variable Refresh Rate, simply because VRR gives you a better experience than a better GPU, for less money :)

Hope that someone will join the project and support it since this is a deal breaker for most gamers.

maxroehrl commented 5 years ago

@sblantipodi @v00d00m4n I have implemented NvFBC grabbing support in my fork of this repo.

Lightpack

It needs some testing as I do not have an LED strip yet. I could only test it with an RTX 2070, and it may crash if you have an Intel or AMD graphics card.

It also scales the picture by a factor of 8 like the Ddupl grabber.

Nvidia's FreeSync seems to work with an Asus MG279Q without stuttering while NvFBC is running with a 33ms timer.

You need to run Prismatik as an admin once because enabling NvFBC needs elevation. Only the display driver will restart, and NvFBC support stays enabled without a reboot.

NvFBC only works with consumer cards if you pass a magic sequence in a setup parameter. Without this magic it only works with the pro cards. I do not know about the licensing of NvFBC. As you can see in the changed files, I used two of Nvidia's header files, which contain the function and parameter definitions in the NvFBC64.dll.

sblantipodi commented 5 years ago

@maxroehrl Oh my god. I love this man! Congratulations man! testing it right now...

sblantipodi commented 5 years ago

@maxroehrl OK, I have tested it on my RTX 2080 Ti with an Acer XB271HK, Windows 10 1809. The monitor has a built-in "refresh counter" (not a framerate counter but a refresh counter), so I can easily see with clear numbers whether my GSYNC is working properly.

I generally use a framerate limiter to limit the framerate to 57FPS in order to keep GSYNC always engaged and reduce input lag. My monitor is 60Hz; at higher framerates GSYNC disengages and input lag is not as good as with GSYNC on.

With your patch GSYNC works flawlessly and it is a complete smooth experience. Before this patch and before windows broke things, I was able to see some "additional input lag" even with DDUPL, now it is smooth as if nothing is running under.

It works flawlessly with DX11, DX12 and Vulkan.

I tried The Witcher 3, AC Odyssey, Shadow of the Tomb Raider, Metro Exodus, Devil May Cry 5, YouTube and a test app of my own.

A flawless experience.

Really happy with the patch, congratulations maxroehrl and really thanks for your excellent work.

@psieg this would not have been possible without your excellent work too, so thank you as well. The patch is a complete jewel for gamers and non-gamers since it's incredibly faster than ddupl, let alone WinAPI. So do you plan to merge the patch into the "release branch" anytime soon? I used a fast camera to shoot the LED transition from one color to another (from red to blue): with ddupl there are some frames where I see the monitor green and the LED blue, while with NvFBC64 the LED transition is much better, without any errors.

thank you guys. really great work!

zomfg commented 5 years ago

@maxroehrl cool 👍

It also scales the picture by a factor of 8 like the Ddupl grabber.

The 8-factor is just a sweet spot for ddupl, so if there is no negative impact to scaling lower you can try something like 20x (so grabScreen.scale = 0.05)

Also, if you can avoid copying the framebuffer and use it as is (like in the ddupl grabber), it'll save a few % of CPU load.

maxroehrl commented 5 years ago

@zomfg You are right, it is possible to only use one buffer. I also increased the downscale to 16x.

@sblantipodi Thanks for testing! Can you also test if there is a negative impact on the visuals with 16x downscaling compared to 8x?

v00d00m4n commented 5 years ago

Maaan you did it! HUGE RESPECT!

Now a little request: can you quickly add a UI and ini element for a configurable scaling factor instead of having it hardcoded?

Benik3 commented 5 years ago

So nVidia is solved, but is there any solution for AMD? (maybe using AMF?)

v00d00m4n commented 5 years ago

@maxroehrl I found an issue: for some reason it does not catch the UI of some UWP windows. Can you please search for the IVI app in the Windows Store and try it in both DD and NVFBC modes to see the difference and find out what's wrong? DD can't catch the video stream but catches the UI with no problem; your NVFBC implementation catches neither. It should actually be the opposite, and I really expected that I could now finally watch streaming movies and TV shows with ambilight. Also try it with Netflix and a few other streaming services from the Windows Store.

sblantipodi commented 5 years ago

@v00d00m4n can you be more specific? What UWP windows ui? I have just downloaded IVI App on the Windows Store and all the app works flawlessly, even while playing videos.

EDIT: Even netflix works flawlessly here

Arn0111 commented 5 years ago

@sblantipodi @v00d00m4n I have implemented NvFBC grabbing support in my fork of this repo.

Lightpack

It needs some testing as I do not have an LED strip yet. I could only test it with an RTX 2070, and it may crash if you have an Intel or AMD graphics card.

It also scales the picture by a factor of 8 like the Ddupl grabber.

Nvidia's FreeSync seems to work with an Asus MG279Q without stuttering while NvFBC is running with a 33ms timer.

You need to run Prismatik as an admin once because enabling NvFBC needs elevation. Only the display driver will restart, and NvFBC support stays enabled without a reboot.

NvFBC only works with consumer cards if you pass a magic sequence in a setup parameter. Without this magic it only works with the pro cards. I do not know about the licensing of NvFBC. As you can see in the changed files, I used two of Nvidia's header files, which contain the function and parameter definitions in the NvFBC64.dll.

I will open a pull request if somebody tells me whether it's working with a lightstrip. Changed files Prismatik NvFBC.zip Download (8x Downscaling)

Man, you are just magic. Thanks for this amazing job. I was desperate about the lag with my G-Sync screen. I tested it with PUBG, Shadow of the Tomb Raider and Far Cry New Dawn. Everything is buttery smooth. When a dream becomes real.

But when you move a window slowly to the top, with black wallpaper, you see some lags with led off to white, like scales. It is worse with 16x. With DDup it is really

(Tested with a 112-LED strip on my 27" screen.) Sorry for my English.

sblantipodi commented 5 years ago

@zomfg You are right, it is possible to only use one buffer. I also increased the downscale to 16x.

@sblantipodi Thanks for testing! Can you also test if there is a negative impact on the visuals with 16x downscaling compared to 8x?

Here is the updated version: Prismatik NvFBC.zip Download (16x Downscaling)

You have all my respect; tell me what I need to do and I will do it. I tried it: on my RTX 2080 Ti I see no performance difference with the naked eye. I tested with a photo burst on this video https://www.youtube.com/watch?v=sr_vL2anfXA&list=FL9kxMLPqCEA187NGgz1fo9w&index=13&t=0s (from the 50-second mark the video is quite punishing for LED performance) and the experience is FLAWLESS.

I even tried what @Arn0111 described, but I see absolutely no difference between NvFBC and DDUPL on my 2080 Ti.

Honestly, I'm not able to see differences between 8x and 16x, either in performance or quality. I don't know if GPU/CPU performance has some impact here. Arn0111, what GPU are you using?

PS: Just to be precise, in case it is useful: I'm running 95 LEDs on a 27 inch display, so the LEDs are quite precise. I don't know if some defects are more noticeable with fewer LEDs.

Arn0111 commented 5 years ago

It’s like an LED is turning on with fewer gradient steps when you move a window really slowly to any border. It’s jerky, like 30 fps compared to 60 fps, almost like WinAPI. With 16x it’s like 10 fps. I have an MSI 1080 Gaming X. With DDup it’s smooth like a real 60fps. In games or videos you can’t really see this issue.

v00d00m4n commented 5 years ago

@v00d00m4n can you be more specific? What UWP windows ui? I have just downloaded IVI App on the Windows Store and all the app works flawlessly, even while playing videos.

EDIT: Even netflix works flawlessly here

For me, whenever I start the IVI app, ambilight just stops updating and freezes at the last frame from before the app fully loaded. Once I close the app, it starts updating again.

v00d00m4n commented 5 years ago

Oh wait, what Windows build do you use? Maybe it is somehow related to the latest preview releases after build 1809, where MS went totally crazy, fully removed exclusive fullscreen and made some other changes to how windows are rendered.

I use version 1903 (build 18362.1), the latest Nvidia Game Ready driver and a GTX 1080 Ti with a 4K display connected over HDMI. I noticed that the way Windows operates has changed: judging by what my TV actually shows, the resolution is always 4K, and all other resolutions just get upscaled to 4K virtually, so the display does not switch to real display modes. I don't know if that is related, but this behavior in recent versions of the driver and Windows 10 drives me nuts. It could be the reason, but I don't know for sure.

sblantipodi commented 5 years ago

It’s like an LED is turning on with fewer gradient steps when you move a window really slowly to any border. It’s jerky, like 30 fps compared to 60 fps, almost like WinAPI. With 16x it’s like 10 fps. I have an MSI 1080 Gaming X. With DDup it’s smooth like a real 60fps. In games or videos you can’t really see this issue.

It doesn't seem to happen here. Is it a performance-related problem? What timer are you using? I'm using 33ms here, even though 50ms could be enough.

sblantipodi commented 5 years ago

Oh wait, what Windows build do you use? Maybe it is somehow related to the latest preview releases after build 1809, where MS went totally crazy, fully removed exclusive fullscreen and made some other changes to how windows are rendered.

I use version 1903 (build 18362.1), the latest Nvidia Game Ready driver and a GTX 1080 Ti with a 4K display connected over HDMI. I noticed that the way Windows operates has changed: judging by what my TV actually shows, the resolution is always 4K, and all other resolutions just get upscaled to 4K virtually, so the display does not switch to real display modes. I don't know if that is related, but this behavior in recent versions of the driver and Windows 10 drives me nuts. It could be the reason, but I don't know for sure.

I'm using Windows 10 1809 and don't have this problem. It seems that Windows breaks things with every release. I hope they will fix it in the final version.

Benik3 commented 5 years ago

Isn't it because of DRM?

zomfg commented 5 years ago

@Arn0111

It is worse with 16x. With DDup it is really (Tested with 112 leds strip on my 27 screen)

What resolution and how many LEDs on top edge for example?

Arn0111 commented 5 years ago

I am on Windows 10 1803
Latest Nvidia driver
2560 x 1440
112 LEDs (36 top, 36 bottom, 20 per side)
With or without G-Sync, same issue
Grab interval 17ms / 59 FPS, but with 33ms same issue
BaudRate 500000

With DDup the transition when an LED turns on is perfectly smooth.

I found another issue. If I put, for example, a YouTube video in fullscreen, ambilight freezes when the controls disappear. It works with DDupl.

Here is a video of the jerky issue: https://share.icloud.com/photos/0UuS4PQbgYqNQIQCEYc7dfCkw#Domicile

https://share.icloud.com/photos/0HX-3Gjlr1ek3BzcLLRcrlYzw Video 1 : NvFBC Video 2 : DDup

Again, TY for this amazing work. Ambilight with gsync is pure happiness.

sblantipodi commented 5 years ago

I am on Windows 10 1803
Latest Nvidia driver
2560 x 1440
112 LEDs (36 top, 36 bottom, 20 per side)
With or without G-Sync, same issue
Grab interval 17ms / 59 FPS, but with 33ms same issue
BaudRate 500000

With DDup the transition when an LED turns on is perfectly smooth.

I found another issue. If I put, for example, a YouTube video in fullscreen, ambilight freezes when the controls disappear. It works with DDupl.

Here is a video of the jerky issue: https://share.icloud.com/photos/0UuS4PQbgYqNQIQCEYc7dfCkw#Domicile

Again, TY for this amazing work. Ambilight with gsync is pure happiness.

It's strange, on my 4K monitor I'm not able to reproduce this. It is pretty difficult to consider it an issue even on your setup; it is a really minor annoyance.

I can't believe I have ambilight with such a smooth experience on perfectly working GSYNC again.

Thanks again guys!

zomfg commented 5 years ago

Could someone run TimeSpy or something with ddupl and then with nvfbc (the first 8x build) to compare the impact? (for the same grab interval)

@Arn0111 does this happen with the 8x build?

sblantipodi commented 5 years ago

@zomfg I can do it this evening, I'm at work now. I will do as soon as I return home.

Arn0111 commented 5 years ago

The videos are with 8x. With 16x it’s much worse, and I can notice it a little in games, as if the ambilight were running at 30fps.

sblantipodi commented 5 years ago

The videos are with 8x. With 16x it’s much worse, and I can notice it a little in games, as if the ambilight were running at 30fps.

Isn't 30FPS better for ambilight? It should "use less resources" and it is fast enough for ambilight purposes. I use 30FPS on my rig. I don't want to waste resources on things that don't give me "real" benefits.

sblantipodi commented 5 years ago

Could someone run TimeSpy or something with ddupl and then with nvfbc (the first 8x build) to compare the impact? (for the same grab interval)

@Arn0111 does this happen with the 8x build?

@zomfg done some accurate testing using 3dmark with 33ms grab interval with the 8x downscaling build. test rig: RTX2080 Ti with a 5930K@4.2GHz (haswell-e six core CPU)

I tested Fire Strike Extreme and Port Royal with ddupl and nvfbc. Those tests are too GPU-bound and show absolutely no performance difference between ddupl and nvfbc.

Time Spy is CPU-bound in some scenes, and that is where I can see a little performance difference between ddupl and nvfbc. I repeated every test twice to reduce the margin of error.

Here are the two tests with DDUPL: https://www.3dmark.com/spy/6818019 https://www.3dmark.com/spy/6817824

And here are the two tests with NVFBC: https://www.3dmark.com/spy/6818084 https://www.3dmark.com/spy/6817761

As you can see, NVFBC is 6-7% faster than DDUPL on my setup.

@maxroehrl I love your NVFBC implementation and can't wait to see it released on the official version of prismatik.

This is the result without prismatik and without any screengrabber. https://www.3dmark.com/spy/6818129

As you can see, NVFBC (the fastest grabbing method on my PC) costs only 2% of performance. It is an incredible result. Congratulations, guys.

v00d00m4n commented 5 years ago

OK, the 16x version is totally buggy! The original 8x version works fine except for the freezing with IVI and other UWP apps, which does not happen with DD; it seems the NVFBC implementation is a little buggy.

So here is what happens with 16x and why it looks like 30 fps: the upper and lower rows of LEDs have gaps, and no matter the source there is a broken pattern of 1-2 LEDs on and 2-3 LEDs off, which makes gradient movement less smooth, as if the fps were lower. The fps is actually what is set; the skipped LEDs just make it feel laggy. Strangely, the vertical strips work fine, so it is specific to the horizontal ones. The 8x version has no such bug.

A manual input UI element for scaling is needed, with a default value equal to the 8x version, along with fixes to the scaling calculations, which are broken for horizontal LEDs. Manual adjustment of the grab interval for NVFBC is also needed; 16 ms is way better than 33. The NVFBC implementation also needs some other polishing.

v00d00m4n commented 5 years ago

The videos are with 8x. With 16x it’s much worse, and I can notice it a little in games, as if the ambilight were running at 30fps.

Isn't 30FPS better for ambilight? It should "use less resources" and it is fast enough for ambilight purposes. I use 30FPS on my rig. I don't want to waste resources on things that don't give me "real" benefits.

Nope, not better at all; very annoying blinking happens. But it depends on the brain and eye: some are slower in perception and may perceive 30 fps as fine, while others will perceive 30 fps as a choppy slideshow. And for 60 fps to work well, the baud rate of the Arduino COM port should be set to at least 500 000 in both the firmware and Prismatik. I struggled with lag until I tried raising the COM port rate, which allowed faster data traffic and smoothed everything out. If your baud rate was not high enough for 50-60 fps, the result was less smooth than 30 fps, which may have given some people the illusion that 30 fps is better.
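The baud-rate point can be checked with a little arithmetic. The sketch below is only an estimate under stated assumptions: 8N1 serial framing (10 bits on the wire per byte), 3 bytes per LED, and a fixed per-frame header whose size depends on the protocol (6 bytes is used in the example as an Adalight-style assumption). The function name `maxFramesPerSecond` is made up for illustration, and 115200 is just a common default picked for comparison, not a value from this thread.

```cpp
#include <cassert>

// Upper bound on LED updates per second that a serial link can carry.
// Assumptions: 8N1 framing (1 start + 8 data + 1 stop bit per byte),
// 3 color bytes per LED, headerBytes of protocol overhead per frame.
double maxFramesPerSecond(int baudRate, int ledCount, int headerBytes)
{
    assert(baudRate > 0 && ledCount > 0 && headerBytes >= 0);

    const double bytesPerSecond = baudRate / 10.0;
    const double bytesPerFrame = headerBytes + 3.0 * ledCount;
    return bytesPerSecond / bytesPerFrame;
}
```

Under these assumptions, 112 LEDs at 115200 baud top out at roughly 33 updates per second, which is not enough for a 17 ms grab interval, while 500000 baud allows roughly 146 and leaves plenty of headroom for 60 fps.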

maxroehrl commented 5 years ago

@v00d00m4n I added an option to change the downscaling in the experimental tab.

0 = No scaling
1 = 2x downscaling
2 = 4x downscaling
3 = 8x downscaling
4 = 16x downscaling
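The option values above are powers of two, so the mapping can be sketched as a shift. This is an illustration only, not the code from the fork: `downscaleFactor` and `scaleFromFactor` are hypothetical names, and treating the result as a fractional scale follows the `grabScreen.scale = 0.05` remark earlier in the thread.

```cpp
#include <cassert>

// Map the experimental-tab option (0..4) to a downscale factor:
// 0 -> 1x (no scaling), 1 -> 2x, 2 -> 4x, 3 -> 8x, 4 -> 16x.
int downscaleFactor(int option)
{
    assert(option >= 0 && option <= 4);
    return 1 << option;
}

// Convert a downscale factor into a fractional scale,
// e.g. 8x -> 0.125 and 20x -> 0.05.
double scaleFromFactor(int factor)
{
    assert(factor >= 1);
    return 1.0 / factor;
}
```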

I also fixed NvFBC grabbing after sleeping or hibernating.

NvFBC cannot capture DXVA-encrypted content like DVD or Blu-ray playback. I don't know about Netflix, but Amazon Prime Video in the Edge browser seems to work.

@Benik3 AMD has not released the API of DEM which is the equivalent of NvFBC (Source)

Arn0111 commented 5 years ago

Wooow...just amazing..

2x or 4x downscaling seems to be perfect for me. My issue is solved.....in 1 day.

Max you are just insane.

For YouTube videos there is still the same behavior.

sblantipodi commented 5 years ago

Can you explain to me what scaling does exactly? Does it scale the image before processing it?

Sincerely I see no performance difference in 2x or 8x scaling on my rig so what's the point in using 16x scaling or 8x scaling?

zomfg commented 5 years ago

It does the scaling before passing it on to Prismatik for all the color stuff, yes. It can have diminishing returns, YMMV depending on resolution, number of LEDs and capture zone sizes.

With a high number of LEDs, aggressive scaling can indeed produce artifacts because of the lack of resolution per LED. For example, in @Arn0111's case: 2560 px width / 36 LEDs on the top edge / 16x ≈ 4 px wide zone, which could be too low (I don't remember the optimal size from my tests).
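The same arithmetic as a small helper, in case someone wants to check their own setup. It assumes the zones are spread evenly along the edge; `zoneWidthPx` is a made-up name for this sketch.

```cpp
#include <cassert>

// Approximate width, in downscaled pixels, of one LED's capture zone
// along an edge, assuming zones are spread evenly across that edge.
double zoneWidthPx(int screenWidthPx, int ledsOnEdge, int downscaleFactor)
{
    assert(screenWidthPx > 0 && ledsOnEdge > 0 && downscaleFactor > 0);
    return static_cast<double>(screenWidthPx) / ledsOnEdge / downscaleFactor;
}
```

For 2560 px and 36 LEDs this gives about 4.4 px at 16x and about 8.9 px at 8x. For the same LED count, halving both the screen width and the factor gives the same zone width, for example 1920 px at 4x versus 3840 px at 8x.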

@maxroehrl You could use this to see what's captured

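// Debug helper: dump a grabbed frame to a BMP file to inspect what was captured.
void GrabberBase::saveGrabbedScreenToBMP(const GrabbedScreen& screen, const char* suffix, const bool alt, const int frame)
{
    // Width and height are derived from the row stride, assuming 4 bytes per pixel.
    QImage bmp(alt ? screen.imgPaintedData : screen.imgData, screen.bytesPerRow / 4, screen.imgDataSize / screen.bytesPerRow, screen.bytesPerRow, QImage::Format_ARGB32_Premultiplied);
    // File name: screen handle (hex), frame number, suffix. Adjust the output directory for your machine.
    bmp.save(QString("E:\\pr\\screen_%1_%2_%3.bmp").arg((long)screen.screenInfo.handle, 0, 16).arg(frame).arg(suffix));
}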

Also "the power of 2" scaling is a limitation of the current scaling method of ddupl grabber. It looks like NvFBC doesn't care about that and you could use a 0.0 < x <= 1.0 value range directly for a finer tuning.
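A rough sketch of what a free-form scale in the (0.0, 1.0] range could look like when computing the capture size. Whether NvFBC really accepts arbitrary target sizes is taken from the remark above and not verified here; the `Size` struct, the `scaledSize` name and the fallback to 1.0 for out-of-range values are assumptions of this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical size type for this sketch.
struct Size {
    int width;
    int height;
};

// Compute the downscaled frame size for a scale in (0.0, 1.0].
// Out-of-range values fall back to 1.0 (no scaling), and each
// dimension is kept at 1 px or more.
Size scaledSize(int width, int height, double scale)
{
    assert(width > 0 && height > 0);

    if (!(scale > 0.0) || scale > 1.0) {
        scale = 1.0;
    }
    const int w = std::max(1, static_cast<int>(std::lround(width * scale)));
    const int h = std::max(1, static_cast<int>(std::lround(height * scale)));
    return Size{w, h};
}
```

With this, 2560 x 1440 at a scale of 0.125 (the 8x setting) becomes 320 x 180, and the 0.05 value mentioned earlier gives 128 x 72.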

sblantipodi commented 5 years ago

It does the scaling before passing it on to Prismatik for all the color stuff, yes.

so can I say that with a 2K monitor 4x scaling is equal to 8x scaling on a 4K monitor in terms of LEDs resolution?

zomfg commented 5 years ago

For the same LED setup, yeah

sblantipodi commented 5 years ago

For the same LED setup, yeah

This explains why Arn0111 had some issues with 8x scaling that I didn't have.

sblantipodi commented 5 years ago

@maxroehrl when will you open a pull request for your marvelous work?