longbottom-neville closed this issue 2 months ago.
I added an option to enable or disable the CUDA streams feature (the Stream option) that was introduced in the recent update. It is off by default.
Run update.bat and try again.
Please check whether the problem occurs only when the Stream option is checked.
In my environment (RTX 3070 Ti, Linux), with 1080p input, Any_V2_S, Depth Batch Size=8, Worker Threads=2:
Stream OFF: 48 FPS
Stream ON: 61 FPS
Thank you for the response. Interesting: with CUDA streams on, my speed is 10x slower, while with it off I get the same speed I used to get before. But this is with 4K videos; with 1080p videos, the speed stays the same whether CUDA streams are on or off. Funny thing is, my system is very similar to yours, yet the speeds are very different. How are you getting 40+ FPS?
My system: Windows 11, R9 5900HX, RTX 3080, 32 GB RAM, 3 TB NVMe.
My speed is much lower than yours, and I'm not sure why.
4K video, Any_V2_S: CUDA stream OFF: 3.5 FPS, CUDA stream ON: 0.5 FPS
1080p video, Any_V2_S: CUDA stream OFF: 24 FPS, CUDA stream ON: 24 FPS
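The numbers above are easier to compare as ratios and wall-clock time. A minimal Python sketch using the figures reported in this thread; the 20-minute clip length and 24 fps source frame rate in the usage lines are hypothetical values chosen only for illustration.

```python
def speedup(fps_before: float, fps_after: float) -> float:
    """How many times faster fps_after is than fps_before (< 1 means slower)."""
    return fps_after / fps_before


def conversion_hours(clip_minutes: float, source_fps: float, processing_fps: float) -> float:
    """Estimated wall-clock hours to convert a clip at a constant processing rate."""
    total_frames = clip_minutes * 60 * source_fps
    return total_frames / processing_fps / 3600


if __name__ == "__main__":
    # RTX 3070 Ti, Linux, 1080p: Stream OFF 48 FPS -> Stream ON 61 FPS
    print(f"3070 Ti 1080p, Stream ON vs OFF: {speedup(48, 61):.2f}x")
    # RTX 3080, Windows 11, 4K: Stream OFF 3.5 FPS -> Stream ON 0.5 FPS
    print(f"3080 4K, Stream ON vs OFF: {speedup(3.5, 0.5):.2f}x")
    # Hypothetical 20-minute, 24 fps clip processed at 3.5 FPS vs 0.5 FPS
    print(f"{conversion_hours(20, 24, 3.5):.1f} h vs {conversion_hours(20, 24, 0.5):.1f} h")
```

With these figures, Stream ON is about 27% faster on the 3070 Ti at 1080p, but about 7x slower (3.5 / 0.5) on the 3080 at 4K.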
With Stream ON, VRAM usage increased by 2-3 GB in my environment. Since the performance problem seems to occur only at higher resolutions, VRAM may be swapping (I'm not too familiar with this, but the Windows GPU driver has a virtual VRAM feature). Depth Batch Size=4 or 2 might help.
I see. Even with Stream OFF, my VRAM usage is already around 7.8 GB of my GPU's 8 GB, so I guess I won't gain much with Stream ON?
Also, would you say I'd get 30-40% more FPS on a Linux system like yours?
Thanks again!
VRAM usage can be reduced by decreasing Depth Batch Size.
When VRAM is full, the Windows GPU driver may fall back to very slow memory transfers (shown as Shared GPU Memory in that screenshot).
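A rough way to reason about Depth Batch Size is to treat VRAM use as a fixed base cost plus a per-frame cost multiplied by the batch size, and to pick the largest batch that still leaves some headroom. The Python sketch below assumes exactly that linear model; the function name, the base and per-frame figures, and the 0.5 GB headroom are hypothetical placeholders, not values measured from iw3, so measure your own usage before relying on it.

```python
def largest_safe_batch(vram_gb: float, base_gb: float, per_frame_gb: float,
                       headroom_gb: float = 0.5, max_batch: int = 64) -> int:
    """Largest batch whose estimated VRAM use stays below vram_gb - headroom_gb.

    Hypothetical model: usage = base_gb + batch * per_frame_gb.
    Returns 0 if even a single frame does not fit.
    """
    budget = vram_gb - headroom_gb - base_gb
    if budget < per_frame_gb:
        return 0
    return min(max_batch, int(budget // per_frame_gb))


if __name__ == "__main__":
    # Hypothetical 8 GB card with a light and a heavy per-frame cost
    print(largest_safe_batch(8.0, base_gb=1.5, per_frame_gb=0.25))
    print(largest_safe_batch(8.0, base_gb=1.5, per_frame_gb=1.0))
    # Base usage already near the limit (like 7.8 GB of 8 GB): nothing fits
    print(largest_safe_batch(8.0, base_gb=7.8, per_frame_gb=0.25))
```

The point of the model is that once the base usage is close to the card's capacity, any batch size risks spilling into Shared GPU Memory.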
Also, would you say I'd get 30-40% more FPS on a Linux system like yours?
Probably. PyTorch tends to be slower on Windows, but I have not researched it in detail, and there are many possible causes.
Since the task was already 30% done when you suggested decreasing the batch size, I'll wait for it to finish and then try a new task with a lower batch size. Thanks for your help and for your amazing work on iw3!
Hey nagadomi, I have a small question. I thought I'd ask here instead of burdening you with a new thread. With certain MKV files, I get this error:
"Application provided invalid, non monotonically increasing dts to muxer in stream 1:2033>=2005"
This only happens with certain files, and the task fails. Any ideas would be greatly appreciated.
Also, could you please share your PayPal ID? You've been a great help, and I want to at least buy you a beer for all you've done for us :)
@longbottom-neville I've had that before. I think it may have to do with the audio track. I usually process just the video track (fewer problems) and then mux the audio back in afterwards with mkvtoolnix.
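The video-only workaround above can be scripted. A minimal Python sketch that builds the command lines; it assumes ffmpeg and mkvmerge (from mkvtoolnix) are on your PATH, all file names are hypothetical, and the iw3 conversion step in the middle is left to you (run it however you normally do).

```python
import shlex
import subprocess


def strip_audio_cmd(src: str, video_only: str) -> list[str]:
    """ffmpeg: stream-copy only the first video stream, dropping audio and subtitles."""
    return ["ffmpeg", "-i", src, "-map", "0:v:0", "-c", "copy", video_only]


def remux_cmd(converted: str, original: str, output: str) -> list[str]:
    """mkvmerge: take the converted video plus every non-video track of the original."""
    # mkvmerge file options such as --no-video apply to the input file that follows them.
    return ["mkvmerge", "-o", output, converted, "--no-video", original]


def run(cmd: list[str]) -> None:
    """Print a command in shell form, then execute it (raises on failure)."""
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the commands here; call run(...) on each step to execute them.
    print(shlex.join(strip_audio_cmd("input.mkv", "video_only.mkv")))
    # ... convert video_only.mkv with iw3, producing e.g. converted.mp4 (hypothetical name) ...
    print(shlex.join(remux_cmd("converted.mp4", "input.mkv", "final.mkv")))
```

Because the audio never passes through the conversion, its original timestamps are kept as-is when mkvmerge puts it back next to the converted video.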
I added an option to enable or disable the CUDA streams feature (the Stream option) that was introduced in the recent update. It is off by default.
Run update.bat and try again.
What is the purpose of the Stream option? What exactly does it do?
@longbottom-neville
"Application provided invalid, non monotonically increasing dts to muxer in stream 1:2033>=2005"
stream 1 is the audio track, as diverswan9 says (in iw3, stream 0 is the video track and stream 1 is the audio track).
Could you try specifying Start Time=00:00:01 (with the checkbox checked) and see whether you get the same error?
(When Start Time is specified, the audio track is re-encoded.)
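In that message, 2033 is the DTS (decoding timestamp) of the previous packet in stream 1 and 2005 is the DTS of the new packet: the muxer rejected the new packet because its DTS was not greater than the previous one. To check whether a source file's audio track already contains such backward jumps, you can list packet timestamps with ffprobe and scan them. A minimal Python sketch, assuming ffprobe is on your PATH and that the track of interest is the first audio stream of the input (`a:0`); note that the error itself comes from iw3's output muxer, so a clean result on the input does not rule the problem out.

```python
import subprocess


def find_non_monotonic(dts_values: list[int]) -> list[tuple[int, int, int]]:
    """Return (index, previous_dts, dts) for each packet whose DTS is not strictly
    greater than the one before it (the muxer rejects previous >= current)."""
    problems = []
    for i in range(1, len(dts_values)):
        prev, cur = dts_values[i - 1], dts_values[i]
        if prev >= cur:
            problems.append((i, prev, cur))
    return problems


def audio_dts(path: str) -> list[int]:
    """Read the DTS of every packet in the first audio stream using ffprobe.

    Values are in that stream's time base; packets without a DTS ("N/A") are skipped.
    """
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "a:0",
         "-show_entries", "packet=dts", "-of", "csv=p=0", path],
        capture_output=True, text=True, check=True,
    ).stdout
    values = []
    for line in out.splitlines():
        field = line.strip().split(",")[0]
        if field and field != "N/A":
            values.append(int(field))
    return values
```

For example, `find_non_monotonic(audio_dts("input.mkv"))[:10]` lists the first ten backward or repeated timestamps, if any.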
Also, can you please share ur paypal id
This project has a Patreon page: https://patreon.com/nagadomi
@kaelsonofkrypto
When Worker Threads is non-zero, batches are processed by multiple threads, but all of them use the same CUDA stream (GPU processing pipeline).
When Stream is checked, each thread uses its own CUDA stream, which allows the GPU processing from different threads to actually run in parallel.
(This is my first time using this feature, so I'm not entirely sure this is right.)
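The difference can be illustrated without a GPU. A CUDA stream executes the work queued on it in order, so threads that share one stream end up taking turns, while threads with separate streams can overlap. The Python sketch below is only an analogy, not iw3's code: each "stream" is modeled as a `threading.Lock`, "GPU work" is a short sleep, and the function reports the peak number of workers doing "GPU work" at the same time. On a real GPU, how much separate streams actually overlap depends on the hardware, and, as reported above, Stream ON also raised VRAM usage by 2-3 GB.

```python
import threading
import time


def max_overlap(num_workers: int, shared_stream: bool, work_seconds: float = 0.1) -> int:
    """Run num_workers threads and return the peak number doing 'GPU work' at once.

    Each lock stands in for one CUDA stream: work queued on the same stream is serialized.
    """
    shared_lock = threading.Lock()
    locks = [shared_lock if shared_stream else threading.Lock() for _ in range(num_workers)]
    start = threading.Barrier(num_workers)  # release all workers at the same moment
    counter_lock = threading.Lock()
    active = 0
    peak = 0

    def worker(stream_lock: threading.Lock) -> None:
        nonlocal active, peak
        start.wait()
        with stream_lock:  # "enqueue" on this worker's stream
            with counter_lock:
                active += 1
                peak = max(peak, active)
            time.sleep(work_seconds)  # pretend kernel execution
            with counter_lock:
                active -= 1

    threads = [threading.Thread(target=worker, args=(lock,)) for lock in locks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


if __name__ == "__main__":
    print("Worker Threads=2, one shared stream:", max_overlap(2, shared_stream=True))
    print("Worker Threads=2, stream per thread:", max_overlap(2, shared_stream=False))
```

With a shared lock the peak is always 1, mirroring how one CUDA stream serializes work even when several worker threads submit it.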
@andy500 from #218
So I tried your new update and the fix update on my RTX 2070 with a 20-minute clip: with Stream OFF it took 24 hours, and with Stream ON about 1 hour 50 minutes. But with the backup I made before the update (which has no Stream option), it took 60 minutes, which is much faster. You have a bug in your update.
I compared against 8bef536310955800a4f475755e9cbbb3a0194378 (before the change) but could not reproduce it.
Here is what I suspect is the problem with the settings you posted in https://github.com/nagadomi/nunif/discussions/212:
TTA=ON
You have TTA turned on only on the 2070. TTA doubles the processing time, so make sure you are using the same settings on both.
Any_V2_B Depth Batch Size=16, Worker Thread=8
The 2070 has 8 GB of VRAM, so I think these values are too large. They may cause VRAM swapping on Windows.
I think Batch Size=8, Worker Thread=2 or Batch Size=2, Worker Thread=8 is safe.
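One way to see why both suggested settings are safer: if each worker thread holds one batch of frames on the GPU at a time (an assumption for illustration only; actual memory use also depends on the model, resolution, and TTA), the number of frames in flight is roughly Depth Batch Size × Worker Threads. The sketch below also assumes, hypothetically, that Worker Threads=0 behaves like a single worker.

```python
from typing import NamedTuple


class Setting(NamedTuple):
    batch_size: int
    worker_threads: int


def frames_in_flight(s: Setting) -> int:
    """Rough count of frames resident at once, assuming one batch per worker thread.

    Worker Threads=0 is treated (hypothetically) as a single worker.
    """
    return s.batch_size * max(1, s.worker_threads)


if __name__ == "__main__":
    settings = {
        "posted (Any_V2_B)": Setting(16, 8),
        "suggested A": Setting(8, 2),
        "suggested B": Setting(2, 8),
    }
    for name, s in settings.items():
        print(f"{name}: {frames_in_flight(s)} frames in flight")
```

Under this model, the posted setting keeps 128 frames in flight, 8 times as many as either suggestion (16 each).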
Stereo Processing Width=1280 or 1920
Stereo Processing Width is not recommended for normal video conversion. It is slower than Default and less effective.
Thank you, I will try these settings and see how it goes. When should I use Stereo Processing Width, and should I use TTA for better depth map quality? With Batch Size=8, Worker Thread=2 it is much faster. Thank you for your help, nagadomi. I had the batch size too high; that was the problem, and now I know.
With Any_V2_S I used to get 8-10 FPS, but after updating today I get 0.5 FPS. Is there any way I can revert the update, please?