nagadomi / nunif

Misc; latest version of waifu2x; 2D video to stereo 3D video conversion
MIT License
1.58k stars 142 forks

recent update reduced speed by 10x #213

Closed longbottom-neville closed 2 months ago

longbottom-neville commented 2 months ago

With Any_V2_S, I used to get 8-10 fps; however, it's 0.5 fps after I updated today. Is there any way I can revert the update, please?

nagadomi commented 2 months ago

I added an option to choose whether to use the CUDA streams introduced in the recent update. It is off by default.

cuda_stream_option

Run update.bat and try again.

nagadomi commented 2 months ago

Make sure that the problem occurs only when the Stream option is checked.

nagadomi commented 2 months ago

In my env (RTX 3070 Ti, Linux), with 1080p input, Any_V2_S, Depth Batch Size=8, Worker Threads=2:

Stream OFF: 48 FPS, Stream ON: 61 FPS

longbottom-neville commented 2 months ago

Thank you for the response. Interesting: with CUDA streams on, my speed is 10x slower, but with them off it's the same speed I used to get before. That is with 4K videos, though; in 1080p the speed stays the same regardless of whether CUDA streams are off or on. The funny thing is that my system is almost the same as yours, yet the speeds are very different. How are you getting 40+? OMG

My system: Windows 11, R9 5900HX, RTX 3080, 32GB RAM, 3TB NVMe

My speed is way lower than yours, and I'm not sure why.

With 4K video in Any_V2_S: CUDA stream off - 3.5 fps, CUDA stream on - 0.5 fps

With 1080p video in Any_V2_S: CUDA stream off - 24 fps, CUDA stream on - 24 fps

longbottom-neville commented 2 months ago

Screenshot 2024-09-01 111257

nagadomi commented 2 months ago

With Stream ON, VRAM usage increased by 2-3 GB in my env. The performance problem seems to occur at higher resolutions, so it is possible that VRAM is being swapped (I'm not too familiar with this, but the Windows GPU driver has a virtual VRAM feature). Batch Size=4 or 2 might help.

longbottom-neville commented 2 months ago

I see. Without Stream ON, my VRAM usage is already around 7.8GB out of my GPU's 8GB, so I guess I won't gain much with Stream ON?

Also, would you say I'd get 30-40% more frames on a Linux system like yours?

Thanks again

longbottom-neville commented 2 months ago

Screenshot 2024-09-01 135447

nagadomi commented 2 months ago

VRAM usage can be reduced by decreasing Depth Batch Size. When VRAM is full, the Windows GPU driver may fall back to very slow memory transfers (shown as Shared GPU Memory in that screenshot).
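The advice to lower Depth Batch Size can be sketched as a simple halving search. This is only an illustration, assuming VRAM use grows roughly linearly with batch size; the function name `pick_batch_size` and all memory figures are hypothetical, not values measured from iw3.

```python
def pick_batch_size(vram_budget_gb, fixed_gb, per_item_gb, start=8):
    """Halve the batch size until the estimated VRAM use fits the budget.

    Assumes a linear cost model: fixed_gb + per_item_gb * batch.
    All memory figures are hypothetical inputs; measure them on your own GPU.
    """
    batch = start
    while batch > 1 and fixed_gb + per_item_gb * batch > vram_budget_gb:
        batch //= 2
    return batch


# Illustrative numbers only: 8 GB card, 3 GB fixed cost, 0.8 GB per batch item.
print(pick_batch_size(vram_budget_gb=8.0, fixed_gb=3.0, per_item_gb=0.8))
```

With these made-up numbers, a batch of 8 would need about 9.4 GB, so the search settles on 4, which stays inside dedicated VRAM instead of spilling into Shared GPU Memory.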

Also, would you say I'd get 30-40% more frames on a Linux system like yours?

Probably. PyTorch tends to be slower on Windows, but I have not researched it in detail. There are many possible causes.

longbottom-neville commented 2 months ago

Since the task was 30% done when you suggested decreasing the batch size, I'll wait for it to finish and then try a new task with a lower batch size. Thanks for your help and for your amazing work on iw3.

longbottom-neville commented 2 months ago

Hey nagadomi, I have a small question, and I thought I'd ask here instead of burdening you with a new thread. With certain MKV files I get this error:

"Application provided invalid, non monotonically increasing dts to muxer in stream 1:2033>=2005"

This only happens with certain files, and the task fails. Any ideas would be greatly appreciated.

Also, can you please share your PayPal ID? You've been a great help, and I want to get you a beer at least for all you've done for us :)

diverswan9 commented 2 months ago

@longbottom-neville I've had that before. I think it may have to do with the audio track. I usually run just the video track (fewer problems) and mux the audio back in afterwards with mkvtoolnix.

kaelsonofkrypto commented 2 months ago

I added an option to choose whether to use the CUDA streams introduced in the recent update. It is off by default.

cuda_stream_option

Run update.bat and try again.

What is the purpose of the Stream option? What exactly does it do?

nagadomi commented 2 months ago

@longbottom-neville

"Application provided invalid, non monotonically increasing dts to muxer in stream 1:2033>=2005"

Stream 1 is the audio track, as diverswan9 says (in iw3, stream 0 is the video track and stream 1 is the audio track). Could you try setting Start Time=00:00:01 (checked) and see whether you get the same error? (When Start Time is specified, the audio track is re-encoded.)
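To make the error message concrete: the muxer expects each packet's DTS (decoding timestamp) within a stream to be larger than the previous one, and "2033>=2005" can be read as "previous DTS 2033 >= current DTS 2005". Below is a minimal sketch of that check; the function name `first_dts_violation` and the sample timestamps are made up for illustration and are not part of iw3 or FFmpeg.

```python
def first_dts_violation(dts_values):
    """Return (index, previous, current) for the first DTS that fails to increase.

    Returns None when every DTS is strictly larger than the one before it.
    """
    for i in range(1, len(dts_values)):
        prev, cur = dts_values[i - 1], dts_values[i]
        if prev >= cur:
            return i, prev, cur
    return None


# Hypothetical audio packet timestamps containing one backwards jump.
sample = [0, 1024, 2033, 2005, 3000]
print(first_dts_violation(sample))
```

Re-encoding the audio track, as suggested above, regenerates these timestamps, which is presumably why it can work around files whose original audio timestamps jump backwards.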

Also, can you please share your PayPal ID?

This project has a Patreon link: https://patreon.com/nagadomi

nagadomi commented 2 months ago

@kaelsonofkrypto When Worker Threads is non-zero, batches are processed by multiple threads, but all of them use the same CUDA Stream (GPU processing pipeline). When Stream is checked, a different CUDA Stream is used for each thread, which can actually parallelize the CUDA processing pipeline. (This is the first time I have used this feature, so I'm not entirely sure I have it right.)
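The difference between the two modes can be sketched in PyTorch roughly as follows. This is not the iw3 implementation; `process_batches` and its parameters are hypothetical names, and the sketch only assumes the general pattern of worker threads that each enter their own `torch.cuda.Stream`. It falls back to the CPU when CUDA is unavailable.

```python
import contextlib
import threading
from concurrent.futures import ThreadPoolExecutor

import torch


def process_batches(model, batches, num_workers=2, use_streams=True):
    """Run `model` over `batches` in worker threads, optionally one CUDA stream per thread."""
    cuda = torch.cuda.is_available()
    device = torch.device("cuda" if cuda else "cpu")
    model = model.to(device).eval()
    if cuda:
        # Make sure the weights are fully on the GPU before side streams touch them.
        torch.cuda.synchronize()

    local = threading.local()  # holds one stream per worker thread

    def worker(batch):
        if cuda and use_streams:
            if not hasattr(local, "stream"):
                local.stream = torch.cuda.Stream(device=device)
            ctx = torch.cuda.stream(local.stream)
        else:
            # Stream OFF: every thread queues work on the shared default stream.
            ctx = contextlib.nullcontext()
        with torch.inference_mode(), ctx:
            # .cpu() waits for this thread's current stream before returning.
            return model(batch.to(device)).cpu()

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(worker, batches))
```

Each stream needs its own working memory for in-flight batches, which is consistent with the extra 2-3 GB of VRAM reported earlier in the thread when Stream is ON.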

nagadomi commented 2 months ago

@andy500 from #218

So I tried your new update and the fix update on my RTX 2070 with a 20-minute clip: with stream off, 24 hours; with stream on, about 1 hour 50 minutes. But when I used the copy I backed up before the update, with no stream option in the program, it took 60 minutes, much faster. You have a bug in your update.

I compared against commit 8bef536310955800a4f475755e9cbbb3a0194378, from before the change, but could not reproduce it.

Here is what I suspect is wrong with the settings you posted in https://github.com/nagadomi/nunif/discussions/212

TTA=ON

You have TTA turned on only on the 2070 side. TTA makes the processing time twice as long. Make sure you are using the same settings.

Any_V2_B Depth Batch Size=16, Worker Thread=8

The 2070 has 8GB of VRAM. I think these values are too large and may cause VRAM swapping on Windows. I think Batch Size=8, Worker Thread=2 or Batch Size=2, Worker Thread=8 is safe.

Stereo Processing Width=1280 or 1920

Stereo Processing Width is not recommended for normal video conversion. It is slower than Default and less effective.

andy500 commented 2 months ago

Thank you, I will try the settings and see how it goes. When should I use Stereo Processing Width, and should I use TTA for better depth map quality? With Batch Size=8, Worker Thread=2 it is much faster. Thank you for your help, nagadomi. I had the batch size too high; that was the problem. Now I know.