kabachuha / sd-webui-text2video

Auto1111 extension implementing text2video diffusion models (like ModelScope or VideoCrafter) using only Auto1111 webui dependencies

2x speedup #21

Closed · sALTaccount closed this issue 1 year ago

sALTaccount commented 1 year ago

torch_gc() is very slow and is being called way too many times.

Here is a breakdown of the time taken inside the ddim_sample method:

[screenshot: profiler breakdown with the torch_gc calls in place]

After removing the calls to torch_gc:

[screenshot: profiler breakdown without the torch_gc calls]

We now spend 99.8% of our time actually sampling, up from the previous 73%.
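For context, here is a minimal sketch of how the per-call cost of torch_gc() could be measured on its own (this is not the profiling code used for the breakdown above; it assumes it runs inside the webui environment, and the loop count is arbitrary):

    import time
    import torch
    from modules.devices import torch_gc  # webui helper that triggers CUDA cache cleanup

    # Rough micro-benchmark: average wall-clock cost of one torch_gc() call.
    n = 50
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(n):
        torch_gc()
    torch.cuda.synchronize()
    t1 = time.perf_counter()
    print(f"avg torch_gc() cost: {(t1 - t0) / n * 1000:.2f} ms")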

I also removed a GC call inside the ddim_sample_loop method. With all of this combined, I go from ~1.3 it/s to ~2.6 it/s. I was not able to find any extra VRAM usage after removing these GC calls.

kabachuha commented 1 year ago

It may influence RAM usage, though. Maybe add it as a GUI checkbox?
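A minimal sketch of what such a toggle could look like (the option name and wiring are hypothetical, not the extension's actual UI code), assuming Gradio for the UI as in the rest of the webui:

    import gradio as gr
    from modules.devices import torch_gc

    # Hypothetical checkbox: keep the per-step GC available for low-memory setups,
    # but default it off so sampling runs at full speed.
    with gr.Blocks():
        gc_every_step = gr.Checkbox(
            label="Run torch_gc() every sampling step (slower, may reduce memory spikes)",
            value=False,
        )

    def maybe_gc(enabled: bool):
        # Called where the unconditional torch_gc() used to sit in the DDIM loop.
        if enabled:
            torch_gc()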

sALTaccount commented 1 year ago

[screenshots: CPU/GPU memory readings before and after torch_gc()]

There is zero difference in RAM or VRAM before and after the GC call, for both of the calls I removed.

Testing code was as follows:

    import os
    import psutil
    import torch
    from modules.devices import torch_gc  # webui's GC helper

    process = psutil.Process(os.getpid())
    # Resident CPU memory and allocated CUDA memory before the GC call
    print('CPU', process.memory_info().rss)
    print('GPU:', torch.cuda.memory_allocated())
    print('CALL GC')
    torch_gc()
    # ...and the same readings immediately after it
    print('CPU', process.memory_info().rss)
    print('GPU:', torch.cuda.memory_allocated())
kabachuha commented 1 year ago

Sounds great, I'll test it now

kabachuha commented 1 year ago

Yeah, it is faster! I'll add it then, thanks for your contribution! ❤️‍🔥