Open philbanjo55 opened 1 month ago
Unfortunately, we haven't tested the minimum GPU requirements for fp32. Indeed, the multi-GPU version is not implemented to be compatible with sequential CPU offloading. And since the model was trained in bf16, it does not work in fp16.
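For anyone wondering why a bf16-trained model can break in fp16: one common reason (not confirmed for this model, just the general numeric picture) is that fp16 has only 5 exponent bits and tops out at 65504, while bf16 keeps the same 8 exponent bits as fp32. Values that are fine in bf16 can overflow to inf in fp16. Below is a minimal standard-library sketch of that difference. The helpers `to_fp16` and `to_bf16` are hypothetical names made up for this illustration, and the bf16 conversion truncates the mantissa instead of rounding the way real hardware usually does.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round-trip x through IEEE fp16; return +/-inf if it overflows."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        # struct refuses to pack finite values that are too large for fp16;
        # real fp16 arithmetic would produce inf here.
        return math.copysign(math.inf, x)


def to_bf16(x: float) -> float:
    """Emulate bf16 by keeping the top 16 bits of the float32 encoding.

    Assumption: simple truncation, not round-to-nearest-even.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFF0000  # keep sign, 8 exponent bits, 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    for value in (1.0, 1000.0, 1e5, 1e20):
        print(f"{value:>10g}  fp16={to_fp16(value)!r}  bf16={to_bf16(value)!r}")
```

Running it shows large values surviving in bf16 (with reduced precision) but becoming inf in fp16, which is the kind of overflow that typically turns outputs into NaNs or garbage.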
I have tried both the single-GPU and multi-GPU versions of the code. The single-GPU float32 run works only at the smaller resolution; when I run it at 768 I get an out-of-memory error on anything more than 2 frames (unless I run it with sequential CPU offloading). I have save memory turned on and even reduced the min packet size to 64, but I still get the memory error, as it's loading more than 24 GB onto the GPU.
The same happens with the multi-GPU version as well, and the multi-GPU version doesn't work with sequential CPU offloading (or it isn't implemented yet).
Also, I am able to run the multi-GPU version at full resolution in fp16, but it isn't working?
Thanks for the help!