SoftologyPro opened this issue 1 month ago
I successfully installed the correct versions of torch (built with CUDA 12.4) and xformers 0.0.28.post1, and I still get this error.
Check issue https://github.com/rhymes-ai/Allegro/issues/17
Changing line 824 in `Allegro/allegro/models/transformers/block.py` from

```python
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
```

to

```python
with torch.backends.cuda.sdp_kernel(enable_flash=False, enable_math=True, enable_mem_efficient=True):
```

gets past the "no available kernel" error.
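For anyone hitting this, here is a minimal, self-contained sketch of what the two context managers do. The tensor shapes are illustrative placeholders, not Allegro's actual attention shapes:

```python
import torch
from torch.nn.functional import scaled_dot_product_attention

# Illustrative (batch, heads, seq, head_dim) shapes only;
# Allegro's real attention tensors are much larger.
q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.bfloat16)
v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.bfloat16)

# Original code: forces the FlashAttention backend, which raises
# "No available kernel" when no compatible Flash build exists for
# the installed torch / GPU / dtype combination.
# from torch.nn.attention import sdpa_kernel, SDPBackend
# with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
#     out = scaled_dot_product_attention(q, k, v)

# Workaround: disable Flash and let torch fall back to the math or
# memory-efficient backends. (torch.backends.cuda.sdp_kernel is the
# older spelling, deprecated in recent torch in favour of
# torch.nn.attention.sdpa_kernel, but it still works.)
with torch.backends.cuda.sdp_kernel(
    enable_flash=False, enable_math=True, enable_mem_efficient=True
):
    out = scaled_dot_product_attention(q, k, v)
print(out.shape)
```

The fallback backends are numerically fine but slower than FlashAttention, which is consistent with the long render times reported below.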
But generation then shows an estimated 2 hours 40 minutes to finish on a 4090. In the end it took over 3 hours to render the default-settings 5-second clip.
Changing line 13 in `single_inference.py` from

```python
dtype=torch.bfloat16
```

to

```python
dtype=torch.float16
```

(as also shown in https://github.com/rhymes-ai/Allegro/issues/17) raises the estimate to 19 hours(!!), so do not try that change.
Are there any other possible ways we can get this down to a reasonable time on a 24GB consumer GPU?
> `with torch.backends.cuda.sdp_kernel(enable_flash=False, enable_math=True, enable_mem_efficient=True):`

This helped, but it still takes about 4 hours to finish on a 3090 :) with `--enable_cpu_offload`.
Adding the `--enable_cpu_offload` argument to `single_inference.py` gets the estimated time down to 1 hour 40 minutes on a 24GB 4090.
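For anyone reproducing this, a hedged sketch of the full invocation: the prompt and model paths are placeholders, and the flag names besides `--enable_cpu_offload` follow the repo's README example, so double-check them against your checkout:

```bash
python single_inference.py \
  --user_prompt "A seaside harbor with bright sunlight and sparkling seawater." \
  --save_path ./output_videos/test_video.mp4 \
  --vae /path/to/vae \
  --dit /path/to/transformer \
  --text_encoder /path/to/text_encoder \
  --tokenizer /path/to/tokenizer \
  --guidance_scale 7.5 \
  --num_sampling_steps 100 \
  --seed 42 \
  --enable_cpu_offload
```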
@SoftologyPro That seems to make sense. I tested on an H100 with enable-cpu-offload: a single 100-step video takes 1h10min. That's why I wrote that the inference time will increase significantly. Btw, do you have more than one 4090? I'm going to release the multi-card inference code. Context parallel seems to help a lot with 4090s.
No, I only have a single 4090. This interest came from a request to support Allegro in Visions of Chaos. But if it takes 2 hours on the best consumer GPU, it is too slow for local Windows use. If some speed breakthrough is made, I will be happy to include it.
@SoftologyPro Currently I have no idea. One option is distillation to reduce the number of inference steps, e.g. from 100 steps down to 4, but that harms quality severely.
Trying to get this working under Windows.
I clone the repository, create a new venv, and try to install requirements.txt. xformers fails with
If I install torch first, before the requirements, it still fails. So I remove xformers from the requirements and let the rest of them finish. Once they are done, I install xformers and torch using...
Then when I run single_inference I get
What version of xformers and torch do I need to get this to work under Windows?
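For reference, a hedged install sketch matching the combination mentioned at the top of this thread (CUDA 12.4 torch wheels plus xformers 0.0.28.post1); whether these exact wheels resolve cleanly on Windows is not confirmed here:

```bash
# torch built against CUDA 12.4, from the official PyTorch wheel index.
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
# xformers 0.0.28.post1 is the version reported at the top of the thread;
# it pins a specific torch 2.5.x release, so install order matters
# (this pairing is an assumption worth verifying on Windows).
pip install xformers==0.0.28.post1
```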