Open caojiachen1 opened 1 week ago
I actually tested it today. The bigger issue is that the latest torchao versions have to be compiled to install on Windows, which isn't all that simple. Similarly, torch.compile requires Triton (Linux only), and on Linux I find onediff to be much faster anyway.
I did get int4 torchao running on Windows after some trouble; it used around 12 GB of VRAM and was very slow at 9 s/it.
The CogVideoX model uses torchao as its official quantization method, which achieves a good balance between inference speed and output video quality. The current fp8 quantization seems to incur significant quality loss, especially when contrasted with the torchao method. I tried to modify the custom node to support torchao, but the program always runs into an exception, which is really frustrating.