zjq-Jackson opened this issue 2 weeks ago (status: Open)
Sorry, you cannot do this with our current implementation. Even if it were possible, it would require a lot of code changes, e.g. accumulating the gradient for all sequence lengths in a video clip.
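For readers unfamiliar with the idea, gradient accumulation can be illustrated with a toy sketch. This is not the project's code and does not show what the required changes would look like. It assumes a one-parameter model `y_hat = w * x` with a mean-squared-error loss, and the names `grad_full`, `grad_accumulated`, and `chunk_size` are hypothetical. It only shows that summing per-chunk gradients, each scaled by the full sequence length, gives the same gradient as processing the whole sequence at once, while holding only one chunk in memory at a time.

```python
# Toy illustration (hypothetical, not the repository's implementation):
# gradient accumulation over chunks of one long sequence.
# Model: y_hat = w * x, loss = mean squared error over the whole sequence.

def grad_full(w, xs, ys):
    """Gradient of the MSE loss w.r.t. w, computed over the whole sequence."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def grad_accumulated(w, xs, ys, chunk_size):
    """Same gradient, accumulated chunk by chunk."""
    n = len(xs)
    total = 0.0
    for start in range(0, n, chunk_size):
        chunk_x = xs[start:start + chunk_size]
        chunk_y = ys[start:start + chunk_size]
        # Scale by the full length n (not the chunk length) so that the
        # accumulated sum matches the gradient of the full-sequence mean.
        total += sum(2.0 * (w * x - y) * x
                     for x, y in zip(chunk_x, chunk_y)) / n
    return total


if __name__ == "__main__":
    xs = [0.5 * i for i in range(10)]
    ys = [3.0 * x + 1.0 for x in xs]
    print(grad_full(0.7, xs, ys))
    print(grad_accumulated(0.7, xs, ys, chunk_size=3))
```

In a real training loop, the same pattern would mean running a backward pass per chunk and stepping the optimizer only once after all chunks have contributed.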
Alright, thank you very much!
"I only have one A800 80G; can I fine-tune the DiT model with it?"