Closed henbucuoshanghai closed 1 month ago
T, H, W = self.get_dynamic_size(x): even within one batch, do the input video embeddings have different T, H, W from each other? Does every video have its own T, H, W? So do the input dimensions in stdit3.py change depending on each video?
After the videos are compressed by the VAE, are their dimensions the same?
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 7 days since being marked as stale.
https://github.com/hpcaitech/Open-Sora/blob/476b6dc79720e5d9ddfb3cd589680b2308871926/opensora/models/stdit/stdit3.py#L364C8-L364C43 T, H, W = self.get_dynamic_size(x)