Tensor size mismatch in CogVideoX transformer forward pass

Hi, thanks for your great work! I encountered an error when running the CogVideoX model on a single A800. The error occurs in the transformer's forward pass, when the positional embeddings are added to the hidden states: the two tensors differ in size at dimension 1 (53474 vs. 17776). This suggests a problem with either the model implementation or the configuration passed to it.

Here is the output log:

srun --account=test --exclusive=user -p A800 -N 1 --time 20:00 --job-name=xdit --ntasks-per-node=1 --gres=gpu:1 --export=ALL bash ./examples/infer.sh
srun: job 3975 queued and waiting for resources
srun: job 3975 has been allocated resources
++ awk '{print $2}'
++ grep BatchHost
++ tr = ' '
++ scontrol show jobid=3975
export MASTER_ADDR=g41
MASTER_ADDR=g41
++ expr 0 / 1
export NODE_RANK=0
NODE_RANK=0
export CUDA_DEVICE_MAX_CONNECTIONS=1
CUDA_DEVICE_MAX_CONNECTIONS=1
export CUDA_LAUNCH_BLOCKING=1
CUDA_LAUNCH_BLOCKING=1
export RANK=0
RANK=0
export LOCAL_RANK=0
LOCAL_RANK=0
exec python -W ignore ./examples/cogvideox_example.py --model /home/test/test01/cyy/Data/models--THUDM--CogVideoX-2b/snapshots/ad5ce8664edfdc95cdb9773dd4f80048b25f69ac --ulysses_degree 1 --num_inference_steps 1 --warmup_steps 0 --prompt 'sunset over the sea.'
WARNING 09-10 12:13:01 [args.py:250] Distributed environment is not initialized. Initializing...
DEBUG 09-10 12:13:01 [parallel_state.py:180] world_size=-1 rank=-1 local_rank=-1 distributed_init_method=env:// backend=nccl
===========0 - Parallel Group Initalized!===========
INFO 09-10 12:13:01 [config.py:93] Ring degree not set, using default value 1
INFO 09-10 12:13:01 [config.py:137] Pipeline patch number not set, using default value 1
Loading checkpoint shards: 100%|██████████| 2/2 [00:08<00:00, 4.15s/it]it/s]
Loading pipeline components...: 100%|██████████| 5/5 [00:12<00:00, 2.45s/it]
WARNING 09-10 12:13:13 [runtime_state.py:63] Model parallel is not initialized, initializing...
INFO 09-10 12:13:13 [base_pipeline.py:236] Transformer backbone found, but model parallelism is not enabled, use naive model
INFO 09-10 12:13:13 [base_pipeline.py:286] Scheduler found, paralleling scheduler...
0%| | 0/1 [00:00<?, ?it/s]
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/test/test01/cyy/xDiT/./examples/cogvideox_example.py", line 63, in <module>
[rank0]:   File "/home/test/test01/cyy/xDiT/./examples/cogvideox_example.py", line 38, in main
[rank0]:     output = pipe(
[rank0]:   File "/home/test/test01/anaconda3/envs/xdit/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/home/test/test01/cyy/xDiT/xfuser/model_executor/pipelines/base_pipeline.py", line 131, in data_parallel_fn
[rank0]:     return func(self, *args, **kwargs)
[rank0]:   File "/home/test/test01/cyy/xDiT/xfuser/model_executor/pipelines/base_pipeline.py", line 145, in check_naive_forward_fn
[rank0]:     return self.module(*args, **kwargs)
[rank0]:   File "/home/test/test01/anaconda3/envs/xdit/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/home/test/test01/anaconda3/envs/xdit/lib/python3.10/site-packages/diffusers/pipelines/cogvideo/pipeline_cogvideox.py", line 687, in __call__
[rank0]:     noise_pred = self.transformer(
[rank0]:   File "/home/test/test01/anaconda3/envs/xdit/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/test/test01/anaconda3/envs/xdit/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/test/test01/anaconda3/envs/xdit/lib/python3.10/site-packages/diffusers/models/transformers/cogvideox_transformer_3d.py", line 432, in forward
[rank0]:     hidden_states = hidden_states + pos_embeds
[rank0]: RuntimeError: The size of tensor a (53474) must match the size of tensor b (17776) at non-singleton dimension 1
srun: error: g41: task 0: Exited with exit code 1
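
The two sizes in the RuntimeError can be decomposed with a bit of arithmetic, which may help narrow down the cause. The sketch below is only a sanity check, not the library's code: `token_count` is a hypothetical helper, and its defaults (VAE spatial factor 8, temporal factor 4, patch size 2, 226 text tokens, 49 frames) are my assumptions about the CogVideoX-2b configuration. Under those assumptions, 17776 matches a 480x720 video, while 53474 matches 4096 patches per latent frame, for example a 1024x1024 input. Other resolutions with the same patch count would fit equally well, so the 1024x1024 reading is a guess.

```python
def token_count(num_frames, height, width, *,
                vae_spatial=8, vae_temporal=4, patch=2, text_len=226):
    """Sequence length seen by the transformer: text tokens + video patches.

    All default values are assumptions about the CogVideoX-2b config,
    not values read from the checkpoint.
    """
    latent_frames = (num_frames - 1) // vae_temporal + 1
    patches_h = height // vae_spatial // patch
    patches_w = width // vae_spatial // patch
    return text_len + latent_frames * patches_h * patches_w


# Size of the positional embedding (tensor b): 226 + 13 * 30 * 45
expected = token_count(49, 480, 720)

# Size of the hidden states (tensor a), assuming a 1024x1024 input: 226 + 13 * 64 * 64
observed = token_count(49, 1024, 1024)

print(expected, observed)
```

If this reading is right, the positional embedding was built for 480x720 while the latents were generated at a larger default resolution, so passing a height and width that match the model's training resolution would be the first thing to try.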

Additional notes:

I did not modify any settings or configurations from the default.
This error occurs consistently when trying to run the model.
Versions: Python 3.10.14, torch 2.4.0+cu121, diffusers 0.30.2, xfuser 0.3.1

Thanks for your help!
