jy0205 / Pyramid-Flow

Code of Pyramidal Flow Matching for Efficient Video Generative Modeling
https://pyramid-flow.github.io/
MIT License

RuntimeError with the new flux models #158

Closed: VigneshSrinivasan10 closed this issue 2 weeks ago.

VigneshSrinivasan10 commented 2 weeks ago

Congratulations on this amazing work, and thank you for it.

Unfortunately, when I try to use the FLUX models, I get the following error. It occurs with both the t2v and i2v tasks, each with a different tensor size.

  File "/mnt/efs/users/vsrinivasan/Projects/Pyramid-Flow/pyramid_dit/flux_modules/modeling_flux_block.py", line 717, in forward
    return self.processor(
  File "/mnt/efs/users/vsrinivasan/Projects/Pyramid-Flow/pyramid_dit/flux_modules/modeling_flux_block.py", line 861, in __call__
    hidden_states, encoder_hidden_states = self.varlen_attn(
  File "/mnt/efs/users/vsrinivasan/Projects/Pyramid-Flow/pyramid_dit/flux_modules/modeling_flux_block.py", line 300, in __call__
    concat_qkv_tokens[:,:,0], concat_qkv_tokens[:,:,1] = apply_rope(concat_qkv_tokens[:,:,0], concat_qkv_tokens[:,:,1], image_rotary_emb[i_p])
  File "/mnt/efs/users/vsrinivasan/Projects/Pyramid-Flow/pyramid_dit/flux_modules/modeling_flux_block.py", line 37, in apply_rope
    xq_out = freqs_cis[..., 0] * xq_[..., 0] + freqs_cis[..., 1] * xq_[..., 1]
RuntimeError: The size of tensor a (248) must match the size of tensor b (249) at non-singleton dimension 1

Any ideas on how I can fix this would be much appreciated.

Side note: I also get this warning even though all the models were downloaded from the Hugging Face Hub.

An error occurred while trying to fetch ./models_flux/causal_video_vae: Error no file named diffusion_pytorch_model.safetensors found in directory ./models_flux/causal_video_vae.
Defaulting to unsafe serialization. Pass `allow_pickle=False` to raise an error instead.

I hope these two are not related.
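To confirm which weight format is actually on disk for the VAE named in the warning, a small check like the one below may help. It is a hypothetical diagnostic, not part of Pyramid-Flow; it assumes only the directory from the warning and the conventional diffusers filenames diffusion_pytorch_model.safetensors and diffusion_pytorch_model.bin.

```python
from pathlib import Path

# Conventional diffusers weight filenames (assumption: this VAE loader follows them).
SAFETENSORS_NAME = "diffusion_pytorch_model.safetensors"
PICKLE_NAME = "diffusion_pytorch_model.bin"


def describe_weights(model_dir: str) -> str:
    """Report which weight format, if any, is present in model_dir."""
    path = Path(model_dir)
    if (path / SAFETENSORS_NAME).is_file():
        return "safetensors"
    if (path / PICKLE_NAME).is_file():
        # Only a pickle checkpoint exists; a diffusers-style loader would fall back
        # to it and print the "Defaulting to unsafe serialization" warning.
        return "pickle"
    return "missing"


if __name__ == "__main__":
    print(describe_weights("./models_flux/causal_video_vae"))
```

If it reports "pickle", the warning most likely just reflects the loader falling back to the .bin checkpoint rather than a missing model.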

Thank you very much in advance.

jy0205 commented 2 weeks ago

Hi, this warning does not affect normal operation. I suspect this bug is caused by an incorrect width or height setting. Can you share your settings?

VigneshSrinivasan10 commented 2 weeks ago

Thank you for the prompt response.

I am using the default settings for the i2v (and t2v) task, running on 3 GPUs in a single node. I have made no other changes.

jy0205 commented 2 weeks ago

What did you set the width and height to? And are you using multi-GPU inference?

VigneshSrinivasan10 commented 2 weeks ago
width = 640
height = 384

The input image is also resized to these dimensions:

from PIL import Image

image_path = 'assets/the_great_wall.jpg'
image = Image.open(image_path).convert("RGB")
image = image.resize((width, height))
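As a quick sanity check on the maintainer's suspicion about width and height, both dimensions should divide evenly through the model's spatial downsampling. The sketch below is hypothetical: the VAE factor of 8 and patch size of 2 used in the example are illustrative assumptions, not values confirmed from the Pyramid-Flow code.

```python
def check_resolution(width: int, height: int, multiple: int) -> None:
    """Raise ValueError if width or height is not a multiple of `multiple`.

    `multiple` is a hypothetical stand-in for the total spatial downsampling
    (e.g. VAE factor times patch size); it is not taken from Pyramid-Flow.
    """
    for name, value in (("width", width), ("height", height)):
        if value % multiple != 0:
            raise ValueError(f"{name}={value} is not a multiple of {multiple}")


def tokens_per_frame(width: int, height: int, vae_factor: int, patch: int) -> int:
    """Image tokens per latent frame, assuming VAE downsampling followed by patchification."""
    return (width // vae_factor // patch) * (height // vae_factor // patch)


if __name__ == "__main__":
    # Assumed factors: VAE downsampling 8, patch size 2, so dimensions must divide by 16.
    check_resolution(640, 384, 8 * 2)
    print(tokens_per_frame(640, 384, vae_factor=8, patch=2))
```

Under those assumed factors, 640 × 384 passes the check and maps to 40 × 24 = 960 tokens per frame, so the resolution alone would not obviously explain the error.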

I am also keeping the variant set to diffusion_transformer_384p.

Yes, I use multi-GPU inference, launching it like this:
CUDA_VISIBLE_DEVICES=0,1,2 sh scripts/inference_multigpu.sh

VigneshSrinivasan10 commented 2 weeks ago

The issue goes away if I set GPUS=2.

It can be reproduced by setting GPUS=3.
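One plausible reading of the 248 vs 249 mismatch, offered as an assumption rather than a confirmed diagnosis, is that the token sequence is split into contiguous shards across GPUs: a length that divides evenly by 2 may not divide evenly by 3, leaving shards that differ by one token. The sketch below computes shard lengths using the same sizing rule as torch.tensor_split; the sequence length 746 is arbitrary and hypothetical.

```python
def shard_lengths(seq_len: int, world_size: int) -> list[int]:
    """Contiguous shard lengths when seq_len tokens are split over world_size ranks.

    Follows the torch.tensor_split sizing rule: the first (seq_len % world_size)
    shards get one extra token, the rest get seq_len // world_size.
    """
    base, extra = divmod(seq_len, world_size)
    return [base + 1 if rank < extra else base for rank in range(world_size)]


def splits_evenly(seq_len: int, world_size: int) -> bool:
    """True when every rank receives the same number of tokens."""
    return seq_len % world_size == 0


if __name__ == "__main__":
    seq_len = 746  # hypothetical sequence length, chosen only for illustration
    for gpus in (2, 3):
        print(gpus, shard_lengths(seq_len, gpus), splits_evenly(seq_len, gpus))
```

Here 2 GPUs give [373, 373] while 3 GPUs give [249, 249, 248]. If a per-rank table such as the rotary embedding were sized for the longer shard, the shorter rank would see exactly this kind of off-by-one mismatch; whether Pyramid-Flow does this is not established in the thread.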

Setting GPUS=4 produces a different error:

  File "/mnt/efs/users/vsrinivasan/Projects/Pyramid-Flow/pyramid_dit/flux_modules/modeling_flux_block.py", line 716, in forward
    return self.processor(
  File "/mnt/efs/users/vsrinivasan/Projects/Pyramid-Flow/pyramid_dit/flux_modules/modeling_flux_block.py", line 868, in __call__
    hidden_states = attn.to_out[0](hidden_states)
  File "/mnt/efs/users/vsrinivasan/Projects/Pyramid-Flow/pyramid-flow/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/efs/users/vsrinivasan/Projects/Pyramid-Flow/pyramid-flow/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/efs/users/vsrinivasan/Projects/Pyramid-Flow/pyramid-flow/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (60x2048 and 1920x1920)
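The numbers in this second error are consistent with attention heads being divided across GPUs, if one assumes (this is not confirmed from the source) 30 heads of dimension 64, which gives the 1920-wide to_out projection. Thirty heads do not split evenly over 4 ranks; rounding up to 8 heads per rank yields 32 × 64 = 2048 columns. The hypothetical helpers below only reproduce that arithmetic and do not describe Pyramid-Flow's actual sharding code.

```python
def heads_per_rank(num_heads: int, world_size: int) -> int:
    """Heads assigned to each rank, rounding up when num_heads is not divisible."""
    return (num_heads + world_size - 1) // world_size


def gathered_width(num_heads: int, head_dim: int, world_size: int) -> int:
    """Hidden width after gathering every rank's heads (hypothetical round-up scheme)."""
    return heads_per_rank(num_heads, world_size) * world_size * head_dim


def compatible_world_sizes(num_heads: int, max_gpus: int) -> list[int]:
    """GPU counts that divide num_heads evenly, so no padding would be needed."""
    return [n for n in range(1, max_gpus + 1) if num_heads % n == 0]


if __name__ == "__main__":
    NUM_HEADS, HEAD_DIM = 30, 64  # assumed: 30 * 64 = 1920, the to_out width in the error
    for gpus in (2, 3, 4):
        print(gpus, gathered_width(NUM_HEADS, HEAD_DIM, gpus))
    print(compatible_world_sizes(NUM_HEADS, 8))
```

Under these assumptions, 2 and 3 GPUs keep the 1920 width while 4 yields 2048, matching the shapes in the traceback, and compatible_world_sizes(30, 8) returns [1, 2, 3, 5, 6]. Passing this head check says nothing about how evenly the token sequence splits, which is a separate constraint.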

For now, the code runs for me. I don't need more than 2 GPUs (though it would be nice :)). Thank you very much.