Error of Dimension for Matrix Multiplication while Text Conditioning

Hello authors, thanks a lot for this excellent work!

I ran into a problem when running the Text-to-LiDAR task with the command you provided:

CUDA_VISIBLE_DEVICES=0 python scripts/text2lidar.py -r models/lidm/kitti/cam2lidar/model.ckpt -d kitti -p "an empty road with no object"

I also changed the checkpoint path from cam2lidar to text2lidar, so that the text2lidar configuration file is used, and tried again:

CUDA_VISIBLE_DEVICES=2 python scripts/text2lidar.py -r models/lidm/kitti/text2lidar/model.ckpt -d kitti -p "an empty road with no object"

Both runs fail with the same matrix dimension error, shown below. I spent the whole day trying to fix it but could not, since I have little experience with this code. Could you check the code and help find the cause?

Thank you.

/home/pjs/.conda/envs/lidar_diffusion/lib/python3.10/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
Logdir is models/lidm/kitti/text2lidar
{'model': {'base_learning_rate': 2e-06, 'target': 'lidm.models.diffusion.ddpm.LatentDiffusion', 'params': {'linear_start': 0.0015, 'linear_end': 0.0195, 'num_timesteps_cond': 1, 'log_every_t': 100, 'timesteps': 1000, 'image_size': [16, 128], 'channels': 8, 'monitor': 'val/loss_simple_ema', 'first_stage_key': 'image', 'cond_stage_key': 'camera', 'conditioning_key': 'crossattn', 'cond_stage_trainable': True, 'verbose': False, 'unet_config': {'target': 'lidm.modules.diffusion.openaimodel.UNetModel', 'params': {'image_size': [16, 128], 'in_channels': 8, 'out_channels': 8, 'model_channels': 256, 'attention_resolutions': [4, 2, 1], 'num_res_blocks': 2, 'channel_mult': [1, 2, 4], 'num_head_channels': 32, 'use_spatial_transformer': True, 'context_dim': 512, 'lib_name': 'lidm'}}, 'first_stage_config': {'target': 'lidm.models.autoencoder.VQModelInterface', 'params': {'embed_dim': 8, 'n_embed': 16384, 'lib_name': 'lidm', 'use_mask': False, 'ckpt_path': 'models/first_stage_models/kitti/f_c2_p4_wo_ls/model.ckpt', 'ddconfig': {'double_z': False, 'z_channels': 8, 'in_channels': 1, 'out_ch': 1, 'ch': 64, 'ch_mult': [1, 2, 2, 4], 'strides': [[1, 2], [2, 2], [2, 2]], 'num_res_blocks': 2, 'attn_levels': [], 'dropout': 0.0}, 'lossconfig': {'target': 'torch.nn.Identity'}}}, 'cond_stage_config': {'target': 'lidm.modules.encoders.modules.FrozenClipMultiImageEmbedder', 'params': {'model': 'ViT-L/14', 'split_per_view': 4, 'key': 'camera', 'out_dim': 512}}}}, 'data': {'target': 'main.DataModuleFromConfig', 'params': {'batch_size': 8, 'num_workers': 8, 'wrap': True, 'dataset': {'size': [64, 1024], 'fov': [3, -25], 'depth_range': [1.0, 56.0], 'depth_scale': 56, 'log_scale': False, 'x_range': [-50.0, 50.0], 'y_range': [-50.0, 50.0], 'z_range': [-3.0, 1.0], 'resolution': 1, 'num_channels': 1, 'num_cats': 10, 'num_views': 1, 'num_sem_cats': 19, 'filtered_map_cats': []}, 'aug': {'flip': False, 'rotate': False, 'keypoint_drop': False, 'keypoint_drop_range': None, 'randaug': False, 'camera_drop': 0.5}, 'train': {'target': 'lidm.data.kitti.KITTI360Train', 'params': {'condition_key': 'camera', 'split_per_view': 4}}, 'validation': {'target': 'lidm.data.kitti.KITTI360Validation', 'params': {'condition_key': 'camera', 'split_per_view': 4}}}}, 'lightning': {'callbacks': {'image_logger': {'target': 'main.ImageLogger', 'params': {'batch_frequency': 5000, 'max_images': 8, 'increase_log_steps': False}}}, 'trainer': {'benchmark': True}}}
Loading model from models/lidm/kitti/text2lidar/model.ckpt
DiffusionWrapper has 395.00 M params.
Restored from models/first_stage_models/kitti/f_c2_p4_wo_ls/model.ckpt with 0 missing and 26 unexpected keys
global step: 22770
===========================================================================
logging to:
models/lidm/kitti/text2lidar/samples/00022770/an_empty_road_with_no_object
===========================================================================
{'resume': 'models/lidm/kitti/text2lidar/model.ckpt', 'prompt': 'an empty road with no object', 'n_samples': 50, 'eta': 1.0, 'vanilla': False, 'logdir': 'none', 'custom_steps': 50, 'batch_size': 10, 'num_views': 4, 'apply_all': False, 'seed': 1000, 'dataset': 'kitti', 'verbose': False, 'base': ['models/lidm/kitti/text2lidar/config.yaml']}
Running conditional sampling
Sampling Batches (unconditional):   0%|          | 0/5 [00:00<?, ?it/s]

Traceback (most recent call last):
  File "/raid/workspace/cvml_user/pjs/gen2024_final/LiDAR-Diffusion/scripts/text2lidar.py", line 385, in <module>
    run(model, text_encoder, opt.prompt, imglogdir, pcdlogdir, custom_steps=opt.custom_steps, config=config, verbose=opt.verbose)
  File "/raid/workspace/cvml_user/pjs/gen2024_final/LiDAR-Diffusion/scripts/text2lidar.py", line 130, in run
    cond = model.cond_stage_model(cond)
  File "/home/pjs/.conda/envs/lidar_diffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/pjs/.conda/envs/lidar_diffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/raid/workspace/cvml_user/pjs/gen2024_final/LiDAR-Diffusion/./lidm/modules/encoders/modules.py", line 250, in forward
    x = self.linear(x)
  File "/home/pjs/.conda/envs/lidar_diffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/pjs/.conda/envs/lidar_diffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/pjs/.conda/envs/lidar_diffusion/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (10x3072 and 768x512)
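
To make the mismatch concrete, here is a small sketch in plain Python that reproduces the shape check behind the error above. It is not code from the repository: `matmul_shape` is a hypothetical helper, and the only inputs are the two shapes printed in the RuntimeError. One thing it shows is that 3072 = 4 x 768, so each of the 10 samples looks like four 768-wide embeddings concatenated together. That matches 'split_per_view': 4 in the printed config, but I have not confirmed that this is the actual cause.

```python
def matmul_shape(a_shape, b_shape):
    """Return the result shape of a 2-D matrix product, or raise ValueError."""
    (m, k), (k2, n) = a_shape, b_shape
    if k != k2:
        raise ValueError(
            f"mat1 and mat2 shapes cannot be multiplied ({m}x{k} and {k2}x{n})"
        )
    return (m, n)


# Shapes copied from the RuntimeError in the traceback.
cond_shape = (10, 3072)    # input reaching self.linear (batch_size is 10)
weight_shape = (768, 512)  # mat2 as reported: 768 inputs, 512 outputs

# The inner dimensions differ (3072 vs 768), so the product is rejected.
try:
    matmul_shape(cond_shape, weight_shape)
except ValueError as err:
    print(err)

# How many 768-wide chunks fit into one 3072-wide row, and what is left over.
chunks, remainder = divmod(cond_shape[1], weight_shape[0])
print(chunks, remainder)

# Assumption for illustration only: if each row were viewed as `chunks`
# separate 768-wide rows, the same layer would accept the input.
reshaped = (cond_shape[0] * chunks, weight_shape[0])
print(matmul_shape(reshaped, weight_shape))
```

Whether the right fix is on the input side (a 768-wide feature per sample) or on the layer side (a 3072-wide input) depends on how the conditioning encoder is meant to be used with text, which I cannot tell from the code.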
