microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

Textdiffuser #1415

Open fff518 opened 9 months ago

fff518 commented 9 months ago

```
    noisy_residual = unet(input, t, encoder_hidden_states[:args.vis_num], masked_feature=masked_features[:16], feature_mask=feature_masks[:16], segmentation_mask=segmentation_masks[:16]).sample
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/utils/operations.py", line 659, in forward
    return model_forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/utils/operations.py", line 647, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/opt/conda/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/data/pylib/diffusers/models/unet_2d_condition.py", line 595, in forward
    sample = torch.cat([sample, feature_mask, masked_feature], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 8 but got size 1 for tensor number 1 in the list.
```
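The error is raised by `torch.cat([sample, feature_mask, masked_feature], dim=1)`: concatenating along dimension 1 requires every other dimension (batch, height, width) to match the first tensor. A minimal pure-Python sketch of that shape rule follows. It is an illustrative restatement, not PyTorch's implementation; `cat_shape_error` is a hypothetical helper, and the shapes in the demo are made up because the real ones are not shown in the thread.

```python
from typing import Optional, Sequence, Tuple

Shape = Tuple[int, ...]


def cat_shape_error(shapes: Sequence[Shape], dim: int = 1) -> Optional[str]:
    """Return the error torch.cat would raise for these shapes, or None if they fit.

    Illustrative restatement of the rule, not PyTorch's code: every tensor must
    have the same number of dimensions, and all sizes except those on `dim`
    must equal the first tensor's sizes. Messages paraphrase PyTorch's wording.
    """
    reference = shapes[0]
    for index, shape in enumerate(shapes[1:], start=1):
        if len(shape) != len(reference):
            return (
                "Tensors must have same number of dimensions: "
                f"got {len(reference)} and {len(shape)}"
            )
        for axis, (expected, got) in enumerate(zip(reference, shape)):
            if axis == dim:
                continue  # sizes along the concatenation axis may differ
            if expected != got:
                return (
                    f"Sizes of tensors must match except in dimension {dim}. "
                    f"Expected size {expected} but got size {got} "
                    f"for tensor number {index} in the list."
                )
    return None


if __name__ == "__main__":
    # Hypothetical (batch, channels, height, width) shapes for
    # [sample, feature_mask, masked_feature]; the real ones are unknown.
    print(cat_shape_error([(8, 4, 64, 64), (1, 1, 64, 64), (8, 4, 64, 64)]))
```

Tensor numbers count from 0, so "tensor number 1" refers to `feature_mask`, the second entry in the concatenated list. The demo shapes reproduce the reported wording with a batch mismatch, but a height or width mismatch would produce the same kind of message, so only the actual shapes can tell which axis disagrees.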

In your train.py, the line `noisy_residual = unet(input, t, encoder_hidden_states[:args.vis_num], masked_feature=masked_features[:16], feature_mask=feature_masks[:16], segmentation_mask=segmentation_masks[:16]).sample` raises a tensor-size mismatch error, and I do not understand why this is related to `args.vis_num`. All my training parameters are the same as in the example you gave. Could you explain this to me? Thank you very much!
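In the call above, `input` is passed unsliced, `encoder_hidden_states` is cut to `args.vis_num` rows, and the three mask/feature tensors are cut to 16 rows. The following sketch shows what those slices do to the leading dimension, assuming (hypothetically) that every tensor enters the call with the same batch size and that slicing only touches dimension 0. `leading_sizes` is an illustrative helper, not part of TextDiffuser, and the `vis_num` value in the demo is made up.

```python
from typing import Dict


def leading_sizes(batch_size: int, vis_num: int, cap: int = 16) -> Dict[str, int]:
    """Leading (batch) size of each UNet argument after the slices in train.py.

    Hypothetical model: every tensor starts with `batch_size` rows, and
    `x[:n]` keeps min(n, batch_size) rows, as Python list slicing does.
    """
    rows = list(range(batch_size))  # stand-in for one batch of samples
    return {
        "sample": len(rows),  # `input` is passed to the UNet unsliced
        "encoder_hidden_states": len(rows[:vis_num]),
        "masked_feature": len(rows[:cap]),
        "feature_mask": len(rows[:cap]),
        "segmentation_mask": len(rows[:cap]),
    }


if __name__ == "__main__":
    for batch in (8, 32):
        print(batch, leading_sizes(batch, vis_num=4))
```

Under these assumptions, `vis_num` only changes `encoder_hidden_states`, which is not among the tensors concatenated at line 595 of `unet_2d_condition.py`. The `[:16]` slices, by contrast, leave `sample` larger than `feature_mask` and `masked_feature` whenever the batch exceeds 16: `leading_sizes(32, 4)` gives 32 for `sample` but 16 for the mask/feature tensors.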

JingyeChen commented 9 months ago

Thanks for your interest in TextDiffuser. Could you print the sizes of `sample`, `feature_mask`, and `masked_feature`?
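A small helper can print those shapes and point at the axis that disagrees. This is a hypothetical sketch (`shape_report` is not part of TextDiffuser or diffusers). It accepts any object with a `.shape` attribute, such as a `torch.Tensor`, or a plain shape tuple, and treats `dim` as the concatenation axis.

```python
from typing import Any, Dict, List


def shape_report(tensors: Dict[str, Any], dim: int = 1) -> List[str]:
    """Describe each tensor's shape and flag axes (other than `dim`) that disagree.

    Hypothetical debugging helper: each value may be a torch.Tensor (its
    `.shape` is a tuple-like torch.Size) or a plain shape tuple.
    """
    shapes = {name: tuple(getattr(t, "shape", t)) for name, t in tensors.items()}
    lines = [f"{name}: {shape}" for name, shape in shapes.items()]
    if not shapes:
        return lines
    ndims = {len(shape) for shape in shapes.values()}
    if len(ndims) > 1:
        lines.append("mismatch: tensors have different numbers of dimensions")
        return lines
    for axis in range(ndims.pop()):
        if axis == dim:
            continue  # torch.cat allows the concatenation axis to differ
        sizes = {name: shape[axis] for name, shape in shapes.items()}
        if len(set(sizes.values())) > 1:
            lines.append(f"mismatch on axis {axis}: {sizes}")
    return lines


if __name__ == "__main__":
    # Hypothetical shapes, chosen only to illustrate the output format.
    report = shape_report({
        "sample": (8, 4, 64, 64),
        "feature_mask": (1, 1, 64, 64),
        "masked_feature": (8, 4, 64, 64),
    })
    print("\n".join(report))
```

For instance, placing `print("\n".join(shape_report({"sample": sample, "feature_mask": feature_mask, "masked_feature": masked_feature})))` just before the `torch.cat` on line 595 of `unet_2d_condition.py` would list the three shapes and any axis other than 1 on which they differ.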