(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis
When I was training with cls_on_clean_image, I got an error: Given groups=1, weight of size [320, 4, 3, 3], expected input[1, 3, 512, 512] to have 4 channels, but got 3 channels instead #38
When I was training with cls_on_clean_image, I encountered the following error: Given groups=1, weight of size [320, 4, 3, 3], expected input[1, 3, 512, 512] to have 4 channels, but got 3 channels instead.
I encountered an error in the second stage of training:
Traceback (most recent call last):
File "/root/workspace/baidu/personal-code/DMD2/main/train_sd.py", line 753, in <module>
trainer.train()
File "/root/workspace/baidu/personal-code/DMD2/main/train_sd.py", line 645, in train
self.train_one_step()
File "/root/workspace/baidu/personal-code/DMD2/main/train_sd.py", line 425, in train_one_step
self.accelerator.backward(guidance_loss)
File "/usr/local/python3.9/lib/python3.9/site-packages/accelerate/accelerator.py", line 1987, in backward
self.scaler.scale(loss).backward(**kwargs)
File "/usr/local/python3.9/lib/python3.9/site-packages/torch/_tensor.py", line 492, in backward
torch.autograd.backward(
File "/usr/local/python3.9/lib/python3.9/site-packages/torch/autograd/__init__.py", line 251, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: Given groups=1, weight of size [320, 4, 3, 3], expected input[1, 3, 512, 512] to have 4 channels, but got 3 channels instead
In the shape [1, 3, 512, 512], 1 is the batch size, 3 is the number of image channels, and 512×512 is the image resolution. So the error means that a convolution with weight shape [320, 4, 3, 3] expects its input to have 4 channels, but the actual input has only 3.
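The mismatch can be reproduced in isolation. This is an illustrative sketch, not DMD2 code: a weight of [320, 4, 3, 3] is a Conv2d with 4 input channels (the shape of the SD UNet's first convolution, which operates on 4-channel VAE latents), so feeding it a 3-channel RGB image triggers exactly this error:

```python
import torch
import torch.nn as nn

# A conv whose weight has shape [320, 4, 3, 3]: 320 output channels,
# 4 input channels, 3x3 kernel.
conv_in = nn.Conv2d(in_channels=4, out_channels=320, kernel_size=3, padding=1)

rgb_image = torch.randn(1, 3, 512, 512)   # pixel-space image: 3 channels
try:
    conv_in(rgb_image)
except RuntimeError as e:
    # "Given groups=1, weight of size [320, 4, 3, 3], expected input
    # [1, 3, 512, 512] to have 4 channels, but got 3 channels instead"
    print(e)

latent = torch.randn(1, 4, 64, 64)        # latent-space input: 4 channels
print(conv_in(latent).shape)              # torch.Size([1, 320, 64, 64])
```

This suggests the network is being fed a pixel-space image where it expects a 4-channel latent.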
Printing the tensor shapes before and after the GAN classifier, two different sizes appear, so there seems to be a mismatch here.
My questions are:
1. I replaced the dataset with our own image Dataset and did not use LMDB. Is the shape of the real_image loaded by retrieve_row_from_lmdb in SDImageDatasetLMDB the same as the shape of the fake_image?
2. If it is not the problem described in 1, what else might cause this error?
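To narrow down whether the custom dataset is the cause, a small sanity check could compare the two tensors right before the GAN classifier. This is a hypothetical helper, not DMD2 code; the names `real_image`/`fake_image` follow the issue, and `expected_channels=4` assumes the classifier's first convolution takes 4-channel latents:

```python
import torch

def check_gan_inputs(real_image: torch.Tensor, fake_image: torch.Tensor,
                     expected_channels: int = 4) -> None:
    """Hypothetical sanity check: both tensors must reach the GAN classifier
    with identical shapes and the channel count its first conv expects."""
    assert real_image.shape == fake_image.shape, (
        f"shape mismatch: real {tuple(real_image.shape)} "
        f"vs fake {tuple(fake_image.shape)}")
    assert real_image.shape[1] == expected_channels, (
        f"classifier expects {expected_channels} channels, "
        f"got {real_image.shape[1]}")

# Two 4-channel latents pass; a 3-channel pixel image would trip the check.
check_gan_inputs(torch.randn(1, 4, 64, 64), torch.randn(1, 4, 64, 64))
```

If the check fails on `real_image` but not `fake_image`, the custom Dataset is likely returning pixel-space images where latents are expected.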