lwneal / counterfactual-open-set

Counterfactual Image Generation

Issue while training GAN #2

Open RamyaRaghuraman opened 4 years ago

RamyaRaghuraman commented 4 years ago

Hi everyone,

I run into the following issue with the Tiny ImageNet dataset when executing this command:

python src/train_gan.py --epochs 10

Error:

    Traceback (most recent call last):
      File "C:/Users/RAR7ABT/pj-val-ml/pjval_ml/OSR/counterfactual/src/train_gan.py", line 32, in <module>
        train_gan(networks, optimizers, dataloader, epoch=epoch, **options)
      File "C:\Users\RAR7ABT\pj-val-ml\pjval_ml\OSR\counterfactual\src\training.py", line 67, in train_gan
        logits = netD(images)[:,0]
      File "C:\Users\RAR7ABT\AppData\Local\conda\conda\envs\pjval\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
        result = self.forward(*input, **kwargs)
      File "C:\Users\RAR7ABT\pj-val-ml\pjval_ml\OSR\counterfactual\src\network_definitions.py", line 275, in forward
        x = self.fc1(x)
      File "C:\Users\RAR7ABT\AppData\Local\conda\conda\envs\pjval\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
        result = self.forward(*input, **kwargs)
      File "C:\Users\RAR7ABT\AppData\Local\conda\conda\envs\pjval\lib\site-packages\torch\nn\modules\linear.py", line 92, in forward
        return F.linear(input, self.weight, self.bias)
      File "C:\Users\RAR7ABT\AppData\Local\conda\conda\envs\pjval\lib\site-packages\torch\nn\functional.py", line 1406, in linear
        ret = torch.addmm(bias, input, weight.t())
    RuntimeError: size mismatch, m1: [64 x 16384], m2: [4096 x 20] at C:/w/1/s/tmp_conda_3.6_041836/conda/conda-bld/pytorch_1556684464974/work/aten/src\THC/generic/THCTensorMathBlas.cu:268

Process finished with exit code 1

Any help would be really appreciated. Thanks in advance!

RamyaRaghuraman commented 4 years ago

@lwneal the error seems to come from x = self.fc1(x) in class multiclassDiscriminator32(nn.Module). m1 should be [64 x 4096], but I somehow end up with m1: [64 x 16384].

Please do take a look at the discriminator updates in training.py @lwneal @mattolson93

KevLuo commented 3 years ago

@RamyaRaghuraman, I ran into a similar issue when training only the baseline classifier; in my case, the error was because I had not resized the input images from 64x64 to 32x32. Without that resize, the flattened feature map entering the fully connected layer is 4x larger, which would explain why your size is 16384 instead of the expected 4096: 16384 = 4 * 4096.
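The shape arithmetic can be checked without torch. This is a sketch, assuming the discriminator downsamples the spatial dimensions by a factor of 8 and ends with 256 channels before flattening (the factor and channel count are assumptions about network_definitions.py, not taken from the repo):

```python
def flattened_size(input_hw, downsample_factor=8, channels=256):
    """Feature count after flattening a conv stack's output.

    A conv stack that halves the spatial size three times (factor 8)
    turns an input_hw x input_hw image into a
    (input_hw/8) x (input_hw/8) x channels tensor.
    """
    spatial = input_hw // downsample_factor
    return spatial * spatial * channels

print(flattened_size(32))  # 4 * 4 * 256 = 4096, what fc1 expects
print(flattened_size(64))  # 8 * 8 * 256 = 16384, what a 64x64 input produces
```

Feeding 64x64 images into a network built for 32x32 quadruples the flattened size, which matches the m1: [64 x 16384] vs m2: [4096 x 20] mismatch in the traceback exactly.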

77flyy commented 8 months ago

@KevLuo Hello, I encountered the same problem. I see the ImageConverter docstring says it resizes images to 32x32:

# Crops, resizes, normalizes, performs any desired augmentations
# Outputs images as eg. 32x32x3 np.array or eg. 3x32x32 torch.FloatTensor

but it doesn't seem to actually do so. Do we need to rewrite the converter to add a resize transform?
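As a stopgap while the converter is unfixed, the resize can be done before the images reach the network. The idiomatic route in a PyTorch pipeline would be torchvision.transforms.Resize((32, 32)); a dependency-light sketch of the same idea with a hypothetical helper (resize_to_32 is not part of the repo) using nearest-neighbor index selection:

```python
import numpy as np

def resize_to_32(img):
    """Nearest-neighbor downsample of an HxWxC array to 32x32xC.

    A stand-in for the resize the ImageConverter docstring promises:
    pick 32 evenly spaced rows and 32 evenly spaced columns.
    """
    h, w = img.shape[:2]
    rows = np.arange(32) * h // 32
    cols = np.arange(32) * w // 32
    return img[rows][:, cols]

img = np.zeros((64, 64, 3), dtype=np.float32)  # a Tiny ImageNet-sized dummy image
print(resize_to_32(img).shape)  # (32, 32, 3)
```

With inputs shaped 32x32x3, the discriminator's conv stack flattens to the 4096 features fc1 expects, and the size-mismatch RuntimeError goes away.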