PiaCuk closed this issue 2 years ago.
Hi @PiaCuk! Can you post the versions of Python, PyTorch, and NumPy you are using?
Hey, thanks for the quick reply! I was just about to post an update. I'm using Python 3.9.9 and NumPy 1.21.5, and I just updated PyTorch from 1.7.1 to 1.10.1. My models are ResNets from torchvision 0.11.2. Here's the new error message:
Traceback (most recent call last):
  File "main.py", line 49, in <module>
    ImageNet_experiment(**params)
  File "imagenet.py", line 95, in ImageNet_experiment
    acc = distiller.train_student(**params, smooth_teacher=False)
  File "Tf_KD/virtual_teacher.py", line 123, in train_student
    student_out = self.student_model(data)
  File "miniconda/envs/ffcv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "models/resnet.py", line 35, in forward
    return self.resnet_model(X)
  File "miniconda/envs/ffcv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "miniconda/envs/ffcv/lib/python3.9/site-packages/torchvision/models/resnet.py", line 249, in forward
    return self._forward_impl(x)
  File "miniconda/envs/ffcv/lib/python3.9/site-packages/torchvision/models/resnet.py", line 232, in _forward_impl
    x = self.conv1(x)
  File "miniconda/envs/ffcv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "miniconda/envs/ffcv/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 446, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "miniconda/envs/ffcv/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 442, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.cuda.FloatTensor) should be the same
Oh, OK! I think this error has nothing to do with NormalizeImage anymore. The problem is that your model is in full precision (float32) while FFCV is loading the data in half precision (float16). You can either convert the training code to work with half precision (which I would recommend, since you will see significant speedups even outside of data loading), or load the data in full precision by replacing np.float16 with np.float32 in the pipeline.
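The mismatch can be reproduced on CPU with a single convolution, which may help confirm the diagnosis. This is a minimal sketch, assuming a float32 `nn.Conv2d` as a stand-in for the ResNet's `conv1` and a random float16 tensor as a stand-in for an FFCV batch. The helper `to_param_dtype` is hypothetical (not part of FFCV or PyTorch); casting after loading has the same effect on the model as the float32 pipeline option, but gives up the float16 loading savings, so changing the dtype in the pipeline itself is the cleaner fix.

```python
import torch
import torch.nn as nn

# Stand-in for a ResNet's first layer: nn.Conv2d parameters default to float32.
model = nn.Conv2d(3, 8, kernel_size=3, padding=1)

# Stand-in for a batch from a loader that emits float16 images.
batch = torch.randn(2, 3, 16, 16).half()


def to_param_dtype(module: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: cast x to the dtype of the module's parameters."""
    return x.to(next(module.parameters()).dtype)


try:
    model(batch)  # float16 input vs. float32 weight -> RuntimeError
except RuntimeError as err:
    print("dtype mismatch:", err)

# Casting the input to the weights' dtype makes the forward pass succeed.
out = model(to_param_dtype(model, batch))
print(out.dtype, tuple(out.shape))  # torch.float32 (2, 8, 16, 16)
```

The opposite direction (calling `model.half()` so the weights match a float16 batch) also removes the error on a GPU, but plain float16 training is usually replaced by mixed precision in practice.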
For more on half-precision (mixed-precision) training, see e.g. https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html
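The linked recipe keeps the model in float32 and wraps the forward pass in autocast, with a GradScaler around the backward pass. Below is a minimal sketch of that pattern, assuming a tiny stand-in model instead of a torchvision ResNet and a hypothetical `train_step` function; it is not FFCV-specific code. Under CUDA autocast, convolutions and linear layers run in float16, so a float16 batch no longer clashes with the float32 weights. Without a GPU, AMP is disabled and the sketch falls back to float32 data.

```python
import torch
import torch.nn as nn

# AMP in this sketch targets CUDA; on a CPU-only machine everything runs in float32.
use_amp = torch.cuda.is_available()
device = torch.device("cuda" if use_amp else "cpu")

# Tiny stand-in for a ResNet; its parameters stay float32 (no model.half()).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
# GradScaler scales the loss so small float16 gradients don't underflow;
# with enabled=False it passes everything through unchanged.
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)


def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """Hypothetical training step following the AMP recipe's structure."""
    optimizer.zero_grad(set_to_none=True)
    # Autocast casts inputs and weights of eligible ops (conv, linear) to float16.
    with torch.autocast(device_type="cuda", dtype=torch.float16, enabled=use_amp):
        logits = model(images)
        loss = loss_fn(logits, labels)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()


# Usage: a float16 batch (as from a float16 pipeline) only works with AMP enabled.
batch_dtype = torch.float16 if use_amp else torch.float32
images = torch.randn(4, 3, 16, 16, device=device, dtype=batch_dtype)
labels = torch.randint(0, 10, (4,), device=device)
loss_value = train_step(images, labels)
print(loss_value)
```

This uses the spellings available in PyTorch 1.10; newer releases deprecate `torch.cuda.amp.GradScaler` in favor of `torch.amp.GradScaler("cuda", ...)`.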
Thank you, this makes a lot of sense now! I will look into it.
I'm trying to train a model on ImageNet with FFCV. I created a conda environment as described in install.sh and wrote ImageNet to a .ffcv file with
./write_imagenet.sh 500 0.50 90
from ffcv-imagenet.
This is the error that I get:
I replaced the DataLoader of a working PyTorch training pipeline with this:
Any ideas on what is causing the problem?