kuangliu / pytorch-cifar

95.47% on CIFAR10 with PyTorch
MIT License
5.94k stars 2.14k forks

Errors when testing on CPU #154

Open bryanbocao opened 2 years ago

bryanbocao commented 2 years ago
layer_name: <class 'torch.nn.modules.conv.Conv2d'>, total_params: 15121584, total_traina_params: 15121584, n_layers: 39
device:  cpu
Traceback (most recent call last):
  File "main.py", line 208, in <module>
    test(epoch)
  File "main.py", line 189, in test
    outputs = net(inputs)
  File "/home/brcao/Apps/anaconda3/envs/yolo/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/brcao/Repos/pytorch-cifar/models/dla_simple.py", line 106, in forward
    out = self.base(x)
  File "/home/brcao/Apps/anaconda3/envs/yolo/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/brcao/Apps/anaconda3/envs/yolo/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/brcao/Apps/anaconda3/envs/yolo/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/brcao/Apps/anaconda3/envs/yolo/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 423, in forward
    return self._conv_forward(input, self.weight)
  File "/home/brcao/Apps/anaconda3/envs/yolo/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 419, in _conv_forward
    return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _thnn_conv2d_forward

logan-mo commented 2 years ago

@bryanbocao Your device is not being set to the GPU. Can you check that your CUDA drivers are properly installed and that all models and datasets are being sent to GPU memory?

bryanbocao commented 2 years ago

@Phillibob55 That's a different problem. This issue is not about CUDA driver installation or configuration. I can run the code on the GPU, but I intentionally wanted to test the CPU runtime, which requires both the model and the data to be in CPU memory instead of GPU memory.

The model was trained and saved from GPU memory, so the checkpoint has to be loaded with the map_location=device argument, where device='cpu', in order to run the model on the CPU. I solved the issue with:

parser.add_argument('--select_device', type=str, default='gpu', help='gpu | cpu')
...
device = 'cuda' if torch.cuda.is_available() and args.select_device == 'gpu' else 'cpu'
...
checkpoint = torch.load('./checkpoint/{}_ckpt.pth'.format(args.net), map_location=device)
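
For anyone who wants to try the fix in isolation, here is a minimal sketch of the same idea. The small `nn.Sequential` model and the in-memory buffer are hypothetical stand-ins for the repo's networks and for `./checkpoint/<net>_ckpt.pth`, and the `'net'` checkpoint key is an assumption about how the state dict was saved.

```python
import argparse
import io

import torch
import torch.nn as nn

parser = argparse.ArgumentParser()
parser.add_argument('--select_device', type=str, default='gpu', help='gpu | cpu')
# Simulate running with `--select_device cpu`.
args = parser.parse_args(['--select_device', 'cpu'])

# Use CUDA only when it is both available and requested.
device = 'cuda' if torch.cuda.is_available() and args.select_device == 'gpu' else 'cpu'

# Hypothetical stand-in for one of the repo's models.
net = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU())

# Save a checkpoint to an in-memory buffer (stand-in for a .pth file on disk).
buffer = io.BytesIO()
torch.save({'net': net.state_dict()}, buffer)
buffer.seek(0)

# map_location remaps every stored tensor onto the chosen device at load time.
checkpoint = torch.load(buffer, map_location=device)
net.load_state_dict(checkpoint['net'])
net = net.to(device)

# The inputs must live on the same device as the model weights.
inputs = torch.randn(2, 3, 32, 32).to(device)
with torch.no_grad():
    outputs = net(inputs)
```

One thing to watch for: if the checkpoint was saved from a model wrapped in torch.nn.DataParallel, the state-dict keys carry a `module.` prefix, so the CPU model may need the same wrapper, or the prefix stripped, before load_state_dict succeeds.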

bryanbocao commented 2 years ago

@Phillibob55 Check this out https://github.com/kuangliu/pytorch-cifar/pull/152 https://github.com/kuangliu/pytorch-cifar/pull/152/commits/b782bba05821e2a31871b46adf9d33da3b00e036

logan-mo commented 2 years ago

Ooh, yeah, that makes sense. The original author seems to be inactive, so I'm working on my own version of this, which runs on any image dataset and doesn't have these problems.

logan-mo commented 2 years ago

@bryanbocao Could you give me some guidance on making these models work with image sizes other than 32x32?

bryanbocao commented 2 years ago

@Phillibob55 I am happy to work on that. Do you mean (1) simply resizing arbitrary images to 32x32 and feeding them into these models? That can be done by adding one more command-line argument and a resize step in the code. Or (2) preparing a set of models whose native input shape is something other than 32x32?

logan-mo commented 2 years ago

@bryanbocao I started off with the first approach and just added a resize transform, but that loses a lot of information. For datasets like ImageNet, it doesn't give accuracy above 50%. So I was thinking that making the models accept images of any size might give better results, at the cost of longer training, of course.

The codebase in the repo has a lot of hardcoded elements. I worked around the hardcoded 10 output classes by adding a number-of-classes argument to every model class. But I don't understand the more complex models well enough to modify them to accept images of any size.

P.S. I'm using a Jupyter notebook instead of main.py.
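
One common way to lift a fixed input size, sketched here under the assumption that the size dependence comes from a fixed-size pooling or flatten step before the final linear layer, is to pool adaptively so the classifier always sees the same number of features. `TinyNet` below is a hypothetical toy model, not one of the repo's architectures; it also takes the number of classes as an argument, as described above.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical toy classifier, not one of the repo's models."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
        )
        # Pools any HxW feature map down to 1x1, so the linear layer's
        # input size no longer depends on the image resolution.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        out = self.features(x)
        out = self.pool(out)
        out = torch.flatten(out, 1)  # (N, 32, 1, 1) -> (N, 32)
        return self.classifier(out)


net = TinyNet(num_classes=100).eval()
with torch.no_grad():
    small = net(torch.randn(2, 3, 32, 32))    # CIFAR-sized input
    large = net(torch.randn(2, 3, 224, 224))  # larger input, same weights
```

Both calls produce one row of logits per image. Whether a network designed for 32x32 inputs downsamples enough to perform well at larger resolutions is a separate question that this sketch does not address.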

logan-mo commented 2 years ago

I've created a repo for it here

bryanbocao commented 2 years ago

@Phillibob55 Sounds good. If you would like to create an easy-to-use repo where we can just change some arguments to train and test many different models, I am happy to contribute in my spare time. I have forked your repo to https://github.com/bryanbocao/image-classification

CopyABCs commented 1 year ago

Hello, could you tell me what causes the following error when I run the script, and how to fix it?

D:\anaconda3\envs\datudui\python.exe C:/Users/52254/Desktop/pytorch-cifar-master/main.py
'stty' is not recognized as an internal or external command, operable program or batch file.
Traceback (most recent call last):
  File "C:/Users/52254/Desktop/pytorch-cifar-master/main.py", line 15, in <module>
    from utils import progress_bar
  File "C:\Users\52254\Desktop\pytorch-cifar-master\utils.py", line 45, in <module>
    _, term_width = os.popen('stty size', 'r').read().split()
ValueError: not enough values to unpack (expected 2, got 0)

Selfpline6 commented 1 year ago

I have the same problem as you.
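
The traceback points at the `stty size` call in utils.py. `stty` is a Unix tool that does not exist on Windows, so `os.popen` returns an empty string and unpacking the empty list from `split()` fails. A possible workaround, sketched below, is to ask the standard library for the terminal width instead. The helper name `get_term_width` and the default of 80 columns are assumptions for illustration, not part of the repo.

```python
import shutil


def get_term_width(default=80):
    """Return the terminal width in columns, portable across Windows and Unix.

    Hypothetical replacement for parsing the output of `stty size`.
    """
    # get_terminal_size returns the fallback (columns, lines) when no
    # terminal is attached, e.g. when output is redirected to a file.
    width = shutil.get_terminal_size(fallback=(default, 24)).columns
    # Guard against terminals that report a width of 0.
    return width if width > 0 else default


term_width = get_term_width()
```

In utils.py, this would stand in for the line that unpacks `os.popen('stty size', 'r').read().split()`. The value is already an int, so no conversion is needed afterwards.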