RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'

INGnowait commented 3 years ago

When I try to run the code, I get this error: "RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'". How can I solve it?

Thank you.

sunsmarterjie commented 3 years ago

It looks like you didn't put the model or tensors on the GPU. Can you provide more details?

INGnowait commented 3 years ago

When I ran "train.py" and "train_AdaLSN.py", I got the following errors:

    due to scipy/scipy#11299), please set init_weights=True.', FutureWarning)
    Loading pretrained weights from ./../dataset/models/inception_v3_google-1a9a5a14.pth
    08/07 01:12:16 PM params:28.575M
    Traceback (most recent call last):
      File "train_AdaLSN.py", line 65, in <module>
        train_model(geno, dataloader, args)
      File "train_AdaLSN.py", line 56, in train_model
        loss = trainer.train()
      File "/media/j/data/SDL-Skeleton-main/engines/trainer_AdaLSN.py", line 45, in train
        data, target = next(dataiter)
      File "/home/j/anaconda3/envs/skl/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
        data = self._next_data()
      File "/home/j/anaconda3/envs/skl/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 560, in _next_data
        index = self._next_index()  # may raise StopIteration
      File "/home/j/anaconda3/envs/skl/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 512, in _next_index
        return next(self._sampler_iter)  # may raise StopIteration
      File "/home/j/anaconda3/envs/skl/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 226, in __iter__
        for idx in self.sampler:
      File "/home/j/anaconda3/envs/skl/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 124, in __iter__
        yield from torch.randperm(n, generator=generator).tolist()
    RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'

I searched for solutions, and some of the results pointed to this line as the cause: "torch.set_default_tensor_type('torch.cuda.FloatTensor')".
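For reference, here is a minimal sketch of one workaround that is often suggested for this error. It assumes the cause is the one the traceback points to: the sampler calls `torch.randperm` with a generator that lives on a different device than the tensors PyTorch creates by default. The toy dataset and the variable names are hypothetical and are not taken from the repository.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Device on which PyTorch currently creates tensors by default.
# It is 'cuda' after torch.set_default_tensor_type('torch.cuda.FloatTensor'),
# and 'cpu' otherwise.
default_device = torch.empty(0).device

# Toy stand-in for the real training set (hypothetical data).
dataset = TensorDataset(torch.arange(8, dtype=torch.float32).unsqueeze(1))

# Create the shuffling generator on the same device, so that
# torch.randperm(n, generator=generator) inside the sampler does not
# see mismatched devices.
generator = torch.Generator(device=default_device)
generator.manual_seed(0)

loader = DataLoader(dataset, batch_size=4, shuffle=True, generator=generator)
batches = [batch[0] for batch in loader]
```

Because the generator's device is read from the current default rather than hard-coded, the same sketch should run on a CPU-only machine as well.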

In addition, when I ran "python train.py --network deep_flux", I got the error: "TypeError: __init__() takes 2 positional arguments but 3 were given". I changed the line "dataset = TrainDataset(args.files, args.root)" in "train.py" to "dataset = TrainDataset((args.files, args.root))", and the error disappeared. However, the same line in "train_AdaLSN.py" does not need to be modified.
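To illustrate where this kind of TypeError comes from, here is a small sketch with two hypothetical dataset classes whose constructors have different signatures. These classes are made up for the example and are not the repository's `TrainDataset`. Note that `self` counts as one positional argument, which is why two explicit arguments are reported as "3 were given".

```python
class SingleArgDataset:
    """Hypothetical dataset whose constructor takes one positional argument."""

    def __init__(self, paths):
        # Expects a (files, root) pair packed into a single argument.
        self.files, self.root = paths


class TwoArgDataset:
    """Hypothetical dataset whose constructor takes two positional arguments."""

    def __init__(self, files, root):
        self.files = files
        self.root = root


# Passing two arguments to the one-argument constructor raises the TypeError.
try:
    SingleArgDataset("train.lst", "./data")
    message = ""
except TypeError as exc:
    message = str(exc)

# Wrapping the pair in a tuple satisfies the one-argument signature.
packed = SingleArgDataset(("train.lst", "./data"))

# A class with a two-argument constructor accepts the original call shape.
unpacked = TwoArgDataset("train.lst", "./data")
```

Wrapping the arguments in a tuple silences the error, but whether that is the right fix depends on which dataset class the script actually imports.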

Apart from the necessary paths and the modifications above, I did not change any other part of the code. My environment is: Ubuntu 18.04, Anaconda3, CUDA 11.1 (RTX 3090), PyTorch 1.9.0.

sunsmarterjie commented 3 years ago

Based on the information above, I still can't determine where the problem is, and I did not encounter this error when I ran the code. It may be a CUDA and PyTorch version issue: I use CUDA 10.2 and PyTorch 1.1. If you would like to discuss it more conveniently, please add me on WeChat: sunsmarterjie.

Deepflux needs another DataLoader (at the top of train.py).

charlesmarseille commented 2 years ago

You probably installed the CPU-only build of torch in your environment. Go through the PyTorch installation steps again and make sure you install the CUDA build rather than the CPU build.
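A quick way to check which build is installed is sketched below. It only reports what the current environment contains, and the variable name `is_cuda_build` is just for illustration.

```python
import torch

# Version string of the installed package.
print("torch version:", torch.__version__)

# CUDA version the package was built against; None for a CPU-only build.
print("built with CUDA:", torch.version.cuda)

# Whether a usable GPU and driver are visible at runtime.
print("CUDA available:", torch.cuda.is_available())

is_cuda_build = torch.version.cuda is not None
```

If `torch.version.cuda` prints `None`, the installed package is not a CUDA build and reinstalling is the next step.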

ludysama commented 2 years ago

I ran into the same bug on Ubuntu 20.04. The fix is to comment out line 40 of train.py by putting a '#' in front of it (torch.set_default_tensor_type('torch.cuda.FloatTensor')).
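With that line commented out, tensors are created on the CPU by default, so the model and each batch have to be moved to the GPU explicitly. A minimal sketch of that pattern follows. The `nn.Linear` model and the random batch are placeholders, not the repository's network or data.

```python
import torch
import torch.nn as nn

# Use the GPU when one is available; fall back to the CPU otherwise.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder model, moved to the chosen device once.
model = nn.Linear(4, 2).to(device)

# Placeholder batch, created on the CPU and moved before the forward pass.
data = torch.randn(3, 4).to(device)

output = model(data)
```

In a training loop, the same `.to(device)` call would be applied to each `data, target` pair returned by the DataLoader.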

There may still be some bugs in early versions of PyTorch.

sunsmarterjie / SDL-Skeleton

RuntimeError: Expected a 'cuda' device type for generator but found 'cpu' #9