Closed · EmilCreatePro closed this issue 4 years ago
I managed to fix this by going into `nnet/py_factory.py` and changing the import:

How it was: `from models.py_utils.data_parallel import DataParallel`

Changed to: `from torch.nn import DataParallel`

I also had to remove the `chunk_sizes` parameter from the `DataParallel` call :)
I don't know how much it changes performance, though; I didn't find another solution.
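The change described above, sketched as a patch. The surrounding constructor call is an assumption (the exact variable names in `nnet/py_factory.py` may differ in your copy); the point is that `torch.nn.DataParallel` does not accept the repo's custom `chunk_sizes` keyword, so it has to be dropped along with the import swap:

```diff
- from models.py_utils.data_parallel import DataParallel
+ from torch.nn import DataParallel

- self.network = DataParallel(self.network, chunk_sizes=chunk_sizes)
+ self.network = DataParallel(self.network)
```

With the stock `DataParallel`, the batch is split evenly across visible GPUs instead of using the per-GPU chunk sizes from the config, which is why the commenter hedges about performance.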
Hello @xingyizhou, I am trying to train this network on my own dataset and I keep getting a "Device index must be -1 or non-negative" error (see below):
```
Traceback (most recent call last):
  File "train.py", line 225, in <module>
    train(training_dbs, None, args.start_iter, args.debug)
  File "train.py", line 159, in train
    training_loss = nnet.train(**training)
  File "/content/ExtremeNet/nnet/py_factory.py", line 83, in train
    loss = self.network(xs, ys)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/ExtremeNet/models/py_utils/data_parallel.py", line 66, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids, self.chunk_sizes)
  File "/content/ExtremeNet/models/py_utils/data_parallel.py", line 77, in scatter
    return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim, chunk_sizes=self.chunk_sizes)
  File "/content/ExtremeNet/models/py_utils/scatter_gather.py", line 30, in scatter_kwargs
    inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else []
  File "/content/ExtremeNet/models/py_utils/scatter_gather.py", line 25, in scatter
    return scatter_map(inputs)
  File "/content/ExtremeNet/models/py_utils/scatter_gather.py", line 18, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "/content/ExtremeNet/models/py_utils/scatter_gather.py", line 20, in scatter_map
    return list(map(list, zip(*map(scatter_map, obj))))
  File "/content/ExtremeNet/models/py_utils/scatter_gather.py", line 15, in scatter_map
    return Scatter.apply(target_gpus, chunk_sizes, dim, obj)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/parallel/_functions.py", line 89, in forward
    outputs = comm.scatter(input, target_gpus, chunk_sizes, ctx.dim, streams)
  File "/usr/local/lib/python3.7/site-packages/torch/cuda/comm.py", line 148, in scatter
    return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: Device index must be -1 or non-negative, got -14913 (Device at /pytorch/c10/Device.h:40)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f4cef83c021 in /usr/local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f4cef83b8ea in /usr/local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #2: + 0x10ceca (0x7f4d29b74eca in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #3: torch::cuda::scatter(at::Tensor const&, c10::ArrayRef, c10::optional<std::vector<long, std::allocator > > const&, long, c10::optional<std::vector<c10::optional, std::allocator<c10::optional > > > const&) + 0x2dc (0x7f4d29f4faac in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: + 0x4ed28f (0x7f4d29f5528f in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: + 0x11663e (0x7f4d29b7e63e in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
```
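For context on why the workaround above helps: the repo's custom scatter splits each batch according to a `chunk_sizes` list, one entry per GPU, and passes it straight down to `torch._C._scatter`. A plausible reading of the garbage device index in the traceback is that this list disagrees with the actual batch or device setup. The following pure-Python sanity check is hypothetical (it is not part of the repo); it only illustrates the consistency conditions such a `chunk_sizes` list has to satisfy:

```python
def validate_chunk_sizes(chunk_sizes, device_ids, batch_size):
    """Return True if chunk_sizes is consistent with the devices and batch."""
    if chunk_sizes is None:
        return True  # stock torch.nn.DataParallel splits the batch evenly on its own
    if len(chunk_sizes) != len(device_ids):
        return False  # need exactly one chunk size per visible GPU
    if any(c <= 0 for c in chunk_sizes):
        return False  # every chunk must hold at least one sample
    return sum(chunk_sizes) == batch_size  # chunks must cover the batch exactly

# A single-GPU run using chunk_sizes left over from a multi-GPU config fails:
print(validate_chunk_sizes([8, 8], device_ids=[0], batch_size=16))  # False
print(validate_chunk_sizes([16], device_ids=[0], batch_size=16))    # True
```

This is also why switching to `torch.nn.DataParallel` and dropping `chunk_sizes` sidesteps the crash: the even split it computes internally is consistent by construction.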