layumi / person-reid-3d

TNNLS'22 :statue_of_liberty: Parameter-Efficient Person Re-identification in the 3D Space :statue_of_liberty:
https://arxiv.org/abs/2006.04569
MIT License

RuntimeError: invalid argument 5: k not in range for dimension #3

Closed niallomahony93 closed 4 years ago

niallomahony93 commented 4 years ago

Hi,

Thank you for sharing your work.

I ran into an issue running train_M.sh on the supplied generated 3D data of the Market-1501 dataset:

Number of training parameters: 2.34 M
Epoch #0 Validating
/ichec/home/users/niallomahony/.conda/envs/tfgpu/lib/python3.6/site-packages/numpy/core/fromnumeric.py:3335: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
/ichec/home/users/niallomahony/.conda/envs/tfgpu/lib/python3.6/site-packages/numpy/core/_methods.py:154: RuntimeWarning: invalid value encountered in true_divide
  ret, rcount, out=ret, casting='unsafe', subok=False)
/ichec/home/users/niallomahony/.conda/envs/tfgpu/lib/python3.6/site-packages/numpy/core/fromnumeric.py:3335: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
/ichec/home/users/niallomahony/.conda/envs/tfgpu/lib/python3.6/site-packages/numpy/core/_methods.py:154: RuntimeWarning: invalid value encountered in true_divide
  ret, rcount, out=ret, casting='unsafe', subok=False)

  0%|          | 0/1617 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train_M.py", line 298, in <module>
    train(model, optimizer, scheduler, train_loader, dev, epoch)
  File "train_M.py", line 129, in train
    logits = model(xyz.detach(), rgb.detach(), istrain=True)
  File "/ichec/home/users/niallomahony/.conda/envs/tfgpu/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/ichec/home/users/niallomahony/.conda/envs/tfgpu/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/ichec/home/users/niallomahony/.conda/envs/tfgpu/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/ichec/work/iecom001b/person-reid-3d/model.py", line 171, in forward
    g = self.nng(xyz, istrain=istrain and self.graph_jitter)
  File "/ichec/home/users/niallomahony/.conda/envs/tfgpu/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/ichec/work/iecom001b/person-reid-3d/KNNGraphE.py", line 102, in forward
    return knn_graphE(x, self.k, istrain)
  File "/ichec/work/iecom001b/person-reid-3d/KNNGraphE.py", line 51, in knn_graphE
    k_indices = F.argtopk(dist, k, 2, descending=False)
  File "/ichec/home/users/niallomahony/.conda/envs/tfgpu/lib/python3.6/site-packages/dgl/backend/pytorch/tensor.py", line 132, in argtopk
    return th.topk(input, k, dim, largest=descending)[1]
RuntimeError: invalid argument 5: k not in range for dimension at /opt/conda/conda-bld/pytorch_1579027003190/work/aten/src/THC/generic/THCTensorTopK.cu:23

I followed all the installation steps but had to use CUDA 10.0 (with cudatoolkit 10.0 and dgl-cu100), as that is what is available on the HPC.
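When an environment deviates from the documented install like this, it can help to record the versions actually in use alongside the report. The following is a minimal sketch, not part of the repository: `report_versions` is a hypothetical helper, and it assumes each library exposes a `__version__` attribute (it falls back to "unknown" when one is absent).

```python
import importlib


def report_versions(module_names):
    """Map each module name to its __version__ string.

    Returns "unknown" for modules without a __version__ attribute
    and None for modules that cannot be imported.
    """
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Module is missing from this environment.
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions
```

Calling `report_versions(["torch", "dgl", "open3d", "numpy"])` inside the training environment and pasting the result into the issue gives maintainers the library versions without guesswork.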

layumi commented 4 years ago

Hi @niallomahony93, the error occurs while building the KNN graph. There are three things that need to be checked:

  1. Use a small k value, for example k=9, by adding --k 9.
  2. Print the input shape. Is the input right? You may add a print statement in KNNGraphE.py to check it.
  3. Check the PyTorch version and the other libraries. When you installed them, were there any warnings or errors?
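The first two checks can be combined into one guard that runs before the top-k call. This is a minimal sketch under stated assumptions: `check_knn_input` is a hypothetical helper, not a function in the repository, and the `(batch, num_points, channels)` layout of the point tensor is assumed rather than confirmed by the source.

```python
def check_knn_input(shape, k):
    """Validate a point-cloud shape against k before a top-k neighbour search.

    `shape` is assumed to be (batch, num_points, channels); this layout is
    an assumption, not taken from the repository. Returns num_points.
    """
    if len(shape) != 3:
        raise ValueError(f"expected a 3D shape, got {tuple(shape)}")
    _, num_points, _ = shape
    if num_points == 0:
        # An empty cloud usually means the data loader read no points.
        raise ValueError(f"input has no points: shape={tuple(shape)}")
    if not 1 <= k <= num_points:
        # A top-k search cannot return more neighbours than there are points.
        raise ValueError(
            f"k={k} is out of range for {num_points} points per sample"
        )
    return num_points
```

One possible use is to call `check_knn_input(tuple(x.shape), k)` just before the `F.argtopk` line shown in the traceback, so that an empty or undersized input fails with a readable message instead of a CUDA-side error.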

niallomahony93 commented 4 years ago

Thanks @layumi. The issue was due to an old version of open3d (0.6), which was not loading any of the points from the .obj files. I upgraded to open3d 0.9 and training started successfully.
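A loader that silently returns no points can be caught by comparing its output with a direct count of vertex records in the file. The following is a minimal sketch, assuming the .obj files are plain-text Wavefront OBJ; `count_obj_vertices` is a hypothetical helper and does not use open3d or any code from the repository.

```python
def count_obj_vertices(path):
    """Count geometric vertex records ("v x y z") in a Wavefront OBJ file."""
    count = 0
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            fields = line.split()
            # "v" marks a geometric vertex; "vn" and "vt" are normals and
            # texture coordinates and are deliberately not counted.
            if fields and fields[0] == "v":
                count += 1
    return count
```

If this count is positive for a file while the mesh loader reports zero points for the same file, the loader (or its version) is the likely culprit rather than the data.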