ranahanocka / MeshCNN

Convolutional Neural Network for 3D meshes in PyTorch
MIT License
1.56k stars 314 forks

Training failed on multiple-GPUs #58

Open sunhuaiqiang opened 4 years ago

sunhuaiqiang commented 4 years ago

I am trying to train segmentation on the shrec16 dataset on four 1080Ti GPUs by setting --gpu_ids=0,1,2,3, but it fails with the following error:

/opt/conda/conda-bld/pytorch_1565272279342/work/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [194,0,0], thread: [127,0,0] Assertion srcIndex < srcSelectDimSize failed.

Training works successfully on a single GPU.
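
For readers hitting the same message: the assertion srcIndex < srcSelectDimSize is the bounds check of an index-select CUDA kernel, so it indicates that some index value is at least as large as the dimension being indexed. The thread does not establish where such an index comes from in the multi-GPU case. Below is a minimal pure-Python sketch of the same check, not MeshCNN or PyTorch code; the names find_out_of_range and index_select_rows are hypothetical and exist only for illustration.

```python
def find_out_of_range(index, dim_size):
    """Return (position, value) pairs for indices outside [0, dim_size).

    Negative values are treated as out of range here (an assumption of
    this sketch), since the goal is to flag anything suspicious.
    """
    return [(pos, i) for pos, i in enumerate(index) if not 0 <= i < dim_size]


def index_select_rows(src, index):
    """Pure-Python stand-in for selecting rows of `src` by `index`.

    Raises IndexError up front, with the offending values, instead of
    failing later with an opaque device-side assertion.
    """
    bad = find_out_of_range(index, len(src))
    if bad:
        raise IndexError(
            f"{len(bad)} index value(s) out of range for size {len(src)}: {bad[:5]}"
        )
    return [src[i] for i in index]


features = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]  # 3 rows, so valid indices are 0..2

print(index_select_rows(features, [2, 0]))          # [[2.0, 2.1], [0.0, 0.1]]
print(find_out_of_range([2, 0, 3], len(features)))  # [(2, 3)]
```

Running the failing step on the CPU, or with the environment variable CUDA_LAUNCH_BLOCKING=1 set, generally produces a Python traceback that points at the call issuing the bad index.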

ranahanocka commented 4 years ago

Hi @sunhuaiqiang ,

I have not actually tried to run this code on multiple GPUs. I will try to look into fixing it.

fishfishson commented 4 years ago

Does anyone know how to solve this problem? I am running into it as well, and the error seems to come from the meshconv module.

HanHan55 commented 3 years ago

Does anyone know how to solve this problem?