noahcao / Pixel2Mesh

A complete Pixel2Mesh implementation in PyTorch

Error in DataParallel when trying to train with multiple gpus with torch 1.11 #37

Open amirbarda opened 2 years ago

amirbarda commented 2 years ago

I am getting this error when trying to train with multiple GPUs:

```
File "./lib/python3.7/site-packages/torch/nn/parallel/replicate.py", line 71, in _broadcast_coalesced_reshape
NotImplementedError: Could not run 'aten::view' with arguments from the 'SparseCUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::view' is only available for these backends: [CPU, CUDA, Meta, QuantizedCPU, QuantizedCUDA, MkldnnCPU, BackendSelect, Python, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradLazy, AutogradXPU, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, AutocastCPU, Autocast, Batched, VmapMode, Functionalize].
```

I am trying to run the default tensorflow.yml experiment. It works fine on a single GPU, and evaluation works on multiple GPUs.

I looked through the blocks in the model and could not find anything involving sparsity. I am using PyTorch 1.11 with CUDA 11.3. Any idea why this happens? Thanks.
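
For what it's worth, the failure seems reproducible outside this repo: `nn.DataParallel` appears to fail the same way whenever the wrapped module holds a sparse tensor as a registered parameter, because replication has to broadcast it across devices. A minimal sketch (the module and names here are hypothetical, not Pixel2Mesh code):

```python
import torch
import torch.nn as nn

class SparseHolder(nn.Module):
    """Toy module holding a sparse matrix the way a GCN layer
    might hold its adjacency matrix (hypothetical example)."""
    def __init__(self):
        super().__init__()
        indices = torch.stack([torch.arange(4), torch.arange(4)])
        adj = torch.sparse_coo_tensor(indices, torch.ones(4), (4, 4))
        # A registered (even non-trainable) sparse parameter is what
        # DataParallel.replicate() must broadcast on every forward pass.
        self.adj = nn.Parameter(adj, requires_grad=False)

    def forward(self, x):
        # (batch, 4) -> (batch, 4); shape-safe for any batch split
        return torch.sparse.mm(self.adj, x.t()).t()

model = nn.DataParallel(SparseHolder().cuda())  # needs >= 2 visible GPUs
x = torch.randn(2, 4, device="cuda")
out = model(x)  # raises NotImplementedError in replicate.py on torch 1.11
```

This would also explain why evaluation works: under `torch.no_grad()`, `DataParallel.replicate` broadcasts with `detach=True` and goes through `torch.cuda.comm.broadcast_coalesced`, which handles sparse tensors, while the autograd-aware path used during training appears to reshape the broadcast results with `view`, which sparse tensors do not support.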

ultmaster commented 2 years ago

Sorry, I think this repo is a bit out of maintenance. If you are not interested in digging into the cause, you can try the PyTorch version mentioned in the README.
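
If pinning PyTorch is not an option, one possible workaround (a sketch, not a tested patch against this repo; the layer and names below are hypothetical) is to keep the sparse adjacency matrix as a plain Python attribute rather than a registered parameter or buffer, so `replicate()` never tries to broadcast it, and move it to the right device inside `forward`:

```python
import torch
import torch.nn as nn

class GConvLike(nn.Module):
    """Hypothetical GCN-style layer; names do not match the repo's code."""
    def __init__(self, adj_mat, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(in_features, out_features))
        nn.init.xavier_uniform_(self.weight)
        # Plain attribute, NOT nn.Parameter / register_buffer:
        # DataParallel.replicate() only broadcasts registered tensors,
        # so the sparse matrix is shared across replicas untouched.
        self.adj_mat = adj_mat

    def forward(self, x):
        # Copy the adjacency to whichever GPU this replica runs on.
        adj = self.adj_mat.to(x.device)
        # x: (num_verts, in_features) -> (num_verts, out_features)
        return torch.sparse.mm(adj, x.matmul(self.weight))
```

The trade-off is a device-to-device copy of the adjacency matrix on every forward call (the per-device copies could be cached in a dict to avoid this), and the matrix will no longer appear in `state_dict()`.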