xuhuisheng / rocm-gfx803


Failed to load image Python extension:/python3.8/site-packages/torchvision/image.so: undefined symbol: #15

Closed AlfaranoAndrea closed 2 years ago

AlfaranoAndrea commented 2 years ago

Hi. First of all, thanks for all the support you give us! I'm trying to train a torchvision project on multiple GPUs, and I'm getting this warning:

/home/fi/.local/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: /home/fi/.local/lib/python3.8/site-packages/torchvision/image.so: undefined symbol: _ZNK3c1010TensorImpl36is_contiguous_nondefault_policy_implENS_12MemoryFormatE

This leads to GPU memory errors, and the computer frequently freezes and shuts down.

Maybe rebuilding torchvision from source would be worth a try?

xuhuisheng commented 2 years ago

How can we reproduce this issue?

AlfaranoAndrea commented 2 years ago

I found the cause: I had built PyTorch 1.12.0 myself following your guide (https://github.com/xuhuisheng/rocm-build/tree/master/gfx803), while torchvision was still the earlier build. I went back to your prebuilt torch 1.11.0, and torchvision no longer gives this error. Thanks!
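
For anyone hitting the same undefined-symbol warning, here is a minimal sketch of a check for a torch/torchvision release mismatch. It reads the installed versions from package metadata, so it does not need to import either library. The helper names (`major_minor`, `installed_version`, `check_pair`) are made up for this example, and the table only covers the upstream release pairings around the versions mentioned in this thread. The thread does not say which torchvision version was installed, so the 0.12.0 used in the demo below is an assumption.

```python
from importlib import metadata

# Upstream torch -> torchvision release pairings (major.minor only).
# Only the releases around the ones discussed in this thread are listed.
KNOWN_PAIRS = {
    "1.10": "0.11",
    "1.11": "0.12",
    "1.12": "0.13",
}


def major_minor(version):
    """Reduce a version such as '1.12.0a0+git664058f' to '1.12'."""
    base = version.split("+")[0]  # drop a local build tag, if any
    return ".".join(base.split(".")[:2])


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pair(torch_version, torchvision_version):
    """Return 'ok', 'mismatch', or 'unknown' for a torch/torchvision pair."""
    expected = KNOWN_PAIRS.get(major_minor(torch_version))
    if expected is None:
        return "unknown"  # torch release not in the table above
    if major_minor(torchvision_version) == expected:
        return "ok"
    return "mismatch"


if __name__ == "__main__":
    torch_v = installed_version("torch")
    vision_v = installed_version("torchvision")
    if torch_v is None or vision_v is None:
        print("torch and/or torchvision is not installed")
    else:
        print(torch_v, vision_v, check_pair(torch_v, vision_v))

    # Assumed versions for illustration: a self-built torch 1.12.0 next to
    # a torchvision 0.12.0 that was built against torch 1.11.
    print(check_pair("1.12.0", "0.12.0"))
```

A matching release pair is necessary but not sufficient for self-built packages: torchvision's `image.so` links against the exact torch build it was compiled with, so after rebuilding torch from source it is safest to rebuild torchvision against it as well, or to stay on a prebuilt pair.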