xuhuisheng / rocm-gfx803


Are there any instructions for building torchvision? #17

Open: ZeroNSS opened this issue 1 year ago

ZeroNSS commented 1 year ago

Thank you for your hard work; it helps me a lot. I'm trying to run a Stable Diffusion project (AUTOMATIC1111/stable-diffusion-webui) on my gfx803 (RX580), using your builds of rocBLAS, PyTorch and torchvision. It worked fine at first, but after the project was updated it requires PyTorch 1.12+; if I keep using PyTorch 1.11, it fails when loading models. So I followed your navi10 guide (https://github.com/xuhuisheng/rocm-build/blob/master/navi10/README.md) to build PyTorch 1.12.1 for my gfx803. But there seem to be some problems with torchvision.

If I keep using your torchvision build, I get this error:

/..../lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: /..../lib/python3.8/site-packages/torchvision/image.so: undefined symbol: _ZNK3c1010TensorImpl36is_contiguous_nondefault_policy_implENS_12MemoryFormatE

If I switch torchvision to torchvision-0.13.1+rocm5.1.1-cp39-cp39-linux_x86_64, the error is:

_ZN3c106detail23torchInternalAssertFailEPKcS2_js2_RKSs

If I install torchvision with pip install torchvision==0.13.1, the error is:

Failed to load image Python extension: libc10_cuda.so: cannot open shared object file: No such file or directory

With all three versions, image generation has problems: there is a high chance of colored lines in the generated image, like screen tearing, and it is worst in img2img mode.

So I wonder whether I need to build torchvision for my environment myself, and whether special parameters need to be set when compiling, or whether I can just follow the official guide.

My environment: Ubuntu 20.04.5 (kernel 5.13.0-35-generic), RX580, ROCm 5.2.0.

I'll upload screenshots of the errors later.
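
One detail in the report above is easy to check: the prebuilt wheel name carries the cp39 tag, while the tracebacks point into a python3.8 site-packages directory, and a cp39 wheel is built for CPython 3.9. A minimal bash sketch, assuming the standard wheel filename layout (name-version[-build]-pytag-abitag-platform); the function names are hypothetical helpers, not part of pip or of this repository:

```shell
#!/usr/bin/env bash
# Sketch: compare a wheel's python tag with the interpreter that will import it.
# Assumption: wheel names follow name-version[-build]-pytag-abitag-platform[.whl],
# so the python tag is always the third dash-separated field from the end.

# Print the python tag (e.g. cp39) of a wheel filename.
wheel_python_tag() {
  local base="${1##*/}"   # drop any directory part
  base="${base%.whl}"     # drop the .whl extension if present
  local -a parts
  IFS='-' read -r -a parts <<< "$base"
  local n=${#parts[@]}
  if (( n < 5 )); then
    echo "unexpected wheel name: $1" >&2
    return 1
  fi
  echo "${parts[n-3]}"
}

# Print the CPython tag of an interpreter (default: python3), e.g. cp38.
interpreter_python_tag() {
  "${1:-python3}" -c 'import sys; print("cp%d%d" % sys.version_info[:2])'
}

# Succeed if the wheel's tag matches the interpreter's tag, fail otherwise.
wheel_matches_interpreter() {
  local wheel_tag interp_tag
  wheel_tag="$(wheel_python_tag "$1")" || return 1
  interp_tag="$(interpreter_python_tag "${2:-python3}")" || return 1
  if [[ "$wheel_tag" == "$interp_tag" ]]; then
    echo "ok: $wheel_tag"
  else
    echo "mismatch: wheel is $wheel_tag, interpreter is $interp_tag"
    return 1
  fi
}
```

For example, wheel_matches_interpreter torchvision-0.13.1+rocm5.1.1-cp39-cp39-linux_x86_64 python3.8 should report a mismatch on a machine that has python3.8. A matching tag does not rule out the undefined-symbol errors, which most likely mean torchvision was compiled against a different PyTorch build than the one being imported.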

xuhuisheng commented 1 year ago

Running setup.py should work:

USE_ROCM=1 USE_NUMPY=1 python3 setup.py bdist_wheel
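
torchvision's setup.py imports torch and compiles the native extensions against whichever PyTorch is installed in the active environment, so the torchvision source checkout should match that PyTorch release. A minimal bash sketch, assuming the upstream pairing in which PyTorch 1.M (M from 8 to 13) goes with torchvision 0.(M+1), e.g. 1.12.x with 0.13.x, and assuming pytorch/vision names its release branches release/0.X; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: pick the torchvision release branch that pairs with a PyTorch version.
# Assumption: PyTorch 1.M pairs with torchvision 0.(M+1) for 8 <= M <= 13,
# and pytorch/vision uses branch names of the form release/0.X.

torchvision_branch_for_torch() {
  local ver="${1%%+*}"   # strip a local suffix such as +rocm5.2
  local major minor rest
  IFS='.' read -r major minor rest <<< "$ver"
  if [[ "$major" != "1" || ! "$minor" =~ ^[0-9]+$ ]] || (( minor < 8 || minor > 13 )); then
    echo "no pairing known in this sketch for torch $1" >&2
    return 1
  fi
  echo "release/0.$((minor + 1))"
}

# Usage, inside the environment that holds the self-built PyTorch:
#   torch_ver="$(python3 -c 'import torch; print(torch.__version__)')"
#   branch="$(torchvision_branch_for_torch "$torch_ver")"
#   git clone --branch "$branch" https://github.com/pytorch/vision.git
#   cd vision && USE_ROCM=1 USE_NUMPY=1 python3 setup.py bdist_wheel
#   pip3 install dist/torchvision-*.whl   # bdist_wheel writes into dist/
```

Building from a matching branch with the same interpreter that will run stable-diffusion-webui keeps both the python tag and the c10 symbols consistent with the installed PyTorch.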

ZeroNSS commented 1 year ago

Thank you for your help. I followed your instructions to build torchvision, and after installing it, no errors are reported when I launch stable-diffusion-webui. But when I generate images, there is still a bug.

[Screenshot 2022-10-11 20-02-45: images generated on the GPU in img2img mode]

To find out whether the problem is in the code, I switched to CPU mode and regenerated the images with the same parameters.

[Screenshot 2022-10-11 20-03-39: images generated on the CPU in img2img mode]

As you can see, it works correctly on the CPU, but very slowly: generating the images took ten times as long. And when I run it on the GPU, it sometimes causes the computer to freeze. It seems the PyTorch I built causes GPU memory errors. I will try to build it again.