taichi-dev / taichi

Productive, portable, and performant GPU programming in Python.
https://taichi-lang.org
Apache License 2.0

Is it possible to install Taichi on CentOS 7 with ROCm backend? #4586

Open lql341 opened 2 years ago

lql341 commented 2 years ago

I am trying to use Taichi on CentOS 7. The code fails with "ImportError: /lib64/libm.so.6: version `GLIBC_2.27' not found". This is presumably because the system's glibc is older than the version Taichi requires. So my questions are:

  1. Is there a solution for CentOS 7?
  2. Or is Taichi perhaps not suitable for a supercomputing system without a GUI? I notice that it relies heavily on graphics.

Thank you in advance.
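
For readers hitting the same ImportError, here is a minimal sketch (not part of the original report) of how to check which C library version the Python interpreter sees before installing a wheel. It relies on the standard-library function platform.libc_ver(); the helper names parse_version and glibc_is_at_least are hypothetical, and the 2.27 threshold is simply the version named in the error message above.

```python
import platform


def parse_version(text):
    """Turn a dotted version string such as "2.27" into a tuple of ints."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_is_at_least(required, found=None):
    """Return True if the glibc version `found` is >= `required`.

    When `found` is omitted, ask the running interpreter through
    platform.libc_ver(); on non-glibc systems this returns False.
    """
    if found is None:
        lib, found = platform.libc_ver()
        if lib != "glibc":
            return False
    return parse_version(found) >= parse_version(required)


if __name__ == "__main__":
    lib, version = platform.libc_ver()
    print("C library:", lib or "unknown", version or "")
    # GLIBC_2.27 is the symbol version requested in the ImportError above.
    print("Meets GLIBC_2.27 requirement:", glibc_is_at_least("2.27"))
```

CentOS 7 ships glibc 2.17, so glibc_is_at_least("2.27", "2.17") is False, which is consistent with the error reported here.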

qiao-bo commented 2 years ago

Hi, we do have a CentOS 7 compatible package. Did you install via pip? Can you post the name of the wheel that gets downloaded? cc @strongoier

lql341 commented 2 years ago

image

This is the output when I run "python3 -m pip install taichi".

But when I run "python3 fractal.py", it reports the GLIBC version error for libm.so.6.

image

strongoier commented 2 years ago

Hi @lql341. We have made a wheel compatible with CentOS 7 after the release of v0.9.1, so you have to wait for v0.9.2 (hopefully this week) to officially get it. Alternatively, you can install our nightly version via pip install -i https://test.pypi.org/simple/ taichi-nightly. The problem here is that our nightly releases only contain Python 3.8/3.10 wheels, so you may need to switch your Python version if possible.

lql341 commented 2 years ago

I switched to Python 3.8, but it seems to be missing some requirement files?

image

strongoier commented 2 years ago

There should be a space between https://test.pypi.org/simple/ and taichi-nightly.
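
A minimal sketch of the nightly install step discussed above, assuming only what the thread states: the index URL, the package name taichi-nightly, and that nightly wheels existed for Python 3.8 and 3.10 at the time of this comment. The helper names are hypothetical. Building the command as a list keeps the index URL and the package name as separate arguments, which avoids the missing-space mistake.

```python
import shlex
import sys

# Values taken from the comments above; they may be outdated today.
NIGHTLY_INDEX = "https://test.pypi.org/simple/"
NIGHTLY_PACKAGE = "taichi-nightly"
NIGHTLY_PYTHONS = {(3, 8), (3, 10)}


def python_has_nightly_wheel(version=None):
    """Return True if (major, minor) is one of the versions named in the thread."""
    major_minor = tuple((version or sys.version_info)[:2])
    return major_minor in NIGHTLY_PYTHONS


def nightly_install_command(python=sys.executable):
    """Build the pip command as an argument list (URL and package stay separate)."""
    return [python, "-m", "pip", "install", "-i", NIGHTLY_INDEX, NIGHTLY_PACKAGE]


if __name__ == "__main__":
    if not python_has_nightly_wheel():
        print("This interpreter is not 3.8 or 3.10; a nightly wheel may be missing.")
    # Print the command rather than running it, so nothing is installed by accident.
    print(shlex.join(nightly_install_command()))
```

The returned list can be passed directly to subprocess.run(..., check=True) once the printed command looks right.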

strongoier commented 2 years ago

@lql341 Have you succeeded or met other problems?

lql341 commented 2 years ago

Sorry for the late response. I have successfully installed taichi-nightly-20220320. The libm.so.6 error is gone.

But I did run into some other problems. I am testing "fractal.py" (the hello world example) on the Sugon (曙光) supercomputing platform, which has DCUs (GPUs similar to AMD's) with a ROCm backend. Any suggestions on how to get Taichi running properly?

image

strongoier commented 2 years ago

  1. Taichi doesn't support the ROCm backend yet (related issue: AMDGPU backend #412).
  2. You can try the CPU backend first. Could you change the code according to the message, i.e., use ti.GUI(show_gui=False)?
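
The suggestion above can be sketched as follows. This is an illustration only, under stated assumptions: the thread never shows fractal.py itself, so the kernel below is loosely modeled on Taichi's Julia set hello-world example, and the function names render_headless and taichi_available, the resolution, and the output file name are all hypothetical. The parts taken from the thread are the CPU backend and ti.GUI(show_gui=False).

```python
import importlib.util


def taichi_available():
    """Return True if the taichi package can be imported in this environment."""
    return importlib.util.find_spec("taichi") is not None


def render_headless(n=320, out_path="fractal.png"):
    """Render one Julia set frame on the CPU backend without opening a window."""
    import taichi as ti  # imported here so the module loads even without Taichi

    ti.init(arch=ti.cpu)  # CPU backend, as suggested in the thread
    pixels = ti.field(dtype=float, shape=(n * 2, n))

    @ti.func
    def complex_sqr(z):
        return ti.Vector([z[0] ** 2 - z[1] ** 2, z[1] * z[0] * 2])

    @ti.kernel
    def paint(t: float):
        for i, j in pixels:  # parallelized over all pixels
            c = ti.Vector([-0.8, ti.cos(t) * 0.2])
            z = ti.Vector([i / n - 1, j / n - 0.5]) * 2
            iterations = 0
            while z.norm() < 20 and iterations < 50:
                z = complex_sqr(z) + c
                iterations += 1
            pixels[i, j] = 1 - iterations * 0.02

    # show_gui=False keeps the GUI off-screen, which matters on a headless node.
    gui = ti.GUI("Julia Set", res=(n * 2, n), show_gui=False)
    paint(0.0)
    gui.set_image(pixels)
    gui.show(out_path)  # with a file name, the frame is written to disk
```

On a node without a display, calling render_headless() should write a single image file instead of trying to open a window.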

lql341 commented 2 years ago

  1. CPU backend works fine.
  2. I saw that issue before posting this one. I thought Taichi might have made some progress on AMD drivers after two years. Haha. Do you have plans to develop a ROCm backend?

Thank you.

bobcao3 commented 2 years ago

IIRC, if it has the full amdgpu-pro stack, you should have access to Vulkan. Vulkan should be the best backend for AMD GPUs. Due to a glibc versioning issue, the CentOS 7 package is CPU-only, so you would need to build from source to get Vulkan. We are considering adding ROCm support if there is a good reason and large enough demand.

bobcao3 commented 2 years ago

@lql341 If you can get Vulkan to work on your side by building from source, please post the steps you took so that we can try to add that capability officially! Thanks!

lql341 commented 2 years ago

Actually, I have tried to build Vulkan from source, but it complains as follows:

image

P.S. We can discuss how to collaborate on implementing Taichi on the Hygon DCU platform if you are interested.

bobcao3 commented 2 years ago

Hello! Is it required to use CentOS 7? Is Ubuntu 20.04 LTS an option? The reason we did not release GPU backends on CentOS is the older version of glibc that it ships with.

lql341 commented 2 years ago

I am running Taichi on a supercomputing system, so the base system has to be CentOS 7. Another possibility is to run Taichi in an Ubuntu container.

turbo0628 commented 2 years ago

Not sure if DPUs are compatible with containers. Can you get Vulkan to work properly in the Ubuntu container?

This path is risky without mature driver support, even when vulkaninfo runs properly. You might still run into strange driver-related issues that are hard to debug.
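
As a first sanity check inside the container, something like the following sketch could confirm that vulkaninfo is on the PATH and exits cleanly. The function name vulkaninfo_runs and the timeout value are hypothetical. As noted above, a passing check is necessary but not sufficient: driver problems can still show up later.

```python
import shutil
import subprocess


def vulkaninfo_runs(executable="vulkaninfo", timeout=60):
    """Return True if the given vulkaninfo binary exists and exits with code 0."""
    path = shutil.which(executable)
    if path is None:
        return False  # tool not installed or not on PATH
    try:
        result = subprocess.run(
            [path],
            stdout=subprocess.DEVNULL,  # the full report is long; discard it
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("vulkaninfo runs cleanly:", vulkaninfo_runs())
```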

How are you using DPUs currently? It is possible to extend to AMD GPUs with LLVM, which is how Taichi's CPU and CUDA backends work. However, I don't see DPU support in the LLVM docs.

lql341 commented 2 years ago

Hi, I am using a DCU, which is made by Hygon but shares the same backend (ROCm) as AMD GPUs. Since I am trying to test Taichi on a supercomputing system, my account is a non-root user by default, so changing GPU drivers would be very difficult. That is why I am trying to run Taichi in an Ubuntu container.

Another question: do you think it is necessary to run Taichi on a supercomputing system, or can a good PC get all the work done? I am not sure how many computing resources Taichi needs.

bobcao3 commented 2 years ago

@lql341 You can just use a regular PC. We have CPU backends as well, and a discrete GPU would be best if you have one. We don't support multi-GPU at this point, so a supercomputing system might not help. However, we are interested in supporting DCUs; we will contact you later regarding that.

lql341 commented 2 years ago

Okay. I am very interested in supporting DCUs.

lql341 commented 2 years ago

Is there a better way for us to discuss a possible implementation on DCU? Say email, WeChat, whatever...

Taichi-contributor commented 2 years ago

We can continue the discussion in the Slack channel. Please send an email to community@taichi.graphics and we will send you an invitation link. Thanks.