Open KegangWangCCNU opened 12 months ago
@rahulbatra85
@KegangWangCCNU Can you try running with the environment variable TF_ROCM_USE_IMMEDIATE_MODE=1 set?
Since you are using jaxlib 0.4.22, I assume you built JAX yourself. Can you share where you pulled the XLA code from?
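For readers trying the suggestion above, here is a minimal sketch of setting the variable from Python instead of the shell. It assumes (not confirmed in this thread) that the flag is read when the ROCm backend initializes, so it is set before `import jax`; the helper name `enable_rocm_immediate_mode` is hypothetical.

```python
import os


def enable_rocm_immediate_mode():
    """Set the workaround flag suggested in this thread and return its value."""
    # Assumption: the flag must be in the environment before JAX/XLA
    # initializes the GPU backend, so call this before `import jax`.
    os.environ["TF_ROCM_USE_IMMEDIATE_MODE"] = "1"
    return os.environ["TF_ROCM_USE_IMMEDIATE_MODE"]


flag = enable_rocm_immediate_mode()
print("TF_ROCM_USE_IMMEDIATE_MODE =", flag)

# Import jax only after the flag is set, for example:
# import jax
# print(jax.devices())
```

The shell equivalent is to prefix the launch command with TF_ROCM_USE_IMMEDIATE_MODE=1.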
@rahulbatra85
Thank you, the issue has been temporarily resolved.
I have tried both the official and AMD branches of XLA, and they have the same problem. To my knowledge, this also occurs in earlier versions; see https://github.com/google/jax/issues/14582
@KegangWangCCNU How did you solve this problem? I am currently running into the same issue.
This should be fixed in the latest release, 0.4.25: https://github.com/ROCm/jax/releases/tag/jaxlib-v0.4.25. You should no longer need to set the environment variable manually.
Confirmed fixed in 0.4.25 https://github.com/google/jax/issues/14582.
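A small sketch for deciding whether an installed jaxlib predates the fix reported above. It relies only on the thread's statement that 0.4.25 contains the fix; the helper names `release_tuple` and `needs_workaround` are hypothetical, and the parsing simply reads the leading `major.minor.patch` of the version string.

```python
import re

# Per this thread, the fix is reported in jaxlib 0.4.25.
FIXED_IN = (0, 4, 25)


def release_tuple(version):
    """Extract (major, minor, patch) from a version string like '0.4.22.dev2023...'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def needs_workaround(version):
    """Return True if the version is older than the release reported as fixed."""
    return release_tuple(version) < FIXED_IN


print(needs_workaround("0.4.22.dev20231209+ccc8b3f7a"))  # True
print(needs_workaround("0.4.25"))  # False
```

In practice the version string could come from `importlib.metadata.version("jaxlib")`.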
Please close this issue as well
Description
When I use jax.pmap to execute a network containing convolutions, an error is reported:
Running the model on a single GPU poses no issues.
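The original network is not included in this report. As a rough illustration of the pattern described (a convolution executed under jax.pmap), here is a minimal hypothetical sketch; the function `conv_layer`, the shapes, and the all-ones inputs are assumptions, and this is not confirmed to reproduce the error.

```python
import jax
import jax.numpy as jnp


def conv_layer(x, kernel):
    # x: (batch, in_channels, height, width)
    # kernel: (out_channels, in_channels, kernel_h, kernel_w)
    return jax.lax.conv(x, kernel, window_strides=(1, 1), padding="SAME")


# One shard per local device; the leading axis is the pmap axis.
n_dev = jax.local_device_count()
x = jnp.ones((n_dev, 2, 3, 8, 8))
kernel = jnp.ones((n_dev, 4, 3, 3, 3))

# Each device runs the convolution on its own slice of x and kernel.
out = jax.pmap(conv_layer)(x, kernel)
print(out.shape)
```

With "SAME" padding and unit strides, each shard keeps its 8x8 spatial size, so the result has shape (n_dev, 2, 4, 8, 8).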
What jax/jaxlib version are you using?
0.4.22.dev20231209+ccc8b3f7a, 0.4.22.dev20231210
Which accelerator(s) are you using?
AMD GPU
Additional system info?
1.26.2
3.9.18 (main, Sep 11 2023, 13:41:44) [GCC 11.2.0]
uname_result(system='Linux', node='ww-server', release='6.2.0-34-generic', version='#34~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Sep 7 13:12:03 UTC 2', machine='x86_64')
NVIDIA GPU info
GPU[0] : Card series: Vega 20 [Radeon Pro VII/Radeon Instinct MI50 32GB]
GPU[0] : Card model: 0x081e
GPU[0] : Card vendor: Advanced Micro Devices, Inc. [AMD/ATI]
GPU[0] : Card SKU: D1640600
GPU[1] : Card series: Vega 20 [Radeon Pro VII/Radeon Instinct MI50 32GB]
GPU[1] : Card model: 0x081e
GPU[1] : Card vendor: Advanced Micro Devices, Inc. [AMD/ATI]
GPU[1] : Card SKU: D1640600
GPU[2] : Card series: Vega 20 [Radeon Pro VII/Radeon Instinct MI50 32GB]
GPU[2] : Card model: 0x081e
GPU[2] : Card vendor: Advanced Micro Devices, Inc. [AMD/ATI]
GPU[2] : Card SKU: D1640600
GPU[3] : Card series: Vega 20 [Radeon Pro VII/Radeon Instinct MI50 32GB]
GPU[3] : Card model: 0x081e
GPU[3] : Card vendor: Advanced Micro Devices, Inc. [AMD/ATI]
GPU[3] : Card SKU: D1640600