Unable to install pytorch scatter on AMD GPUs

I install the packages with:

FORCE_ONLY_CUDA=1 pip install -U -v --no-build-isolation git+https://github.com/rusty1s/pytorch_cluster.git
FORCE_ONLY_CUDA=1 pip install -U -v --no-build-isolation git+https://github.com/rusty1s/pytorch_scatter.git
FORCE_ONLY_CUDA=1 pip install -U -v --no-build-isolation git+https://github.com/rusty1s/pytorch_sparse.git
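
Because these commands use `--no-build-isolation`, the extensions are built against whatever PyTorch is already in the environment. Before building, it may help to confirm that this PyTorch is a ROCm build and that it can see the GPU. The sketch below is only a diagnostic aid, not part of the original report; the helper name `torch_build_info` is hypothetical, and it relies on `torch.version.hip`, `torch.version.cuda`, `torch.cuda.is_available()`, and `ROCM_HOME` from `torch.utils.cpp_extension`.

```python
import importlib.util


def torch_build_info():
    """Summarize whether the installed PyTorch targets ROCm/HIP, CUDA, or neither.

    Hypothetical helper for illustration; returns a dict of plain values.
    """
    info = {
        "torch": None,        # PyTorch version string, if importable
        "hip": None,          # HIP version string on ROCm builds, else None
        "cuda": None,         # CUDA version string on CUDA builds, else None
        "gpu_visible": None,  # True if PyTorch can see at least one GPU
        "rocm_home": None,    # ROCm install path detected by PyTorch, if any
    }
    if importlib.util.find_spec("torch") is None:
        return info  # PyTorch is not importable in this environment

    import torch

    info["torch"] = torch.__version__
    info["hip"] = torch.version.hip
    info["cuda"] = torch.version.cuda
    info["gpu_visible"] = torch.cuda.is_available()
    try:
        # cpp_extension needs setuptools; tolerate its absence.
        from torch.utils.cpp_extension import ROCM_HOME
        info["rocm_home"] = ROCM_HOME
    except Exception:
        pass
    return info


if __name__ == "__main__":
    for key, value in torch_build_info().items():
        print(f"{key}: {value}")
```

On a ROCm build of PyTorch, `hip` should be a version string and `cuda` should be `None`. If `gpu_visible` is `False` or `rocm_home` is `None` in the shell where `pip install` runs, that would be worth including in the report.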

The installation appears to succeed:

Using pip 24.2 from /home/mila/rocm/results/venv/torch/lib/python3.10/site-packages/pip (python 3.10)
Collecting git+https://github.com/rusty1s/pytorch_scatter.git
  Cloning https://github.com/rusty1s/pytorch_scatter.git to /tmp/pip-req-build-qxl3qkq1
  Running command git version
  git version 2.34.1
  Running command git clone --filter=blob:none https://github.com/rusty1s/pytorch_scatter.git /tmp/pip-req-build-qxl3qkq1
  Cloning into '/tmp/pip-req-build-qxl3qkq1'...
  Running command git rev-parse HEAD
  8ec9364b0bdcd99149952a25749ad211c2d0567b
  Resolved https://github.com/rusty1s/pytorch_scatter.git to commit 8ec9364b0bdcd99149952a25749ad211c2d0567b
  Running command git rev-parse HEAD
  8ec9364b0bdcd99149952a25749ad211c2d0567b
  Preparing metadata (pyproject.toml): started
  Running command Preparing metadata (pyproject.toml)
  /tmp/pip-req-build-qxl3qkq1/csrc/macros.h -> /tmp/pip-req-build-qxl3qkq1/csrc/macros.h [skipped, no changes]
  /tmp/pip-req-build-qxl3qkq1/csrc/extensions.h -> /tmp/pip-req-build-qxl3qkq1/csrc/extensions.h [skipped, no changes]
  /tmp/pip-req-build-qxl3qkq1/csrc/cpu/scatter_cpu.h -> /tmp/pip-req-build-qxl3qkq1/csrc/cpu/scatter_cpu.h [skipped, no changes]
  /tmp/pip-req-build-qxl3qkq1/csrc/utils.h -> /tmp/pip-req-build-qxl3qkq1/csrc/utils.h [skipped, no changes]
  /tmp/pip-req-build-qxl3qkq1/csrc/cuda/scatter_cuda.h -> /tmp/pip-req-build-qxl3qkq1/csrc/hip/scatter_hip.h [ok]
  /tmp/pip-req-build-qxl3qkq1/csrc/scatter.cpp -> /tmp/pip-req-build-qxl3qkq1/csrc/scatter_hip.cpp [ok]
  /tmp/pip-req-build-qxl3qkq1/csrc/cpu/index_info.h -> /tmp/pip-req-build-qxl3qkq1/csrc/cpu/index_info.h [skipped, no changes]
  /tmp/pip-req-build-qxl3qkq1/csrc/cpu/reducer.h -> /tmp/pip-req-build-qxl3qkq1/csrc/cpu/reducer.h [skipped, no changes]
  /tmp/pip-req-build-qxl3qkq1/csrc/cpu/utils.h -> /tmp/pip-req-build-qxl3qkq1/csrc/cpu/utils.h [skipped, no changes]
  /tmp/pip-req-build-qxl3qkq1/csrc/cpu/scatter_cpu.cpp -> /tmp/pip-req-build-qxl3qkq1/csrc/cpu/scatter_cpu.cpp [skipped, no changes]
  /tmp/pip-req-build-qxl3qkq1/csrc/cuda/atomics.cuh -> /tmp/pip-req-build-qxl3qkq1/csrc/hip/atomics.cuh [ok]
  /tmp/pip-req-build-qxl3qkq1/csrc/cuda/reducer.cuh -> /tmp/pip-req-build-qxl3qkq1/csrc/hip/reducer.cuh [ok]
  /tmp/pip-req-build-qxl3qkq1/csrc/cuda/utils.cuh -> /tmp/pip-req-build-qxl3qkq1/csrc/hip/utils.cuh [ok]
  /tmp/pip-req-build-qxl3qkq1/csrc/cuda/scatter_cuda.cu -> /tmp/pip-req-build-qxl3qkq1/csrc/hip/scatter_hip.hip [ok]
  Successfully preprocessed all matching files.
  Total number of unsupported CUDA function calls: 0

  Total number of replaced kernel launches: 2
  /tmp/pip-req-build-qxl3qkq1/csrc/scatter.h -> /tmp/pip-req-build-qxl3qkq1/csrc/scatter.h [skipped, no changes]
  /tmp/pip-req-build-qxl3qkq1/csrc/version.cpp -> /tmp/pip-req-build-qxl3qkq1/csrc/version_hip.cpp [ok]
  Successfully preprocessed all matching files.
  Total number of unsupported CUDA function calls: 0

  Total number of replaced kernel launches: 0
  /tmp/pip-req-build-qxl3qkq1/csrc/cpu/segment_coo_cpu.h -> /tmp/pip-req-build-qxl3qkq1/csrc/cpu/segment_coo_cpu.h [skipped, no changes]
  /tmp/pip-req-build-qxl3qkq1/csrc/cuda/segment_coo_cuda.h -> /tmp/pip-req-build-qxl3qkq1/csrc/hip/segment_coo_hip.h [ok]
  /tmp/pip-req-build-qxl3qkq1/csrc/segment_coo.cpp -> /tmp/pip-req-build-qxl3qkq1/csrc/segment_coo_hip.cpp [ok]
  /tmp/pip-req-build-qxl3qkq1/csrc/cpu/segment_coo_cpu.cpp -> /tmp/pip-req-build-qxl3qkq1/csrc/cpu/segment_coo_cpu.cpp [skipped, no changes]
  /tmp/pip-req-build-qxl3qkq1/csrc/cuda/segment_coo_cuda.cu -> /tmp/pip-req-build-qxl3qkq1/csrc/hip/segment_coo_hip.hip [ok]
  Successfully preprocessed all matching files.
  Total number of unsupported CUDA function calls: 0

  Total number of replaced kernel launches: 10
  /tmp/pip-req-build-qxl3qkq1/csrc/cpu/segment_csr_cpu.h -> /tmp/pip-req-build-qxl3qkq1/csrc/cpu/segment_csr_cpu.h [skipped, no changes]
  /tmp/pip-req-build-qxl3qkq1/csrc/cuda/segment_csr_cuda.h -> /tmp/pip-req-build-qxl3qkq1/csrc/hip/segment_csr_hip.h [ok]
  /tmp/pip-req-build-qxl3qkq1/csrc/segment_csr.cpp -> /tmp/pip-req-build-qxl3qkq1/csrc/segment_csr_hip.cpp [ok]
  /tmp/pip-req-build-qxl3qkq1/csrc/cpu/segment_csr_cpu.cpp -> /tmp/pip-req-build-qxl3qkq1/csrc/cpu/segment_csr_cpu.cpp [skipped, no changes]
  /tmp/pip-req-build-qxl3qkq1/csrc/cuda/index_info.cuh -> /tmp/pip-req-build-qxl3qkq1/csrc/hip/index_info.cuh [ok]
  /tmp/pip-req-build-qxl3qkq1/csrc/cuda/segment_csr_cuda.cu -> /tmp/pip-req-build-qxl3qkq1/csrc/hip/segment_csr_hip.hip [ok]
  Successfully preprocessed all matching files.
  Total number of unsupported CUDA function calls: 0

  Total number of replaced kernel launches: 4
  running dist_info
  creating /tmp/pip-modern-metadata-75rx4e2h/torch_scatter.egg-info
  writing /tmp/pip-modern-metadata-75rx4e2h/torch_scatter.egg-info/PKG-INFO
  writing dependency_links to /tmp/pip-modern-metadata-75rx4e2h/torch_scatter.egg-info/dependency_links.txt
  writing requirements to /tmp/pip-modern-metadata-75rx4e2h/torch_scatter.egg-info/requires.txt
  writing top-level names to /tmp/pip-modern-metadata-75rx4e2h/torch_scatter.egg-info/top_level.txt
  writing manifest file '/tmp/pip-modern-metadata-75rx4e2h/torch_scatter.egg-info/SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  adding license file 'LICENSE'
  writing manifest file '/tmp/pip-modern-metadata-75rx4e2h/torch_scatter.egg-info/SOURCES.txt'
  creating '/tmp/pip-modern-metadata-75rx4e2h/torch_scatter-2.1.2.dist-info'
  Preparing metadata (pyproject.toml): finished with status 'done'

But when I run the code, it behaves as if the library had been installed as CPU-only:

recursiongfn.D0 [stdout] iteration 1 : 3.07 s, average: 3.07 s, average wait: 2.25 s, peak VRAM: 468Mb
recursiongfn.D0 [stderr] Traceback (most recent call last):
recursiongfn.D0 [stderr]   File "/home/mila/rocm/results/venv/torch/bin/voir", line 8, in <module>
recursiongfn.D0 [stderr]     sys.exit(main())
recursiongfn.D0 [stderr]   File "/home/mila/rocm/results/venv/torch/lib/python3.10/site-packages/voir/cli.py", line 128, in main
recursiongfn.D0 [stderr]     ov(sys.argv[1:] if argv is None else argv)
recursiongfn.D0 [stderr]   File "/home/mila/rocm/results/venv/torch/lib/python3.10/site-packages/voir/phase.py", line 331, in __call__
recursiongfn.D0 [stderr]     self._run(*args, **kwargs)
recursiongfn.D0 [stderr]   File "/home/mila/rocm/results/venv/torch/lib/python3.10/site-packages/voir/overseer.py", line 242, in _run
recursiongfn.D0 [stderr]     set_value(func())
recursiongfn.D0 [stderr]   File "/home/mila/rocm/results/venv/torch/lib/python3.10/site-packages/voir/scriptutils.py", line 37, in <lambda>
recursiongfn.D0 [stderr]     return lambda: exec(mainsection, glb, glb)
recursiongfn.D0 [stderr]   File "/home/mila/milabench/benchmarks/recursiongfn/main.py", line 163, in <module>
recursiongfn.D0 [stderr]     main(
recursiongfn.D0 [stderr]   File "/home/mila/milabench/benchmarks/recursiongfn/main.py", line 144, in main
recursiongfn.D0 [stderr]     trial.run()
recursiongfn.D0 [stderr]   File "/home/mila/milabench/benchmarks/recursiongfn/gflownet/src/gflownet/trainer.py", line 307, in run
recursiongfn.D0 [stderr]     info = self.train_batch(batch, epoch_idx, batch_idx, it)
recursiongfn.D0 [stderr]   File "/home/mila/milabench/benchmarks/recursiongfn/gflownet/src/gflownet/trainer.py", line 221, in train_batch
recursiongfn.D0 [stderr]     loss, info = self.algo.compute_batch_losses(self.model, batch)
recursiongfn.D0 [stderr]   File "/home/mila/milabench/benchmarks/recursiongfn/gflownet/src/gflownet/algo/trajectory_balance.py", line 420, in compute_batch_losses
recursiongfn.D0 [stderr]     log_p_F = fwd_cat.log_prob(batch.actions)
recursiongfn.D0 [stderr]   File "/home/mila/milabench/benchmarks/recursiongfn/gflownet/src/gflownet/envs/graph_building_env.py", line 800, in log_prob
recursiongfn.D0 [stderr]     logprobs = self.logsoftmax()
recursiongfn.D0 [stderr]   File "/home/mila/milabench/benchmarks/recursiongfn/gflownet/src/gflownet/envs/graph_building_env.py", line 651, in logsoftmax
recursiongfn.D0 [stderr]     maxl = self._compute_batchwise_max(self.logits).values
recursiongfn.D0 [stderr]   File "/home/mila/milabench/benchmarks/recursiongfn/gflownet/src/gflownet/envs/graph_building_env.py", line 641, in _compute_batchwise_max
recursiongfn.D0 [stderr]     maxl = [scatter_max(i, b, dim=0, out=out) for i, b, out in zip(x, batch, outs)]
recursiongfn.D0 [stderr]   File "/home/mila/milabench/benchmarks/recursiongfn/gflownet/src/gflownet/envs/graph_building_env.py", line 641, in <listcomp>
recursiongfn.D0 [stderr]     maxl = [scatter_max(i, b, dim=0, out=out) for i, b, out in zip(x, batch, outs)]
recursiongfn.D0 [stderr]   File "/home/mila/rocm/results/venv/torch/lib/python3.10/site-packages/torch_scatter/scatter.py", line 72, in scatter_max
recursiongfn.D0 [stderr]     return torch.ops.torch_scatter.scatter_max(src, index, dim, out, dim_size)
recursiongfn.D0 [stderr]   File "/home/mila/rocm/results/venv/torch/lib/python3.10/site-packages/torch/_ops.py", line 1061, in __call__
recursiongfn.D0 [stderr]     return self_._op(*args, **(kwargs or {}))
recursiongfn.D0 [stderr] RuntimeError: Not compiled with CUDA support
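
To separate a build problem from a runtime problem, one option is to ask the installed extension directly what it was compiled against. The sketch below is a hedged diagnostic, not something from the original report. It assumes that `torch_scatter` registers an op named `torch.ops.torch_scatter.cuda_version()` that returns `-1` for a CPU-only build; this matches the `torch_scatter` sources I am aware of but should be treated as an assumption for any given commit. The helper name `scatter_build_info` is hypothetical.

```python
import importlib.util


def scatter_build_info():
    """Report how the installed torch_scatter extension was compiled.

    Hypothetical helper for illustration; never raises, returns a dict.
    """
    info = {
        "installed": False,            # True if torch_scatter imports cleanly
        "version": None,               # torch_scatter.__version__, if present
        "compiled_gpu_version": None,  # assumed: -1 means CPU-only build
    }
    if importlib.util.find_spec("torch_scatter") is None:
        return info  # the package is not installed in this environment

    try:
        import torch
        import torch_scatter  # importing loads the compiled ops
    except Exception:
        return info  # the package exists but failed to load

    info["installed"] = True
    info["version"] = getattr(torch_scatter, "__version__", None)
    try:
        # Assumption: this op exists and returns -1 when built without GPU support.
        info["compiled_gpu_version"] = int(torch.ops.torch_scatter.cuda_version())
    except Exception:
        pass
    return info


if __name__ == "__main__":
    for key, value in scatter_build_info().items():
        print(f"{key}: {value}")
```

If `compiled_gpu_version` comes back as `-1` under that assumption, the GPU sources were hipified but apparently not compiled into the installed extension, which would be consistent with the `Not compiled with CUDA support` error above.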

rusty1s/pytorch_scatter, issue #460