k2-fsa / fast_rnnt

A torch implementation of a recursion which turns out to be useful for RNN-T.

RuntimeError: Failed to find native CUDA module #33

Open scutcsq opened 5 months ago

scutcsq commented 5 months ago

RuntimeError: Failed to find native CUDA module, make sure that you compiled the code with K2_WITH_CUDA.

csukuangfj commented 5 months ago

Could you describe how you installed fast_rnnt?

scutcsq commented 5 months ago

> Could you describe how you installed fast_rnnt?

I installed fast_rnnt with pip. I have since installed k2, and using the corresponding function from k2 solved the problem.

bene-ges commented 5 months ago

Hi, we hit the same error after successfully building fast_rnnt for AMD using ROCm 5.4, with PyTorch 2.0.1 and torchaudio 0.15.2 correctly installed:

File "/home/ubnt/anaconda3/lib/python3.8/site-packages/fast_rnnt/rnnt_loss.py", line 533, in rnnt_loss
    scores_and_grads = mutual_information_recursion(
  File "/home/ubnt/anaconda3/lib/python3.8/site-packages/fast_rnnt/mutual_information.py", line 294, in mutual_information_recursion
    scores = MutualInformationRecursionFunction.apply(
  File "/home/ubnt/anaconda3/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/ubnt/anaconda3/lib/python3.8/site-packages/fast_rnnt/mutual_information.py", line 157, in forward
    ans = _fast_rnnt.mutual_information_forward(px, py, boundary, p)
RuntimeError: Failed to find native CUDA module, make sure that you compiled the code with K2_WITH_CUDA.

We want to use fast_rnnt on its own, without k2. We built it from source:

git clone https://github.com/danpovey/fast_rnnt.git
cd fast_rnnt
export FT_MAKE_ARGS="-j32"
pip install --verbose fast_rnnt

bene-ges commented 5 months ago

It seems that ROCm isn't supported in the build. The build output includes:

-- No NVCC detected. Disable CUDA support
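As a quick diagnostic for the message above, the following minimal sketch checks which GPU compiler, if any, is visible on `PATH`. It is not how fast_rnnt's build system detects CUDA; the function name `detect_gpu_compiler` and the returned labels are hypothetical and used only for illustration. Note that `hipcc` is often installed under a ROCm prefix such as `/opt/rocm-5.3.0/bin` and may not be on `PATH` at all.

```python
import shutil


def detect_gpu_compiler(which=shutil.which):
    """Return 'cuda' if nvcc is found, 'rocm' if only hipcc is found, else None.

    `which` is injectable so the lookup can be exercised without real toolchains.
    This is an illustrative check, not the logic used by fast_rnnt's build.
    """
    if which("nvcc"):
        return "cuda"
    if which("hipcc"):
        return "rocm"
    return None


if __name__ == "__main__":
    backend = detect_gpu_compiler()
    print(backend or "no GPU compiler found on PATH")
```

If this reports `rocm` or nothing at all, a build that looks only for NVCC would be expected to fall back to CPU-only, which matches the error in the traceback.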

pkufool commented 5 months ago

@bene-ges Basically, if PyTorch can run on ROCm, fast_rnnt can run on it too. I will have a look at this issue. Thanks!
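To check which backend the installed PyTorch was built for, a small sketch like the one below may help. It relies on `torch.version.hip` and `torch.version.cuda`, which PyTorch sets to a version string or `None` depending on the build; the helper name `torch_gpu_backend` and its return labels are hypothetical. A ROCm build of PyTorch does not by itself mean that the fast_rnnt extension was compiled with GPU support.

```python
def torch_gpu_backend():
    """Report the GPU backend of the installed PyTorch build.

    Returns one of: 'rocm', 'cuda', 'cpu', 'torch-not-installed'.
    """
    try:
        import torch
    except ImportError:
        return "torch-not-installed"
    # ROCm builds expose a HIP version string; check it first.
    if getattr(torch.version, "hip", None):
        return "rocm"
    if getattr(torch.version, "cuda", None):
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    print(torch_gpu_backend())
```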

danpovey commented 5 months ago

But the core of fast_rnnt is the CUDA code, no? And I believe ROCm does not use CUDA? So it would require a rewrite to support that?

bene-ges commented 5 months ago

@danpovey, ROCm can compile CUDA code into an AMD binary. Most projects just add the ROCm compile commands, as PyTorch does, so the PyTorch build system can serve as an example of the right solution (Docs).

Example of converting CUDA code to ROCm code and compiling it on Ubuntu (matrix-cuda is just an example of CUDA code):

git clone https://github.com/lzhengchun/matrix-cuda
cd matrix-cuda
/opt/rocm-5.3.0/bin/hipify-clang matrix_cuda.cu

After this, a file matrix_cuda.cu.hip will appear, which is the source code for ROCm. It can then be compiled with hipcc:

/opt/rocm-5.3.0/bin/hipcc matrix_cuda.cu.hip

After this, a file a.out will appear.
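The two steps described above (translate with hipify-clang, then compile with hipcc) could be scripted roughly as follows. This is only a sketch: the helper name `hip_build_commands` is hypothetical, the ROCm prefix defaults to the `/opt/rocm-5.3.0` path from the example, and the `.hip` suffix follows the output file name reported above. The script only builds and prints the command lines; it does not run them.

```python
from pathlib import PurePosixPath


def hip_build_commands(cuda_source, rocm_root="/opt/rocm-5.3.0"):
    """Return the argv lists for translating and compiling one CUDA file.

    Step 1: hipify-clang writes <cuda_source>.hip next to the input.
    Step 2: hipcc compiles the translated file (a.out by default).
    """
    bin_dir = PurePosixPath(rocm_root) / "bin"
    hip_source = cuda_source + ".hip"
    return [
        [str(bin_dir / "hipify-clang"), cuda_source],
        [str(bin_dir / "hipcc"), hip_source],
    ]


if __name__ == "__main__":
    for argv in hip_build_commands("matrix_cuda.cu"):
        print(" ".join(argv))
```

Each returned list can be passed to `subprocess.run(argv, check=True)` on a machine where ROCm is installed at the given prefix.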

bene-ges commented 5 months ago

Another useful link on porting CUDA to HIP (the notation is almost identical): https://www.lumi-supercomputer.eu/preparing-codes-for-lumi-converting-cuda-applications-to-hip/

bene-ges commented 4 months ago

I can help with testing on AMD if needed.

danpovey commented 4 months ago

OK, that's interesting. If it's possible for you to add support for ROCm to our build system (which I think is not entirely trivial), then we'd appreciate that very much. This kind of thing will no doubt be used more frequently in the future. (Also: apologies for the very late response.)