kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

CPU runtime without CUDA drivers no longer working for CUDA build. #4576

Closed KarelVesely84 closed 3 years ago

KarelVesely84 commented 3 years ago

Hello, in the past, Kaldi built with CUDA worked on a machine without the CUDA driver installed, as long as it ran CPU-only code. This seems to be no longer the case since the adoption of this change: https://github.com/kaldi-asr/kaldi/commit/2ad159073f0fb1e57b1b89b0e534ee9c91738367

Would it be possible to revert the change? It is causing problems for our update of Kaldi to the current version...

Best regards Karel

danpovey commented 3 years ago

What happens when you try to use this version?

galv commented 3 years ago

Can you let me know how you handle dependencies like libcufft, libcusparse, and libcublas? It seems to me that you are going to run into linking issues for those if you build with CUDA but try to run on a CPU-only machine. Are you statically linking them?

I'm also curious what specific error you get if you copy /usr/local/cuda/lib64/stubs/libcuda.so into the image in your Dockerfile and add its directory to LD_LIBRARY_PATH. I believe other libraries (like TensorFlow) also load libcuda.so via dlopen() and dlsym(), so I'm not opposed to your suggestion. I am just skeptical of making the change without first checking whether the stub libraries have a fake implementation that is non-broken enough to make things work.

KarelVesely84 commented 3 years ago

We already found a workaround. The original idea of calling certain functions from libcuda.so via dlopen was to be able to recover when libcuda.so was not available (i.e. the CUDA driver was not installed on the machine where Kaldi was running), but that was already years ago...
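For readers unfamiliar with the pattern described above, here is a minimal sketch of optional driver loading, written in Python with `ctypes` rather than in Kaldi's C++. It is not Kaldi's code: the helper names `load_cuda_driver` and `select_device` are hypothetical, and the only driver call assumed is `cuInit(0)` returning 0 (`CUDA_SUCCESS`) when a usable driver is present.

```python
import ctypes


def load_cuda_driver(candidates=("libcuda.so.1", "libcuda.so")):
    """Try to open the CUDA driver library at run time.

    Returns a ctypes handle, or None if no candidate can be loaded.
    (Hypothetical helper; ctypes.CDLL wraps dlopen() on Linux.)
    """
    for name in candidates:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue  # library not found or not loadable; try the next name
    return None


def select_device(driver):
    """Return "gpu" only if cuInit(0) succeeds, otherwise "cpu"."""
    if driver is None:
        return "cpu"  # no driver installed: recover instead of failing
    try:
        cu_init = driver.cuInit  # attribute lookup wraps dlsym()
    except AttributeError:
        return "cpu"  # library loaded but the symbol is missing
    cu_init.argtypes = [ctypes.c_uint]
    cu_init.restype = ctypes.c_int
    return "gpu" if cu_init(0) == 0 else "cpu"  # 0 is CUDA_SUCCESS


if __name__ == "__main__":
    print(select_device(load_cuda_driver()))
```

Because the library is opened at run time instead of being a link-time dependency, a missing libcuda.so becomes an ordinary, recoverable condition rather than a loader error at program start.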

KarelVesely84 commented 3 years ago

> libcufft, libcusparse, and libcublas

All of these come from the CUDA toolkit, and they do not depend on libcuda.so. It is easy to install the CUDA toolkit on a machine without a GPU,

but installing the CUDA driver on a machine without a GPU is going too far just to run a Kaldi build with CUDA... and libcuda.so is part of the CUDA driver.

KarelVesely84 commented 3 years ago

So the new version should behave about the same? (But now with LD_LIBRARY_PATH=$(CUDATKDIR)/lib64/stubs if libcuda.so is missing.)

aha, I tried it that way and it worked...

thanks!
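
The workaround above can be sketched as a small launcher helper that adds the toolkit's stub directory to `LD_LIBRARY_PATH` only when the real driver is absent. This is an illustration under stated assumptions: `env_with_cuda_stubs` is a hypothetical name, the caller decides whether the driver is present, and the `lib64/stubs` layout is taken from the paths mentioned in this thread.

```python
import os


def env_with_cuda_stubs(toolkit_dir, driver_present, base_env=None):
    """Return a copy of the environment for launching a CUDA-built binary.

    If the real driver library is absent, append <toolkit_dir>/lib64/stubs
    to LD_LIBRARY_PATH so the loader can resolve libcuda.so from the stubs.
    (Hypothetical helper; the directory layout is an assumption.)
    """
    env = dict(os.environ if base_env is None else base_env)
    if driver_present:
        return env  # real driver available: leave the environment untouched
    stubs = os.path.join(toolkit_dir, "lib64", "stubs")
    current = env.get("LD_LIBRARY_PATH", "")
    # Append rather than prepend, so any real libcuda.so earlier on the path wins.
    env["LD_LIBRARY_PATH"] = f"{current}:{stubs}" if current else stubs
    return env


if __name__ == "__main__":
    env = env_with_cuda_stubs("/usr/local/cuda", driver_present=False)
    print(env["LD_LIBRARY_PATH"])
```

The returned dictionary can be passed as `env=` to `subprocess.run` when starting the CPU-only job, which keeps the stub directory out of the environment of GPU machines.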