Closed. mukundroy closed this issue 5 years ago.
Hi. The Module error is system-specific. Can you type `module avail` and see which modules are installed on your system? Then modify that line in `scripts/get-device.sh` with the correct name.
Alternatively, it's possible you don't need to use Module on your system. If CUDA is already pre-loaded, just delete that `module load` code block in `scripts/get-device.sh`.
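For illustration, here is a rough sketch of how the module load step could be guarded instead of deleted. This is not the actual content of `scripts/get-device.sh`. The names `CUDA_MODULE` and `load_cuda_if_needed` are made up for this example, and treating `nvcc` on the `PATH` as a sign that CUDA is already pre-loaded is only an assumption that may not hold on every system.

```shell
# Sketch only: CUDA_MODULE and load_cuda_if_needed are hypothetical names,
# not taken from scripts/get-device.sh.
CUDA_MODULE="cuda100/toolkit"   # replace with a name shown by `module avail`

load_cuda_if_needed() {
  if command -v nvcc >/dev/null 2>&1; then
    # Assumption: nvcc on PATH means CUDA is already pre-loaded.
    echo "CUDA already available; skipping module load"
  elif command -v module >/dev/null 2>&1; then
    # The module command exists, so ask it for the requested modulefile.
    module load "$1"
    echo "requested module: $1"
  else
    # Neither a pre-loaded CUDA nor a module command was found.
    echo "no module command and no nvcc found on PATH"
    return 1
  fi
}

# Do not abort the calling script if nothing could be loaded.
load_cuda_if_needed "$CUDA_MODULE" || true
```

If `module avail` lists CUDA under a different name on your system, only the `CUDA_MODULE` value would need to change.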
As a side note, I've just added a CUDA10 version of sockeye-recipes in this branch: https://github.com/kevinduh/sockeye-recipes/tree/cuda10
Your custom install changes look fine, but if you want a reference just to double-check, please see what's done in `install_sockeye_custom.sh` and `install_sockeye_gpu.sh` in the `cuda10` branch. Note that this branch also upgrades to a newer but backwards-compatible version of Sockeye.
Hope that helps!
Hello Sir, thanks for the prompt reply. I will check and let you know. Thanks again.
Regards,
Mukund Roy
(Sent in reply to Kevin Duh's comment of Fri, Mar 8, 2019 at 8:31 PM.)
I think CUDA10 is working fine, so I will close this issue. If there's any other issue, please let us know. Thanks!
Sir, I installed the CUDA 10.0 toolkit for a Tesla K20 GPU card and changed install_sockeye_custom.sh accordingly to install the requirements for it. The content of requirement_cu100.txt is:
pyyaml==3.12
mxnet-cu100mkl==1.3.1
numpy>=1.8.2
typing
portalocker
The installation worked fine, but during training this error is thrown: ModuleCmd_Load.c(213):ERROR:105: Unable to locate a modulefile for 'cuda100/toolkit'.
The CUDA toolkit is properly installed at /usr/local/cuda-10.0.
Please help. What could be wrong here?