kevinduh / sockeye-recipes

Training scripts and recipes for Sockeye Neural Machine Translation toolkit
Unable to locate a modulefile for 'cuda100/toolkit' #25

Closed: mukundroy closed this issue 5 years ago

mukundroy commented 5 years ago

Sir, I installed the CUDA 10.0 toolkit for a Tesla K20 GPU card and changed install_sockeye_custom.sh accordingly to install the matching requirements. The content of requirement_cu100.txt is: pyyaml==3.12 mxnet-cu100mkl==1.3.1 numpy>=1.8.2 typing portalocker.

The installation process worked fine, but during training the following error is thrown: ModuleCmd_Load.c(213):ERROR:105: Unable to locate a modulefile for 'cuda100/toolkit'.

The CUDA toolkit is properly installed at /usr/local/cuda-10.0.

Please help: what could be wrong here?

kevinduh commented 5 years ago

Hi. The Module error is system-specific. Can you type module avail and see which modules are installed in your system? Then modify that line in scripts/get-device.sh with the correct name.

Alternatively, it's possible you don't need to use Module on your system. If CUDA is already pre-loaded, just delete the module load code block in scripts/get-device.sh.
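The two options above can be combined into one guarded step. The following is only a sketch, not the actual contents of scripts/get-device.sh: the function name `setup_cuda` and the variables `CUDA_MODULE` and `CUDA_HOME` are hypothetical, the module name is the one from the error message, and the fallback path is the install location reported in this thread. It assumes that on a machine without a matching modulefile, putting the CUDA `bin` and `lib64` directories on the search paths is sufficient.

```shell
#!/usr/bin/env bash
# Sketch only: load a CUDA module if one is available, otherwise fall back
# to a direct CUDA install. Names and defaults below are assumptions.

CUDA_MODULE="${CUDA_MODULE:-cuda100/toolkit}"   # module name from the error message
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda-10.0}"  # install path reported in this thread

setup_cuda() {
  # `module` is normally a shell function provided by Environment Modules
  # or Lmod; `type` finds it whether it is a function or an executable.
  # `module avail` may print to stderr, so both streams are searched.
  if type module >/dev/null 2>&1 &&
     module avail "$CUDA_MODULE" 2>&1 | grep -qF "$CUDA_MODULE"; then
    module load "$CUDA_MODULE"
    echo "loaded module $CUDA_MODULE"
  elif [ -d "$CUDA_HOME" ]; then
    # No usable modulefile: expose the toolkit directly.
    export PATH="$CUDA_HOME/bin:$PATH"
    export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    echo "using CUDA at $CUDA_HOME"
  else
    echo "no CUDA module or install found" >&2
    return 1
  fi
}
```

Calling `setup_cuda` in place of a hard-coded `module load` line would avoid the "Unable to locate a modulefile" error on systems where the module name differs or Module is not used at all.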

As a side note, I've just added a CUDA10 version of sockeye-recipes in this branch: https://github.com/kevinduh/sockeye-recipes/tree/cuda10 Your custom install changes look fine but if you want a reference just to double-check, please see what's done in install_sockeye_custom.sh and install_sockeye_gpu.sh in the cuda10 branch. Note that this branch also upgrades to a newer but backwards-compatible version of Sockeye too.

Hope that helps!

mukundroy commented 5 years ago

Hello Sir, thanks for the prompt reply. I will check and let you know. Thanks again. Regards, Mukund Roy

(Sent in reply to Kevin Duh's comment of Fri, Mar 8, 2019, 8:31 PM.)

kevinduh commented 5 years ago

I think CUDA10 is working fine, so I will close this issue. If there are any other issues, please let us know. Thanks!