hamzagamouh / protein_embeddings


Can't import CUDA #4

Open ProkopDivin opened 1 year ago

ProkopDivin commented 1 year ago

There is a problem with the 6th command:

[divinpr@dw01 protein_embeddings]$ ls
a.001.001.001_1s69a_A.fa   compute_embeddings_gpu.sh      embeddings  requirements.txt
biopython                  compute_protein_embeddings.py  olaf.txt
compute_embeddings_cpu.sh  Dockerfile                     README.md
[divinpr@dw01 protein_embeddings]$ srun -p gpu-short --gpus=1 ch-fromhost --nvidia .
srun: error: Step requested GRES but job doesn't have GRES
srun: error: Unable to create step for job 123635: Invalid generic resource (gres) specification
[divinpr@dw01 protein_embeddings]$

hamzagamouh commented 1 year ago

@ProkopDivin I haven't encountered this error before. Can you ask Jakub about it?

ProkopDivin commented 1 year ago

So it works for you?

ProkopDivin commented 1 year ago

Also, two of the steps look wrong:

  1. "Run ch-image build -t biopython ./biopython to create a docker image (for example here the name of the image will be "biopython")." This isn't right; it should be: ch-image build -t biopython .

  2. "Run ch-convert biopython . to convert the docker image to a directory structure." This should be: ch-convert biopython ./biopython
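The two corrections above can be sketched as one short sequence. This is only a sketch, assuming it is run from the repository root where the Dockerfile lives; the `run` and `setup_image` helpers and the `DRY_RUN` variable are hypothetical names added for illustration, while the two commands are the ones proposed in this comment.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1 it only prints the command,
# otherwise it executes it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the two corrected steps.
setup_image() {
    # Build the image named "biopython" from the current directory.
    run ch-image build -t biopython . || return 1
    # Convert the image into a directory structure at ./biopython.
    run ch-convert biopython ./biopython || return 1
}
```

Calling `DRY_RUN=1 setup_image` prints both commands without touching the cluster, which may be useful for checking the paths before a real build.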

hamzagamouh commented 1 year ago

@ProkopDivin, yes, you're right. I will update it.

hamzagamouh commented 1 year ago

@ProkopDivin,

"So it works for you?"

The command runs for me.

[gamouhh@gpulab ~]$ srun -p gpu-short --gpus=1 ch-fromhost --nvidia .
srun: job 123647 queued and waiting for resources
srun: job 123647 has been allocated resources
found shared library: /usr/lib64/libnvidia-ml.so.525.85.12
found shared library: /usr/lib64/libnvidia-cfg.so.525.85.12
found shared library: /usr/lib64/libcuda.so.525.85.12
found shared library: /usr/lib64/libcudadebugger.so.525.85.12
found shared library: /usr/lib64/libnvidia-opencl.so.525.85.12
found shared library: /usr/lib64/libnvidia-ptxjitcompiler.so.525.85.12
found shared library: /usr/lib64/libnvidia-allocator.so.525.85.12
found shared library: /usr/lib64/libnvidia-compiler.so.525.85.12
found shared library: /usr/lib64/libnvidia-nvvm.so.525.85.12
found shared library: /usr/lib64/libnvidia-ngx.so.525.85.12
found shared library: /usr/lib64/libnvidia-encode.so.525.85.12
found shared library: /usr/lib64/libnvidia-opticalflow.so.525.85.12
found shared library: /usr/lib64/libnvcuvid.so.525.85.12
found shared library: /usr/lib64/libnvidia-eglcore.so.525.85.12
found shared library: /usr/lib64/libnvidia-glcore.so.525.85.12
found shared library: /usr/lib64/libnvidia-tls.so.525.85.12
found shared library: /usr/lib64/libnvidia-glsi.so.525.85.12
found shared library: /usr/lib64/libnvidia-fbc.so.525.85.12
found shared library: /usr/lib64/libnvidia-rtcore.so.525.85.12
found shared library: /usr/lib64/libnvoptix.so.525.85.12
found shared library: /usr/lib64/libGLX_nvidia.so.525.85.12
found shared library: /usr/lib64/libEGL_nvidia.so.525.85.12
found shared library: /usr/lib64/libGLESv2_nvidia.so.525.85.12
found shared library: /usr/lib64/libGLESv1_CM_nvidia.so.525.85.12
found shared library: /usr/lib64/libnvidia-glvkspirv.so.525.85.12
asking ldconfig for shared library destination
ch-run[241534]: error: can't bind: destination not found: /home/gamouhh/dev (ch_core.c:111)
ch-fromhost: empty path from ldconfig
srun: error: volta05: task 0: Exited with exit code 1
[gamouhh@gpulab ~]$

hamzagamouh commented 1 year ago

Can you try this one? srun -p gpu-short --gpus=1 ch-fromhost --nvidia dest_path/biopython/

ProkopDivin commented 1 year ago

Oh, you have to run it inside the directory with the image. I guess this means it is fine. The output is different from yours, but yours ends with exit code 1, so I hope this means mine is all right.

[divinpr@gpulab ~]$ cd pbsprediction/protein_embeddings/biopython/
[divinpr@gpulab biopython]$ srun -p gpu-short --gpus=1 ch-fromhost --nvidia .
srun: job 123664 queued and waiting for resources
srun: job 123664 has been allocated resources
found shared library: /usr/lib64/libnvidia-ml.so.525.85.12
found shared library: /usr/lib64/libnvidia-cfg.so.525.85.12
found shared library: /usr/lib64/libcuda.so.525.85.12
found shared library: /usr/lib64/libcudadebugger.so.525.85.12
found shared library: /usr/lib64/libnvidia-opencl.so.525.85.12
found shared library: /usr/lib64/libnvidia-ptxjitcompiler.so.525.85.12
found shared library: /usr/lib64/libnvidia-allocator.so.525.85.12
found shared library: /usr/lib64/libnvidia-compiler.so.525.85.12
found shared library: /usr/lib64/libnvidia-nvvm.so.525.85.12
found shared library: /usr/lib64/libnvidia-ngx.so.525.85.12
found shared library: /usr/lib64/libnvidia-encode.so.525.85.12
found shared library: /usr/lib64/libnvidia-opticalflow.so.525.85.12
found shared library: /usr/lib64/libnvcuvid.so.525.85.12
found shared library: /usr/lib64/libnvidia-eglcore.so.525.85.12
found shared library: /usr/lib64/libnvidia-glcore.so.525.85.12
found shared library: /usr/lib64/libnvidia-tls.so.525.85.12
found shared library: /usr/lib64/libnvidia-glsi.so.525.85.12
found shared library: /usr/lib64/libnvidia-fbc.so.525.85.12
found shared library: /usr/lib64/libnvidia-rtcore.so.525.85.12
found shared library: /usr/lib64/libnvoptix.so.525.85.12
found shared library: /usr/lib64/libGLX_nvidia.so.525.85.12
found shared library: /usr/lib64/libEGL_nvidia.so.525.85.12
found shared library: /usr/lib64/libGLESv2_nvidia.so.525.85.12
found shared library: /usr/lib64/libGLESv1_CM_nvidia.so.525.85.12
found shared library: /usr/lib64/libnvidia-glvkspirv.so.525.85.12
asking ldconfig for shared library destination
/sbin/ldconfig: Can't stat /usr/local/lib/x86_64-linux-gnu: No such file or directory
shared library destination: /usr/local/lib
injecting into image: .
  /usr/bin/nvidia-smi -> /usr/bin (inferred)
  /usr/bin/nvidia-debugdump -> /usr/bin (inferred)
  /usr/bin/nvidia-persistenced -> /usr/bin (inferred)
  /usr/bin/nvidia-cuda-mps-control -> /usr/bin (inferred)
  /usr/bin/nvidia-cuda-mps-server -> /usr/bin (inferred)
  /usr/lib64/libnvidia-ml.so.525.85.12 -> /usr/local/lib (inferred)
  /usr/lib64/libnvidia-cfg.so.525.85.12 -> /usr/local/lib (inferred)
  /usr/lib64/libcuda.so.525.85.12 -> /usr/local/lib (inferred)
  /usr/lib64/libcudadebugger.so.525.85.12 -> /usr/local/lib (inferred)
  /usr/lib64/libnvidia-opencl.so.525.85.12 -> /usr/local/lib (inferred)
  /usr/lib64/libnvidia-ptxjitcompiler.so.525.85.12 -> /usr/local/lib (inferred)
  /usr/lib64/libnvidia-allocator.so.525.85.12 -> /usr/local/lib (inferred)
  /usr/lib64/libnvidia-compiler.so.525.85.12 -> /usr/local/lib (inferred)
  /usr/lib64/libnvidia-nvvm.so.525.85.12 -> /usr/local/lib (inferred)
  /usr/lib64/libnvidia-ngx.so.525.85.12 -> /usr/local/lib (inferred)
  /usr/lib64/libnvidia-encode.so.525.85.12 -> /usr/local/lib (inferred)
  /usr/lib64/libnvidia-opticalflow.so.525.85.12 -> /usr/local/lib (inferred)
  /usr/lib64/libnvcuvid.so.525.85.12 -> /usr/local/lib (inferred)
  /usr/lib64/libnvidia-eglcore.so.525.85.12 -> /usr/local/lib (inferred)
  /usr/lib64/libnvidia-glcore.so.525.85.12 -> /usr/local/lib (inferred)
  /usr/lib64/libnvidia-tls.so.525.85.12 -> /usr/local/lib (inferred)
  /usr/lib64/libnvidia-glsi.so.525.85.12 -> /usr/local/lib (inferred)
  /usr/lib64/libnvidia-fbc.so.525.85.12 -> /usr/local/lib (inferred)
  /usr/lib64/libnvidia-rtcore.so.525.85.12 -> /usr/local/lib (inferred)
  /usr/lib64/libnvoptix.so.525.85.12 -> /usr/local/lib (inferred)
  /usr/lib64/libGLX_nvidia.so.525.85.12 -> /usr/local/lib (inferred)
  /usr/lib64/libEGL_nvidia.so.525.85.12 -> /usr/local/lib (inferred)
  /usr/lib64/libGLESv2_nvidia.so.525.85.12 -> /usr/local/lib (inferred)
  /usr/lib64/libGLESv1_CM_nvidia.so.525.85.12 -> /usr/local/lib (inferred)
  /usr/lib64/libnvidia-glvkspirv.so.525.85.12 -> /usr/local/lib (inferred)
running ldconfig

hamzagamouh commented 1 year ago

@ProkopDivin It's alright, I just didn't run it inside the directory because I don't have the image locally. The output that I showed you was just for illustration purposes.
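The exchange above suggests that the path passed to ch-fromhost has to be the converted image directory, either `.` from inside it or an explicit path such as `dest_path/biopython/`. A minimal sketch of a guard for that follows; `inject_nvidia` and `DRY_RUN` are hypothetical names added for illustration, and the srun options are copied from the commands in this thread, so they may not apply to other clusters or partitions.

```shell
#!/bin/sh
# Hypothetical wrapper: refuse to run ch-fromhost unless the argument
# is an existing directory (assumed to be the converted image).
inject_nvidia() {
    image_dir=$1
    if [ ! -d "$image_dir" ]; then
        echo "image directory not found: $image_dir" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print the command that would be submitted.
        echo "srun -p gpu-short --gpus=1 ch-fromhost --nvidia $image_dir"
    else
        srun -p gpu-short --gpus=1 ch-fromhost --nvidia "$image_dir"
    fi
}
```

For example, `inject_nvidia ./biopython` from the repository root would target the same image directory as running the command with `.` from inside `biopython/`. The check only confirms that the directory exists, not that it actually contains a valid image.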