hamzagamouh / protein_embeddings


Can't find image #1

Open ProkopDivin opened 1 year ago

ProkopDivin commented 1 year ago

After running the following steps (the only difference from the guide is that I named the image "pbiopython" instead of "biopython"):

  1. Clone the repository: git clone https://github.com/hamzagamouh/protein_embeddings.git
  2. Run cd protein_embeddings to go to the repo directory (where the Dockerfile is stored).
  3. Run salloc -C docker to switch to a node where Docker is installed.
  4. Run ch-image build -t pbiopython . to create a Docker image (for example, here the name of the image will be "biopython").

Everything seems to be all right: many packages are downloaded and installed, and there are only some warnings about a possible pip upgrade and a recommended venv:

WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
WARNING: You are using pip version 22.0.4; however, version 23.0 is available.
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
grown in 5 instructions: pbiopython
build slow? consider enabling the new build cache
hint: https://hpc.github.io/charliecloud/command-usage.html#build-cache

The problem is that I cannot find the image named "pbiopython". I suppose that it should be in the "protein_embeddings" directory. I also ran the command sudo docker images, and there isn't any image.

So after running:

  5. Run ch-convert -i docker pbiopython . to convert the Docker image to a directory structure.

I get:

input:   docker    pbiopython
output:  ch-image  .
error: source not found in Docker storage: pbiopython
hamzagamouh commented 1 year ago

Hello @ProkopDivin, thank you for your feedback. I can see a typo there: pbiopython should be biopython. In fact, it doesn't matter. This is the name of the Docker image, so you can name it any way you like, but you should use the same name in the other scripts as well.

ProkopDivin commented 1 year ago

Thank you for your response, @hamzagamouh. I used the name "pbiopython" instead of "biopython" every time, so I think that should not be the problem. Just to be sure, I tried it again and the result is the same. I am running the commands in a terminal, but I think this should not be the problem either.

Here is what I get from the terminal:

[divinpr@dw01 protein_embeddings]$ ch-image build -t biopython .
  1. FROM python:3.7.13-slim-buster
copying image ...
available --force: debderiv: Debian 9+, Ubuntu 14+, or other derivative
  2. WORKDIR /app
  4. COPY ['.'] -> '.'
  6. RUN cd /app
  8. RUN pip install -r requirements.txt
Collecting transformers==4.20.1
  Downloading transformers-4.20.1-py3-none-any.whl (4.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.4/4.4 MB 25.9 MB/s eta 0:00:00
Collecting bio-embeddings[transformers]==0.2.2
  Downloading bio_embeddings-0.2.2-py3-none-any.whl (105 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 105.4/105.4 KB 12.7 MB/s eta 0:00:00
Collecting biopython==1.79
  Downloading biopython-1.79-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.6/2.6 MB 33.4 MB/s eta 0:00:00
Collecting numpy==1.21.5
  Downloading numpy-1.21.5-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.7/15.7 MB 27.1 MB/s eta 0:00:00
Collecting pandas==1.3.5
  Downloading pandas-1.3.5-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.3/11.3 MB 33.2 MB/s eta 0:00:00
Collecting requests
  Downloading requests-2.28.2-py3-none-any.whl (62 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.8/62.8 KB 11.8 MB/s eta 0:00:00
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.12.0-py3-none-any.whl (190 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 190.3/190.3 KB 21.6 MB/s eta 0:00:00
Collecting importlib-metadata
  Downloading importlib_metadata-6.0.0-py3-none-any.whl (21 kB)
Collecting tqdm>=4.27
  Downloading tqdm-4.64.1-py2.py3-none-any.whl (78 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.5/78.5 KB 12.9 MB/s eta 0:00:00
Collecting filelock
  Downloading filelock-3.9.0-py3-none-any.whl (9.7 kB)
Collecting pyyaml>=5.1
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 596.3/596.3 KB 29.8 MB/s eta 0:00:00
Collecting packaging>=20.0
  Downloading packaging-23.0-py3-none-any.whl (42 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.7/42.7 KB 8.3 MB/s eta 0:00:00
Collecting regex!=2019.12.17
  Downloading regex-2022.10.31-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (757 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 757.1/757.1 KB 32.4 MB/s eta 0:00:00
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Downloading tokenizers-0.12.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.6/6.6 MB 33.6 MB/s eta 0:00:00
Collecting appdirs<2.0.0,>=1.4.4
  Downloading appdirs-1.4.4-py2.py3-none-any.whl (9.6 kB)
Collecting ruamel.yaml<0.18.0,>=0.17.10
  Downloading ruamel.yaml-0.17.21-py3-none-any.whl (109 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 109.5/109.5 KB 17.3 MB/s eta 0:00:00
Collecting humanize<4.0.0,>=3.2.0
  Downloading humanize-3.14.0-py3-none-any.whl (98 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.4/98.4 KB 16.9 MB/s eta 0:00:00
Collecting h5py<4.0.0,>=3.2.1
  Downloading h5py-3.8.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.3/4.3 MB 32.5 MB/s eta 0:00:00
Collecting plotly<6.0.0,>=5.1.0
  Downloading plotly-5.13.0-py2.py3-none-any.whl (15.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.2/15.2 MB 30.3 MB/s eta 0:00:00
Collecting gensim<4.0.0,>=3.8.2
  Downloading gensim-3.8.3-cp37-cp37m-manylinux1_x86_64.whl (24.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 24.2/24.2 MB 22.7 MB/s eta 0:00:00
Collecting importlib-metadata
  Downloading importlib_metadata-4.13.0-py3-none-any.whl (23 kB)
Collecting scikit-learn<0.25.0,>=0.24.0
  Downloading scikit_learn-0.24.2-cp37-cp37m-manylinux2010_x86_64.whl (22.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 22.3/22.3 MB 24.4 MB/s eta 0:00:00
Collecting scipy<2.0.0,>=1.4.1
  Downloading scipy-1.7.3-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (38.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 38.1/38.1 MB 18.9 MB/s eta 0:00:00
Collecting lock<2019.0.0,>=2018.3.25
  Downloading lock-2018.3.25.2110.tar.gz (3.0 kB)
  Preparing metadata (setup.py) ... done
Collecting umap-learn<0.6.0,>=0.5.1
  Downloading umap-learn-0.5.3.tar.gz (88 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88.2/88.2 KB 13.3 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting atomicwrites<2.0.0,>=1.4.0
  Downloading atomicwrites-1.4.1.tar.gz (14 kB)
  Preparing metadata (setup.py) ... done
Collecting torch<=1.10.0,>=1.8.0
  Downloading torch-1.10.0-cp37-cp37m-manylinux1_x86_64.whl (881.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 881.9/881.9 MB 1.1 MB/s eta 0:00:00
Collecting python-slugify<6.0.0,>=5.0.2
  Downloading python_slugify-5.0.2-py2.py3-none-any.whl (6.7 kB)
Collecting matplotlib<4.0.0,>=3.2.1
  Downloading matplotlib-3.5.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.2/11.2 MB 11.6 MB/s eta 0:00:00
Collecting pytz>=2017.3
  Downloading pytz-2022.7.1-py2.py3-none-any.whl (499 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 499.4/499.4 KB 997.6 kB/s eta 0:00:00
Collecting python-dateutil>=2.7.3
  Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 247.7/247.7 KB 23.5 MB/s eta 0:00:00
Collecting six>=1.5.0
  Downloading six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting smart-open>=1.8.1
  Downloading smart_open-6.3.0-py3-none-any.whl (56 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.8/56.8 KB 10.9 MB/s eta 0:00:00
Collecting typing-extensions>=3.7.4.3
  Downloading typing_extensions-4.4.0-py3-none-any.whl (26 kB)
Collecting zipp>=0.5
  Downloading zipp-3.12.0-py3-none-any.whl (6.6 kB)
Collecting pillow>=6.2.0
  Downloading Pillow-9.4.0-cp37-cp37m-manylinux_2_28_x86_64.whl (3.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.4/3.4 MB 34.8 MB/s eta 0:00:00
Collecting fonttools>=4.22.0
  Downloading fonttools-4.38.0-py3-none-any.whl (965 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 965.4/965.4 KB 34.0 MB/s eta 0:00:00
Collecting cycler>=0.10
  Downloading cycler-0.11.0-py3-none-any.whl (6.4 kB)
Collecting pyparsing>=2.2.1
  Downloading pyparsing-3.0.9-py3-none-any.whl (98 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.3/98.3 KB 16.6 MB/s eta 0:00:00
Collecting kiwisolver>=1.0.1
  Downloading kiwisolver-1.4.4-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 29.4 MB/s eta 0:00:00
Collecting tenacity>=6.2.0
  Downloading tenacity-8.1.0-py3-none-any.whl (23 kB)
Collecting text-unidecode>=1.3
  Downloading text_unidecode-1.3-py2.py3-none-any.whl (78 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.2/78.2 KB 14.0 MB/s eta 0:00:00
Collecting ruamel.yaml.clib>=0.2.6
  Downloading ruamel.yaml.clib-0.2.7-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (500 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 500.1/500.1 KB 29.5 MB/s eta 0:00:00
Collecting threadpoolctl>=2.0.0
  Downloading threadpoolctl-3.1.0-py3-none-any.whl (14 kB)
Collecting joblib>=0.11
  Downloading joblib-1.2.0-py3-none-any.whl (297 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 298.0/298.0 KB 27.3 MB/s eta 0:00:00
Collecting numba>=0.49
  Downloading numba-0.56.4-cp37-cp37m-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (3.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.5/3.5 MB 34.4 MB/s eta 0:00:00
Collecting pynndescent>=0.5
  Downloading pynndescent-0.5.8.tar.gz (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 34.5 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting certifi>=2017.4.17
  Downloading certifi-2022.12.7-py3-none-any.whl (155 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 155.3/155.3 KB 20.8 MB/s eta 0:00:00
Collecting idna<4,>=2.5
  Downloading idna-3.4-py3-none-any.whl (61 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.5/61.5 KB 11.4 MB/s eta 0:00:00
Collecting urllib3<1.27,>=1.21.1
  Downloading urllib3-1.26.14-py2.py3-none-any.whl (140 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 140.6/140.6 KB 20.1 MB/s eta 0:00:00
Collecting charset-normalizer<4,>=2
  Downloading charset_normalizer-3.0.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (170 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 170.5/170.5 KB 19.8 MB/s eta 0:00:00
Requirement already satisfied: setuptools in /usr/local/lib/python3.7/site-packages (from numba>=0.49->umap-learn<0.6.0,>=0.5.1->bio-embeddings[transformers]==0.2.2->-r requirements.txt (line 2)) (57.5.0)
Collecting llvmlite<0.40,>=0.39.0dev0
  Downloading llvmlite-0.39.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 34.6/34.6 MB 20.0 MB/s eta 0:00:00
Building wheels for collected packages: atomicwrites, lock, umap-learn, pynndescent
  Building wheel for atomicwrites (setup.py) ... done
  Created wheel for atomicwrites: filename=atomicwrites-1.4.1-py2.py3-none-any.whl size=6957 sha256=b85bae8fc96b9899d1b5e74eec8e44a1c9b02f0ba188d6e8692d95ea23999f48
  Stored in directory: /root/.cache/pip/wheels/0d/a9/a0/39edfadae620db443c05b5df0f6d5caad7411cf86b821790a6
  Building wheel for lock (setup.py) ... done
  Created wheel for lock: filename=lock-2018.3.25.2110-py3-none-any.whl size=3318 sha256=beff314c7c0d51243ca08378c4a2efb7162f3e47cc7f26c76ce30179ffe79de7
  Stored in directory: /root/.cache/pip/wheels/80/70/80/f2d0dbe94130ae6eee956436fb20d3920649588cc39c892206
  Building wheel for umap-learn (setup.py) ... done
  Created wheel for umap-learn: filename=umap_learn-0.5.3-py3-none-any.whl size=82829 sha256=b3409bcf916bdc8d2abc9746f9692897cff82bd36a6f8c600e7ea654819820ad
  Stored in directory: /root/.cache/pip/wheels/b3/52/a5/1fd9e3e76a7ab34f134c07469cd6f16e27ef3a37aeff1fe821
  Building wheel for pynndescent (setup.py) ... done
  Created wheel for pynndescent: filename=pynndescent-0.5.8-py3-none-any.whl size=55512 sha256=57e84574f6392f1074c6d8f3ae865cfe581f6d69413e2437f9adf3bc081ef2d8
  Stored in directory: /root/.cache/pip/wheels/19/bc/eb/974072a56a7082a302f8b4be1ad6d21bf5019235c2eff65928
Successfully built atomicwrites lock umap-learn pynndescent
Installing collected packages: tokenizers, text-unidecode, pytz, lock, charset-normalizer, appdirs, zipp, urllib3, typing-extensions, tqdm, threadpoolctl, tenacity, smart-open, six, ruamel.yaml.clib, regex, pyyaml, python-slugify, pyparsing, pillow, packaging, numpy, llvmlite, joblib, idna, fonttools, filelock, cycler, certifi, atomicwrites, torch, scipy, ruamel.yaml, requests, python-dateutil, plotly, kiwisolver, importlib-metadata, h5py, biopython, scikit-learn, pandas, numba, matplotlib, humanize, huggingface-hub, gensim, transformers, pynndescent, umap-learn, bio-embeddings
Successfully installed appdirs-1.4.4 atomicwrites-1.4.1 bio-embeddings-0.2.2 biopython-1.79 certifi-2022.12.7 charset-normalizer-3.0.1 cycler-0.11.0 filelock-3.9.0 fonttools-4.38.0 gensim-3.8.3 h5py-3.8.0 huggingface-hub-0.12.0 humanize-3.14.0 idna-3.4 importlib-metadata-4.13.0 joblib-1.2.0 kiwisolver-1.4.4 llvmlite-0.39.1 lock-2018.3.25.2110 matplotlib-3.5.3 numba-0.56.4 numpy-1.21.5 packaging-23.0 pandas-1.3.5 pillow-9.4.0 plotly-5.13.0 pynndescent-0.5.8 pyparsing-3.0.9 python-dateutil-2.8.2 python-slugify-5.0.2 pytz-2022.7.1 pyyaml-6.0 regex-2022.10.31 requests-2.28.2 ruamel.yaml-0.17.21 ruamel.yaml.clib-0.2.7 scikit-learn-0.24.2 scipy-1.7.3 six-1.16.0 smart-open-6.3.0 tenacity-8.1.0 text-unidecode-1.3 threadpoolctl-3.1.0 tokenizers-0.12.1 torch-1.10.0 tqdm-4.64.1 transformers-4.20.1 typing-extensions-4.4.0 umap-learn-0.5.3 urllib3-1.26.14 zipp-3.12.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
WARNING: You are using pip version 22.0.4; however, version 23.0 is available.
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
grown in 5 instructions: biopython
build slow? consider enabling the new build cache
hint: https://hpc.github.io/charliecloud/command-usage.html#build-cache
[divinpr@dw01 protein_embeddings]$ ls
compute_embeddings_cpu.sh  compute_protein_embeddings.py  README.md
compute_embeddings_gpu.sh  Dockerfile                     requirements.txt
[divinpr@dw01 protein_embeddings]$ sudo docker images
REPOSITORY   TAG       IMAGE ID   CREATED   SIZE
[divinpr@dw01 protein_embeddings]$ ch-convert -i docker biopython .
input:   docker    biopython
output:  ch-image  .
error: source not found in Docker storage: biopython
[divinpr@dw01 protein_embeddings]$
hamzagamouh commented 1 year ago

@ProkopDivin Mmm, I see. I will think about it and let you know in the afternoon, because I cannot see that the image was created in Docker.

ProkopDivin commented 1 year ago

@hamzagamouh Is there any news? Did you try it yourself? In case you didn't, could you please? I want to know if the problem happens only on my side.

hamzagamouh commented 1 year ago

Hello @ProkopDivin, I am sorry for the late reply, I was very busy last week. Actually, I don't know where the problem is, because I think it is a problem on the cluster's side. It would be better if we scheduled a call and walked through it together. I am available after 19:00 (Prague time). What do you think?

ProkopDivin commented 1 year ago

Hi @hamzagamouh, unfortunately I am not available after 19:00 (Prague time) today or tomorrow. Here is when I have time (Prague time): Wednesday until 15:00, Thursday until 12:00, and Friday from 16:00 onwards.

hamzagamouh commented 1 year ago

@ProkopDivin I can do it on Friday. Or you can ask Jakub Yaghob (jakub.yaghob@matfyz.cuni.cz) about this issue. He is the manager of the cluster.

ProkopDivin commented 1 year ago

@hamzagamouh Since it looks like the problem is on the cluster's side, I will ask Jakub and let you know how it ends up.

yaghobvtm commented 1 year ago

Hello mates, I am afraid you are mixing Docker and Charliecloud images together. Using the command ch-image build WITHOUT any special parameter builds an internal Charliecloud image in Charliecloud's local storage on the host. You can list all available Charliecloud images with the ch-image list command. You were trying to list Docker images, but you weren't building a Docker image; you built a Charliecloud image. So use ch-convert WITHOUT the -i docker parameter and it will be fine, something like ch-convert pbiopython ./my_output_dir.
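The workflow described above (build with ch-image, check with ch-image list, then convert without -i docker into a new directory) can be sketched as a small Bash helper. This is a sketch, not part of the repository: the function names run_or_print and build_and_unpack and the DRY_RUN switch are hypothetical, added only so the commands can be previewed before running them on the build node. The image name and output path follow the thread (biopython, ./biopython); in the logs, a bare . as the destination was reported as "output: ch-image .", i.e. it was not treated as a directory, while ./biopython worked.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the repository): build a Charliecloud image
# and unpack it into a directory, following the steps discussed above.
set -euo pipefail

# Print the command instead of executing it when DRY_RUN is non-empty.
run_or_print() {
  if [[ -n "${DRY_RUN:-}" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1: image name (tag), $2: output directory that will hold the unpacked image.
build_and_unpack() {
  local name="$1" outdir="$2"
  # Builds into Charliecloud's own storage; "docker images" will not show it.
  run_or_print ch-image build -t "$name" .
  # Lists the images in Charliecloud's storage.
  run_or_print ch-image list
  # No "-i docker": the source is the Charliecloud image built above.
  # The destination is a path containing a slash, not a bare ".".
  run_or_print ch-convert "$name" "$outdir"
}
```

For example, DRY_RUN=1 build_and_unpack biopython ./biopython only prints the three commands; without DRY_RUN it runs them, and the conversion step can take some time.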

ProkopDivin commented 1 year ago

@yaghobvtm, Thank you very much.

ProkopDivin commented 1 year ago

@hamzagamouh Step 5 still does not work:

[divinpr@dw01 protein_embeddings]$ ch-convert biopython .
input:   ch-image  biopython
output:  ch-image  .
error: input and output formats must be different

This is because you have to specify an output directory; the image can't be converted into the current directory.

hamzagamouh commented 1 year ago

@ProkopDivin What is the output of ls?

hamzagamouh commented 1 year ago

@ProkopDivin Oh I see, I ran it myself: you need to specify the name of the directory that is going to host the image. If you try ch-convert biopython ./biopython, it's going to work (it will take some time). Please try this and let me know.

ProkopDivin commented 1 year ago

Yes, that works.

hamzagamouh commented 1 year ago

Do you have some time for a short meet? Let's do the full debugging together until everything works nicely.

ProkopDivin commented 1 year ago

I'm sorry, I didn't notice this message. I think I spotted some other mistakes, and I know how to correct them. The first one is this command:

sbatch --job-name job_name --output job_name.txt --emb_name bert --input_dataset datasets/dataset.csv --output_folder dest compute_embeddings_gpu.sh

Here you pass the arguments --emb_name bert --input_dataset datasets/dataset.csv --output_folder dest as options to sbatch, but they are meant for the bash script, so the order should be:

sbatch --job-name job_name --output job_name.txt compute_embeddings_gpu.sh --emb_name bert --input_dataset datasets/dataset.csv --output_folder dest
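The corrected ordering can be made explicit by building the command from two Bash arrays, one for sbatch's own options and one for the script's arguments. A sketch only: submit_embeddings and DRY_RUN are hypothetical names, and the job name, output file and arguments are the ones from the corrected command above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: submit compute_embeddings_gpu.sh with the arguments
# in the order sbatch expects.
set -euo pipefail

submit_embeddings() {
  local emb_name="$1" input_dataset="$2" output_folder="$3"
  # Options for sbatch itself: everything BEFORE the script path.
  local -a sbatch_opts script_args cmd
  sbatch_opts=(--job-name job_name --output job_name.txt)
  # Arguments for the batch script: everything AFTER the script path.
  script_args=(--emb_name "$emb_name"
               --input_dataset "$input_dataset"
               --output_folder "$output_folder")
  cmd=(sbatch "${sbatch_opts[@]}" compute_embeddings_gpu.sh "${script_args[@]}")
  if [[ -n "${DRY_RUN:-}" ]]; then
    printf '%s\n' "${cmd[*]}"   # preview the command line
  else
    "${cmd[@]}"                 # actually submit the job
  fi
}
```

DRY_RUN=1 submit_embeddings bert datasets/dataset.csv dest prints the sbatch line without submitting anything.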

The second one: in compute_embeddings_gpu.sh you run the Python script like this:

srun ch-run --bind /home/gamouhh/files:/app/output biopython -- python /app/compute_protein_embeddings.py --emb_name $emb_name --input_dataset $input_dataset --output_folder $output_folder

but srun is for interactive jobs, so I think it doesn't have to be there, and it is OK like this:

ch-run --bind /home/gamouhh/files:/app/output biopython -- python /app/compute_protein_embeddings.py --emb_name $emb_name --input_dataset $input_dataset --output_folder $output_folder

Now when I try to run it, I get this error:

[divinpr@gpulab protein_embeddings]$ squeue -u divinpr
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
[divinpr@gpulab protein_embeddings]$ sbatch --job-name job --output out.txt compute_embeddings_gpu.sh  --emb_name bert --input_dataset a.001.001.001_1s69a_A.fa  --output_folder ~/pbsprediction/protein_embeddings/embeddings
Submitted batch job 123789
[divinpr@gpulab protein_embeddings]$ squeue -u divinpr
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            123789  gpu-long      job  divinpr  R       0:02      2 volta[01-02]
[divinpr@gpulab protein_embeddings]$squeue -u divinpr
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
[divinpr@gpulab protein_embeddings]$ cat out.txt
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-xt3ke_l4 because the default path (/home/divinpr/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Import embedder...
Traceback (most recent call last):
  File "/app/output/compute_protein_embeddings.py", line 68, in <module>
    EMBEDDER=get_embedder(emb_name)
  File "/app/output/compute_protein_embeddings.py", line 42, in get_embedder
    from bio_embeddings.embed.prottrans_bert_bfd_embedder import ProtTransBertBFDEmbedder
  File "/usr/local/lib/python3.7/site-packages/bio_embeddings/__init__.py", line 14, in <module>
    import bio_embeddings.project
  File "/usr/local/lib/python3.7/site-packages/bio_embeddings/project/__init__.py", line 5, in <module>
    from bio_embeddings.project.umap import umap_reduce
  File "/usr/local/lib/python3.7/site-packages/bio_embeddings/project/umap.py", line 1, in <module>
    from umap import UMAP
  File "/usr/local/lib/python3.7/site-packages/umap/__init__.py", line 2, in <module>
    from .umap_ import UMAP
  File "/usr/local/lib/python3.7/site-packages/umap/umap_.py", line 41, in <module>
    from umap.layouts import (
  File "/usr/local/lib/python3.7/site-packages/umap/layouts.py", line 37, in <module>
    "dim": numba.types.intp,
  File "/usr/local/lib/python3.7/site-packages/numba/core/decorators.py", line 212, in wrapper
    disp.enable_caching()
  File "/usr/local/lib/python3.7/site-packages/numba/core/dispatcher.py", line 863, in enable_caching
    self._cache = FunctionCache(self.py_func)
  File "/usr/local/lib/python3.7/site-packages/numba/core/caching.py", line 601, in __init__
    self._impl = self._impl_class(py_func)
  File "/usr/local/lib/python3.7/site-packages/numba/core/caching.py", line 338, in __init__
    "for file %r" % (qualname, source_path))
RuntimeError: cannot cache function 'rdist': no locator available for file '/usr/local/lib/python3.7/site-packages/umap/layouts.py'

The content of compute_embeddings_gpu.sh is:

#!/bin/bash
#SBATCH --partition=gpu-long        # partition you want to run job in
#SBATCH --gpus=3
#SBATCH --time=7-00:00:00         # walltime for the job in format (days-)hours:minutes:seconds
#SBATCH --mail-user=hamza.gamouh@gmail.com --mail-type=END,FAIL     # send email when job changes state to email address

export LD_LIBRARY_PATH=/usr/local/cuda/lib64

while [ $# -gt 0 ]; do
    if [[ $1 == "--"* ]]; then
        v="${1/--/}"
        declare "$v"="$2"
        shift
    fi
    shift
done

ch-run --bind /home/divinpr/pbsprediction/protein_embeddings:/app/output biopython -- python /app/output/compute_protein_embeddings.py "--emb_name" "$emb_name" "--input_dataset" "$input_dataset" "--output_folder" "$output_folder"
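The while loop at the top of this script turns each --name value pair it receives into a shell variable $name, which is why the arguments have to reach the script rather than sbatch. Below is a minimal sketch of the same loop wrapped in a hypothetical function, parse_long_opts, so it can be tried on its own. Inside a function, declare -g (Bash 4.2 or newer) is needed so the variables remain visible after it returns; the original loop runs at top level and needs no -g.

```bash
#!/usr/bin/env bash
# Same parsing logic as in compute_embeddings_gpu.sh, wrapped in a
# hypothetical function so it can be tried on its own.

parse_long_opts() {
  while [ $# -gt 0 ]; do
    if [[ $1 == "--"* ]]; then
      local v="${1/--/}"     # "--emb_name" -> "emb_name"
      declare -g "$v"="$2"   # global variable named after the option
      shift                  # consume the option's value
    fi
    shift                    # consume the option (or a stray word)
  done
}
```

Option names must be valid shell identifiers (--emb_name works, --emb-name would not), since declare rejects names containing a hyphen.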

If I change it to this (the original file, with only the source path changed):

#!/bin/bash
#SBATCH --partition=gpu-long        # partition you want to run job in
#SBATCH --gpus=3
#SBATCH --time=7-00:00:00         # walltime for the job in format (days-)hours:minutes:seconds
#SBATCH --mail-user=hamza.gamouh@gmail.com --mail-type=END,FAIL     # send email when job changes state to email address

export LD_LIBRARY_PATH=/usr/local/cuda/lib64

while [ $# -gt 0 ]; do
    if [[ $1 == "--"* ]]; then
        v="${1/--/}"
        declare "$v"="$2"
        shift
    fi
    shift
done

srun ch-run --bind /home/divinpr/pbsprediction/protein_embeddings:/app/output biopython -- python /app/output/compute_protein_embeddings.py "--emb_name" "$emb_name" "--input_dataset" "$input_dataset" "--output_folder" "$output_folder"

the error is still the same.

I have already spent a lot of time trying to make this work, and I have to do other things. Creating the image takes only a while, and I don't know how else to help you with this. So could you please build your own image and try to run it on the cluster yourself?

This is the sequence I use as a sample: STLYEKLGGTTAVDLAVDKFYERVLQDDRIKHFFADVDMAKQRAHQKAFLTYAFGGTDKYDGRYMREAHKELVENHGLNGEHFDAVAEDLLATLKEMGVPEDLIAEVAAVAGAPAHKRDVLNQ

hamzagamouh commented 1 year ago

@ProkopDivin I am sorry for the inconvenience. I will debug all the steps locally and let you know.

hamzagamouh commented 1 year ago

@ProkopDivin I get the same error as you. I don't know why; it has something to do with the caching system that the image uses. I think some things have changed in the cluster configuration since I created this repo, and this is why we ran into so many errors. I am now also having problems creating images for my PhD work; some packages do not work as before. I will debug it carefully and adapt it to the current cluster configuration.
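The traceback shows numba failing to set up its on-disk cache for umap's layouts.py, which lives inside the image that ch-run mounts read-only by default, while the home directory is also reported as not writable. A commonly suggested workaround for numba's "no locator available" error, which is NOT the fix eventually adopted in this thread (the repository was reworked instead), is to point numba's cache at a writable directory through the NUMBA_CACHE_DIR environment variable; MPLCONFIGDIR likewise addresses the matplotlib warning at the top of the same log. The sketch assumes /tmp is writable inside the container, as the matplotlib message suggests; run_embeddings, DRY_RUN and the two cache paths are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical variant of the last line of compute_embeddings_gpu.sh that sets
# NUMBA_CACHE_DIR and MPLCONFIGDIR for the Python process inside the image.
set -euo pipefail

run_embeddings() {
  local emb_name="$1" input_dataset="$2" output_folder="$3"
  local -a cmd
  # "env VAR=value python ..." sets the variables only for the Python process.
  cmd=(ch-run --bind /home/divinpr/pbsprediction/protein_embeddings:/app/output biopython --
       env NUMBA_CACHE_DIR=/tmp/numba_cache MPLCONFIGDIR=/tmp/matplotlib
       python /app/output/compute_protein_embeddings.py
       --emb_name "$emb_name" --input_dataset "$input_dataset" --output_folder "$output_folder")
  if [[ -n "${DRY_RUN:-}" ]]; then
    printf '%s\n' "${cmd[*]}"   # preview the command line
  else
    "${cmd[@]}"                 # run inside the Charliecloud image
  fi
}
```

In the batch script, this function would be called with the variables produced by the argument-parsing loop, e.g. run_embeddings "$emb_name" "$input_dataset" "$output_folder".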

ProkopDivin commented 1 year ago

Hello, I heard from Mr Hoksza that you found out how to make this run. Could you please tell me how, or update the instructions? It would be really helpful.

hamzagamouh commented 1 year ago

Hello, I still haven't solved the last "rdist" error. It may be related to the Python version. I will try to solve it today and let you know about the new instructions.

hamzagamouh commented 1 year ago

Hey @ProkopDivin, I made a lot of changes to the repo and tested everything locally. It should work now. Please try the new instructions and let me know. You can also use the sample data that I uploaded.

hamzagamouh commented 1 year ago

Hi @ProkopDivin, I am receiving emails about the status of your sbatch jobs. Could you please change the email address in the .sh script to yours, so that you receive them instead? Thank you.
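The address can be changed by hand, or with a one-line sed edit of the #SBATCH --mail-user= directive. A sketch assuming GNU sed (for -i) and an address without "/" or "&", which sed would interpret specially; set_mail_user and you@example.com are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace the address in the "#SBATCH --mail-user=..."
# directive of a batch script, leaving the rest of the line untouched.
set -euo pipefail

set_mail_user() {
  local script="$1" email="$2"
  # Only lines starting with #SBATCH are edited; GNU sed's -i edits in place.
  sed -i -E "/^#SBATCH/ s/--mail-user=[^[:space:]]+/--mail-user=${email}/" "$script"
}
```

For example, set_mail_user compute_embeddings_gpu.sh you@example.com. Alternatively, options given on the sbatch command line take precedence over #SBATCH directives in the script, so adding --mail-user=you@example.com before the script path also redirects the notifications without editing the file.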

ProkopDivin commented 1 year ago

Oh, sorry, I forgot to change that.

ProkopDivin commented 1 year ago

Hello @hamzagamouh, I'm just letting you know that I tried this and everything works. Thank you for your time.

hamzagamouh commented 1 year ago

You're welcome, @ProkopDivin. Let me know anytime if you need any further help. All the best with your work!