Open ProkopDivin opened 1 year ago
Hello @ProkopDivin,
Thank you for your feedback. I can see a typo there: pbiopython should be biopython.
In fact, it doesn't matter. This is just the name of the Docker image; you can name it any way you like, as long as you use the same name in the other scripts as well.
Thank you for your response, @hamzagamouh. I used the name "pbiopython" instead of "biopython" every time, so I think that should not be the problem. Just to be sure, I tried it again and the result is the same. I am running the commands in a terminal, but I think this should not be the problem either.
Here is what I get from the terminal:
[divinpr@dw01 protein_embeddings]$ ch-image build -t biopython .
1. FROM python:3.7.13-slim-buster
copying image ...
available --force: debderiv: Debian 9+, Ubuntu 14+, or other derivative
2. WORKDIR /app
4. COPY ['.'] -> '.'
6. RUN cd /app
8. RUN pip install -r requirements.txt
Collecting transformers==4.20.1
Downloading transformers-4.20.1-py3-none-any.whl (4.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.4/4.4 MB 25.9 MB/s eta 0:00:00
Collecting bio-embeddings[transformers]==0.2.2
Downloading bio_embeddings-0.2.2-py3-none-any.whl (105 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 105.4/105.4 KB 12.7 MB/s eta 0:00:00
Collecting biopython==1.79
Downloading biopython-1.79-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.6/2.6 MB 33.4 MB/s eta 0:00:00
Collecting numpy==1.21.5
Downloading numpy-1.21.5-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.7/15.7 MB 27.1 MB/s eta 0:00:00
Collecting pandas==1.3.5
Downloading pandas-1.3.5-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.3/11.3 MB 33.2 MB/s eta 0:00:00
Collecting requests
Downloading requests-2.28.2-py3-none-any.whl (62 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.8/62.8 KB 11.8 MB/s eta 0:00:00
Collecting huggingface-hub<1.0,>=0.1.0
Downloading huggingface_hub-0.12.0-py3-none-any.whl (190 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 190.3/190.3 KB 21.6 MB/s eta 0:00:00
Collecting importlib-metadata
Downloading importlib_metadata-6.0.0-py3-none-any.whl (21 kB)
Collecting tqdm>=4.27
Downloading tqdm-4.64.1-py2.py3-none-any.whl (78 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.5/78.5 KB 12.9 MB/s eta 0:00:00
Collecting filelock
Downloading filelock-3.9.0-py3-none-any.whl (9.7 kB)
Collecting pyyaml>=5.1
Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 596.3/596.3 KB 29.8 MB/s eta 0:00:00
Collecting packaging>=20.0
Downloading packaging-23.0-py3-none-any.whl (42 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.7/42.7 KB 8.3 MB/s eta 0:00:00
Collecting regex!=2019.12.17
Downloading regex-2022.10.31-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (757 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 757.1/757.1 KB 32.4 MB/s eta 0:00:00
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
Downloading tokenizers-0.12.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.6/6.6 MB 33.6 MB/s eta 0:00:00
Collecting appdirs<2.0.0,>=1.4.4
Downloading appdirs-1.4.4-py2.py3-none-any.whl (9.6 kB)
Collecting ruamel.yaml<0.18.0,>=0.17.10
Downloading ruamel.yaml-0.17.21-py3-none-any.whl (109 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 109.5/109.5 KB 17.3 MB/s eta 0:00:00
Collecting humanize<4.0.0,>=3.2.0
Downloading humanize-3.14.0-py3-none-any.whl (98 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.4/98.4 KB 16.9 MB/s eta 0:00:00
Collecting h5py<4.0.0,>=3.2.1
Downloading h5py-3.8.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.3/4.3 MB 32.5 MB/s eta 0:00:00
Collecting plotly<6.0.0,>=5.1.0
Downloading plotly-5.13.0-py2.py3-none-any.whl (15.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.2/15.2 MB 30.3 MB/s eta 0:00:00
Collecting gensim<4.0.0,>=3.8.2
Downloading gensim-3.8.3-cp37-cp37m-manylinux1_x86_64.whl (24.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 24.2/24.2 MB 22.7 MB/s eta 0:00:00
Collecting importlib-metadata
Downloading importlib_metadata-4.13.0-py3-none-any.whl (23 kB)
Collecting scikit-learn<0.25.0,>=0.24.0
Downloading scikit_learn-0.24.2-cp37-cp37m-manylinux2010_x86_64.whl (22.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 22.3/22.3 MB 24.4 MB/s eta 0:00:00
Collecting scipy<2.0.0,>=1.4.1
Downloading scipy-1.7.3-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (38.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 38.1/38.1 MB 18.9 MB/s eta 0:00:00
Collecting lock<2019.0.0,>=2018.3.25
Downloading lock-2018.3.25.2110.tar.gz (3.0 kB)
Preparing metadata (setup.py) ... done
Collecting umap-learn<0.6.0,>=0.5.1
Downloading umap-learn-0.5.3.tar.gz (88 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88.2/88.2 KB 13.3 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting atomicwrites<2.0.0,>=1.4.0
Downloading atomicwrites-1.4.1.tar.gz (14 kB)
Preparing metadata (setup.py) ... done
Collecting torch<=1.10.0,>=1.8.0
Downloading torch-1.10.0-cp37-cp37m-manylinux1_x86_64.whl (881.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 881.9/881.9 MB 1.1 MB/s eta 0:00:00
Collecting python-slugify<6.0.0,>=5.0.2
Downloading python_slugify-5.0.2-py2.py3-none-any.whl (6.7 kB)
Collecting matplotlib<4.0.0,>=3.2.1
Downloading matplotlib-3.5.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.2/11.2 MB 11.6 MB/s eta 0:00:00
Collecting pytz>=2017.3
Downloading pytz-2022.7.1-py2.py3-none-any.whl (499 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 499.4/499.4 KB 997.6 kB/s eta 0:00:00
Collecting python-dateutil>=2.7.3
Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 247.7/247.7 KB 23.5 MB/s eta 0:00:00
Collecting six>=1.5.0
Downloading six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting smart-open>=1.8.1
Downloading smart_open-6.3.0-py3-none-any.whl (56 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.8/56.8 KB 10.9 MB/s eta 0:00:00
Collecting typing-extensions>=3.7.4.3
Downloading typing_extensions-4.4.0-py3-none-any.whl (26 kB)
Collecting zipp>=0.5
Downloading zipp-3.12.0-py3-none-any.whl (6.6 kB)
Collecting pillow>=6.2.0
Downloading Pillow-9.4.0-cp37-cp37m-manylinux_2_28_x86_64.whl (3.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.4/3.4 MB 34.8 MB/s eta 0:00:00
Collecting fonttools>=4.22.0
Downloading fonttools-4.38.0-py3-none-any.whl (965 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 965.4/965.4 KB 34.0 MB/s eta 0:00:00
Collecting cycler>=0.10
Downloading cycler-0.11.0-py3-none-any.whl (6.4 kB)
Collecting pyparsing>=2.2.1
Downloading pyparsing-3.0.9-py3-none-any.whl (98 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.3/98.3 KB 16.6 MB/s eta 0:00:00
Collecting kiwisolver>=1.0.1
Downloading kiwisolver-1.4.4-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 29.4 MB/s eta 0:00:00
Collecting tenacity>=6.2.0
Downloading tenacity-8.1.0-py3-none-any.whl (23 kB)
Collecting text-unidecode>=1.3
Downloading text_unidecode-1.3-py2.py3-none-any.whl (78 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.2/78.2 KB 14.0 MB/s eta 0:00:00
Collecting ruamel.yaml.clib>=0.2.6
Downloading ruamel.yaml.clib-0.2.7-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (500 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 500.1/500.1 KB 29.5 MB/s eta 0:00:00
Collecting threadpoolctl>=2.0.0
Downloading threadpoolctl-3.1.0-py3-none-any.whl (14 kB)
Collecting joblib>=0.11
Downloading joblib-1.2.0-py3-none-any.whl (297 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 298.0/298.0 KB 27.3 MB/s eta 0:00:00
Collecting numba>=0.49
Downloading numba-0.56.4-cp37-cp37m-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (3.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.5/3.5 MB 34.4 MB/s eta 0:00:00
Collecting pynndescent>=0.5
Downloading pynndescent-0.5.8.tar.gz (1.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 34.5 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting certifi>=2017.4.17
Downloading certifi-2022.12.7-py3-none-any.whl (155 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 155.3/155.3 KB 20.8 MB/s eta 0:00:00
Collecting idna<4,>=2.5
Downloading idna-3.4-py3-none-any.whl (61 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.5/61.5 KB 11.4 MB/s eta 0:00:00
Collecting urllib3<1.27,>=1.21.1
Downloading urllib3-1.26.14-py2.py3-none-any.whl (140 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 140.6/140.6 KB 20.1 MB/s eta 0:00:00
Collecting charset-normalizer<4,>=2
Downloading charset_normalizer-3.0.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (170 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 170.5/170.5 KB 19.8 MB/s eta 0:00:00
Requirement already satisfied: setuptools in /usr/local/lib/python3.7/site-packages (from numba>=0.49->umap-learn<0.6.0,>=0.5.1->bio-embeddings[transformers]==0.2.2->-r requirements.txt (line 2)) (57.5.0)
Collecting llvmlite<0.40,>=0.39.0dev0
Downloading llvmlite-0.39.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 34.6/34.6 MB 20.0 MB/s eta 0:00:00
Building wheels for collected packages: atomicwrites, lock, umap-learn, pynndescent
Building wheel for atomicwrites (setup.py) ... done
Created wheel for atomicwrites: filename=atomicwrites-1.4.1-py2.py3-none-any.whl size=6957 sha256=b85bae8fc96b9899d1b5e74eec8e44a1c9b02f0ba188d6e8692d95ea23999f48
Stored in directory: /root/.cache/pip/wheels/0d/a9/a0/39edfadae620db443c05b5df0f6d5caad7411cf86b821790a6
Building wheel for lock (setup.py) ... done
Created wheel for lock: filename=lock-2018.3.25.2110-py3-none-any.whl size=3318 sha256=beff314c7c0d51243ca08378c4a2efb7162f3e47cc7f26c76ce30179ffe79de7
Stored in directory: /root/.cache/pip/wheels/80/70/80/f2d0dbe94130ae6eee956436fb20d3920649588cc39c892206
Building wheel for umap-learn (setup.py) ... done
Created wheel for umap-learn: filename=umap_learn-0.5.3-py3-none-any.whl size=82829 sha256=b3409bcf916bdc8d2abc9746f9692897cff82bd36a6f8c600e7ea654819820ad
Stored in directory: /root/.cache/pip/wheels/b3/52/a5/1fd9e3e76a7ab34f134c07469cd6f16e27ef3a37aeff1fe821
Building wheel for pynndescent (setup.py) ... done
Created wheel for pynndescent: filename=pynndescent-0.5.8-py3-none-any.whl size=55512 sha256=57e84574f6392f1074c6d8f3ae865cfe581f6d69413e2437f9adf3bc081ef2d8
Stored in directory: /root/.cache/pip/wheels/19/bc/eb/974072a56a7082a302f8b4be1ad6d21bf5019235c2eff65928
Successfully built atomicwrites lock umap-learn pynndescent
Installing collected packages: tokenizers, text-unidecode, pytz, lock, charset-normalizer, appdirs, zipp, urllib3, typing-extensions, tqdm, threadpoolctl, tenacity, smart-open, six, ruamel.yaml.clib, regex, pyyaml, python-slugify, pyparsing, pillow, packaging, numpy, llvmlite, joblib, idna, fonttools, filelock, cycler, certifi, atomicwrites, torch, scipy, ruamel.yaml, requests, python-dateutil, plotly, kiwisolver, importlib-metadata, h5py, biopython, scikit-learn, pandas, numba, matplotlib, humanize, huggingface-hub, gensim, transformers, pynndescent, umap-learn, bio-embeddings
Successfully installed appdirs-1.4.4 atomicwrites-1.4.1 bio-embeddings-0.2.2 biopython-1.79 certifi-2022.12.7 charset-normalizer-3.0.1 cycler-0.11.0 filelock-3.9.0 fonttools-4.38.0 gensim-3.8.3 h5py-3.8.0 huggingface-hub-0.12.0 humanize-3.14.0 idna-3.4 importlib-metadata-4.13.0 joblib-1.2.0 kiwisolver-1.4.4 llvmlite-0.39.1 lock-2018.3.25.2110 matplotlib-3.5.3 numba-0.56.4 numpy-1.21.5 packaging-23.0 pandas-1.3.5 pillow-9.4.0 plotly-5.13.0 pynndescent-0.5.8 pyparsing-3.0.9 python-dateutil-2.8.2 python-slugify-5.0.2 pytz-2022.7.1 pyyaml-6.0 regex-2022.10.31 requests-2.28.2 ruamel.yaml-0.17.21 ruamel.yaml.clib-0.2.7 scikit-learn-0.24.2 scipy-1.7.3 six-1.16.0 smart-open-6.3.0 tenacity-8.1.0 text-unidecode-1.3 threadpoolctl-3.1.0 tokenizers-0.12.1 torch-1.10.0 tqdm-4.64.1 transformers-4.20.1 typing-extensions-4.4.0 umap-learn-0.5.3 urllib3-1.26.14 zipp-3.12.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
WARNING: You are using pip version 22.0.4; however, version 23.0 is available.
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
grown in 5 instructions: biopython
build slow? consider enabling the new build cache
hint: https://hpc.github.io/charliecloud/command-usage.html#build-cache
[divinpr@dw01 protein_embeddings]$ ls
compute_embeddings_cpu.sh compute_protein_embeddings.py README.md
compute_embeddings_gpu.sh Dockerfile requirements.txt
[divinpr@dw01 protein_embeddings]$ sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
[divinpr@dw01 protein_embeddings]$ ch-convert -i docker biopython .
input: docker biopython
output: ch-image .
error: source not found in Docker storage: biopython
[divinpr@dw01 protein_embeddings]$
@ProkopDivin Mmm, I see. I will think about it and let you know in the afternoon, because I cannot see that the image was created in Docker.
@hamzagamouh is there any news? Did you try it yourself? If you haven't, could you please? I want to know whether the problem happens only on my side.
Hello @ProkopDivin, I am sorry for the late reply; I was very busy last week. Actually, I don't know where the problem is, because I think it is a problem on the cluster's side. I think it would be better if we schedule a call and walk through it together. I am available after 19:00 (Prague time). What do you think?
Hi @hamzagamouh, unfortunately I am not available after 19:00 (Prague time) today or tomorrow. Here is when I have time (Prague time): Wednesday until 15:00, Thursday until 12:00, Friday from 16:00 onward.
@ProkopDivin I can do it on Friday. Or you can ask Jakub Yaghob (jakub.yaghob@matfyz.cuni.cz) about this issue. He is the manager of the cluster.
@hamzagamouh Since it looks like the problem is on the cluster's side, I will ask Jakub and let you know how it ends up.
Hello mates,
I am afraid you are mixing up Docker and Charliecloud images. The command ch-image build WITHOUT any parameter builds an internal Charliecloud image in its local cache on the host. You can list all available Charliecloud images with the ch-image list command. You were trying to list Docker images, but you did not build a Docker image; you built a Charliecloud image.
Then use ch-convert WITHOUT the parameter -i docker and it will be fine. Something like ch-convert pbiopython ./my_output_dir.
@yaghobvtm, Thank you very much.
@hamzagamouh step 5 still does not work:
[divinpr@dw01 protein_embeddings]$ ch-convert biopython .
input: ch-image biopython
output: ch-image .
error: input and output formats must be different
It seems you have to specify an output directory, and the current directory cannot be used as the directory for the image.
@ProkopDivin
What is the output of ls?
@ProkopDivin
Oh I see, I can run it myself, and you need to specify the name of the directory that is going to host the image.
If you try ch-convert biopython ./biopython it is going to work (it will take some time). Please try this and let me know.
Yes, that works.
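The exchange above can be summarized as a short sketch. It only uses commands that appear in this thread (ch-image list and ch-convert with a named output directory). The image name biopython and the output path ./biopython are the ones used above; the guard around the commands and the status variable are illustrative additions so the snippet can be run safely on a machine without Charliecloud.

```shell
#!/bin/sh
# Sketch only: assumes the image was built earlier with
#   ch-image build -t biopython .
# and that ./biopython does not exist yet.
image="biopython"
outdir="./biopython"

if command -v ch-convert >/dev/null 2>&1; then
    # Show the images held in Charliecloud's own storage (not Docker's).
    ch-image list
    # Give a named output directory; "." was inferred as ch-image format
    # in the log above and rejected.
    if ch-convert "$image" "$outdir"; then
        status="converted"
    else
        status="failed"
    fi
else
    echo "Charliecloud tools not found; nothing to convert." >&2
    status="skipped"
fi
echo "$status"
```

If the conversion succeeds, the unpacked image should be found under ./biopython.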
Do you have some time for a short meet? Let's do the full debugging together until everything works nicely.
I'm sorry, I didn't notice this message. I think I spotted some other mistakes and I know how to correct them.
The first one:
sbatch --job-name job_name --output job_name.txt --emb_name bert --input_dataset datasets/dataset.csv --output_folder dest compute_embeddings_gpu.sh
Here you pass these arguments as options
--emb_name bert --input_dataset datasets/dataset.csv --output_folder dest
to sbatch, but they are supposed to go to the bash script, so the order should be:
sbatch --job-name job_name --output job_name.txt compute_embeddings_gpu.sh --emb_name bert --input_dataset datasets/dataset.csv --output_folder dest
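To illustrate why the order matters: sbatch consumes the options placed before the script name, and everything after the script name reaches the script as positional parameters. The sketch below shows how a loop like the one in compute_embeddings_gpu.sh can turn those trailing --name value pairs into variables. It is a stand-in, not the repository's code: the function name parse_long_options is hypothetical, and it uses export rather than bash's declare so that it also runs under plain sh.

```shell
#!/bin/sh
# Hypothetical helper: turn "--name value" pairs into shell variables.
parse_long_options() {
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --*)
                name="${1#--}"
                # Accept only names that are valid shell identifiers.
                case "$name" in
                    ''|[0-9]*|*[!A-Za-z0-9_]*)
                        echo "unsupported option name: $1" >&2
                        return 1 ;;
                esac
                if [ "$#" -lt 2 ]; then
                    echo "missing value for $1" >&2
                    return 1
                fi
                export "$name=$2"
                shift   # drop the option name; the value is dropped below
                ;;
        esac
        shift
    done
}

# These are the arguments that follow the script name in the sbatch call.
parse_long_options --emb_name bert --input_dataset datasets/dataset.csv --output_folder dest
echo "emb_name=$emb_name input_dataset=$input_dataset output_folder=$output_folder"
```

With the options placed before the script name instead, sbatch itself would try to interpret --emb_name and the script would receive no arguments at all.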
The second one is in
compute_embeddings_gpu.sh
where you run the Python script like this:
srun ch-run --bind /home/gamouhh/files:/app/output biopython -- python /app/compute_protein_embeddings.py --emb_name $emb_name --input_dataset $input_dataset --output_folder $output_folder
but srun is for interactive jobs, so I think it does not have to be there, and it is OK like this:
ch-run --bind /home/gamouhh/files:/app/output biopython -- python /app/compute_protein_embeddings.py --emb_name $emb_name --input_dataset $input_dataset --output_folder $output_folder
Now when I try to run it, I get this error:
[divinpr@gpulab protein_embeddings]$ squeue -u divinpr
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
[divinpr@gpulab protein_embeddings]$ sbatch --job-name job --output out.txt compute_embeddings_gpu.sh --emb_name bert --input_dataset a.001.001.001_1s69a_A.fa --output_folder ~/pbsprediction/protein_embeddings/embeddings
Submitted batch job 123789
[divinpr@gpulab protein_embeddings]$ squeue -u divinpr
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
123789 gpu-long job divinpr R 0:02 2 volta[01-02]
[divinpr@gpulab protein_embeddings]$ squeue -u divinpr
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
[divinpr@gpulab protein_embeddings]$ cat out.txt
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-xt3ke_l4 because the default path (/home/divinpr/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Import embedder...
Traceback (most recent call last):
File "/app/output/compute_protein_embeddings.py", line 68, in <module>
EMBEDDER=get_embedder(emb_name)
File "/app/output/compute_protein_embeddings.py", line 42, in get_embedder
from bio_embeddings.embed.prottrans_bert_bfd_embedder import ProtTransBertBFDEmbedder
File "/usr/local/lib/python3.7/site-packages/bio_embeddings/__init__.py", line 14, in <module>
import bio_embeddings.project
File "/usr/local/lib/python3.7/site-packages/bio_embeddings/project/__init__.py", line 5, in <module>
from bio_embeddings.project.umap import umap_reduce
File "/usr/local/lib/python3.7/site-packages/bio_embeddings/project/umap.py", line 1, in <module>
from umap import UMAP
File "/usr/local/lib/python3.7/site-packages/umap/__init__.py", line 2, in <module>
from .umap_ import UMAP
File "/usr/local/lib/python3.7/site-packages/umap/umap_.py", line 41, in <module>
from umap.layouts import (
File "/usr/local/lib/python3.7/site-packages/umap/layouts.py", line 37, in <module>
"dim": numba.types.intp,
File "/usr/local/lib/python3.7/site-packages/numba/core/decorators.py", line 212, in wrapper
disp.enable_caching()
File "/usr/local/lib/python3.7/site-packages/numba/core/dispatcher.py", line 863, in enable_caching
self._cache = FunctionCache(self.py_func)
File "/usr/local/lib/python3.7/site-packages/numba/core/caching.py", line 601, in __init__
self._impl = self._impl_class(py_func)
File "/usr/local/lib/python3.7/site-packages/numba/core/caching.py", line 338, in __init__
"for file %r" % (qualname, source_path))
RuntimeError: cannot cache function 'rdist': no locator available for file '/usr/local/lib/python3.7/site-packages/umap/layouts.py'
The content of compute_embeddings_gpu.sh is:
#!/bin/bash
#SBATCH --partition=gpu-long # partition you want to run job in
#SBATCH --gpus=3
#SBATCH --time=7-00:00:00 # walltime for the job in format (days-)hours:minutes:seconds
#SBATCH --mail-user=hamza.gamouh@gmail.com --mail-type=END,FAIL # send email when job changes state to email address
export LD_LIBRARY_PATH=/usr/local/cuda/lib64
while [ $# -gt 0 ]; do
if [[ $1 == "--"* ]]; then
v="${1/--/}"
declare "$v"="$2"
shift
fi
shift
done
ch-run --bind /home/divinpr/pbsprediction/protein_embeddings:/app/output biopython -- python /app/output/compute_protein_embeddings.py "--emb_name" "$emb_name" "--input_dataset" "$input_dataset" "--output_folder" "$output_folder"
If I change it to this (the original file, with only the source path changed):
#!/bin/bash
#SBATCH --partition=gpu-long # partition you want to run job in
#SBATCH --gpus=3
#SBATCH --time=7-00:00:00 # walltime for the job in format (days-)hours:minutes:seconds
#SBATCH --mail-user=hamza.gamouh@gmail.com --mail-type=END,FAIL # send email when job changes state to email address
export LD_LIBRARY_PATH=/usr/local/cuda/lib64
while [ $# -gt 0 ]; do
if [[ $1 == "--"* ]]; then
v="${1/--/}"
declare "$v"="$2"
shift
fi
shift
done
srun ch-run --bind /home/divinpr/pbsprediction/protein_embeddings:/app/output biopython -- python /app/output/compute_protein_embeddings.py "--emb_name" "$emb_name" "--input_dataset" "$input_dataset" "--output_folder" "$output_folder"
The error is still the same.
I have already spent a lot of time trying to make this work and I have other things to do. Creating the image takes only a little while, and I don't know how else to help you with this. So could you please try to make your own image and run it on the cluster yourself?
STLYEKLGGTTAVDLAVDKFYERVLQDDRIKHFFADVDMAKQRAHQKAFLTYAFGGTDKYDGRYMREAHKELVENHGLNGEHFDAVAEDLLATLKEMGVPEDLIAEVAAVAGAPAHKRDVLNQ this is the sequence I use as a sample
@ProkopDivin I am sorry for the inconvenience, I will debug all the steps locally, and let you know.
@ProkopDivin I get the same error as you. I don't know why; it seems related to the caching system that the image uses. You know, I think some things have changed in the cluster configuration since I created this repo, and this is why we ran into a lot of errors. Now I am also having problems creating images in my PhD work; some packages do not work as before. I will do a careful debugging and adapt it to the current version of the cluster.
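For reference, the traceback above ends in numba's caching code, which raises this error when it finds no writable location for its on-disk cache. A workaround often reported for this message is to point numba at a writable directory through the NUMBA_CACHE_DIR environment variable before launching the job. The sketch below is not confirmed to be the fix that was eventually adopted in this repo. It assumes that a directory under /tmp is writable and that the exported variables are visible inside the container; the directory name is hypothetical. MPLCONFIGDIR is set as well, as the Matplotlib warning in the log recommends.

```shell
#!/bin/sh
# Assumption: ${TMPDIR:-/tmp} is writable on the compute node and inside the container.
cache_root="${TMPDIR:-/tmp}/protein_embeddings_cache.$$"
mkdir -p "$cache_root/numba" "$cache_root/matplotlib"

# Writable cache location for numba (used by umap via its cached jit functions).
export NUMBA_CACHE_DIR="$cache_root/numba"
# Writable config/cache location for Matplotlib, as suggested by the warning in out.txt.
export MPLCONFIGDIR="$cache_root/matplotlib"

echo "NUMBA_CACHE_DIR=$NUMBA_CACHE_DIR"
echo "MPLCONFIGDIR=$MPLCONFIGDIR"
```

In the batch script these lines would go before the ch-run call.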
Hello, I heard from Mr. Hoksza that you found out how to make this run. Could you please tell me how, or update the instructions? It would be really helpful.
Hello, I still didn't solve the last "rdist" error. It may have some connection with the Python version. I will try to solve it today, and let you know of the new instructions.
Hey @ProkopDivin , I made a lot of changes to the repo, and I tested everything locally. It should be working now. Please try the new instructions and let me know. You can also use some sample data that I uploaded as well.
Hi @ProkopDivin , I am receiving emails of the status of your sbatch jobs. Please can you change the email of the .sh script to your email in order to receive them? Thank you
Oh sorry, I forgot to change that.
Hello @hamzagamouh, I am just letting you know that I tried this and everything works. Thank you for your time.
You're welcome @ProkopDivin . Let me know anytime if you need any further help. All the best for your work
After running the following (the only difference from the guide is that I named the image "pbiopython" instead of "biopython"):
git clone https://github.com/hamzagamouh/protein_embeddings.git
cd protein_embeddings
to go to the repo directory (where a Dockerfile is stored)
salloc -C docker
to switch to a node where docker is installed.
ch-image build -t pbiopython .
to create a docker image (for example here the name of the image will be "biopython").
Everything seems to be all right. Many packages are downloaded and installed. There are only some warnings about a possible pip update and a recommended venv.
The problem is that I cannot find the image named "pbiopython". I suppose it should be in the "protein_embeddings" directory. I also ran the command
sudo docker images
and there isn't any image.
So after running:
ch-convert -i docker pbiopython .
to convert the docker image to a directory structure.
I get: