sacdallago / bio_embeddings

Get protein embeddings from protein sequences
http://docs.bioembeddings.com
MIT License
463 stars, 65 forks

Where is the docker image? #112

Closed deniseduma closed 3 years ago

deniseduma commented 3 years ago

Hi,

I'm trying to use docker to run bio-embeddings and you say "We provide a docker image at rostlab/bio_embeddings"

Where is the docker image? I can't find it.

Also, in the command

docker run --rm --gpus all \
    -v "$(pwd)/examples/docker":/mnt \
    -u $(id -u ${USER}):$(id -g ${USER}) \
    rostlab/bio_embeddings /mnt/config.yml

there is $(pwd), so where should I run the docker command from? Could you please provide a bit more detail about how to run this?

Thanks, Denise

sacdallago commented 3 years ago

Hi @deniseduma,

we need to update the documentation slightly (@konstin). You can build the image from the Dockerfile you find in the root of the repo. This is usually something like docker build -t <image-name> . when you run the command from the root of the repo. @konstin, you might want to expand on this one.

Alternatively, we had started providing auto-built docker images, but we saw that usage was rather low, so we have now limited production to web-server docker images: https://github.com/orgs/bioembeddings/packages

We could add a general "bio-embeddings" image there so that it's easier for users like you to download (or maybe simply hijack the worker to do that?).

Anyway, let me have a quick discussion with @konstin and get back to you with easier instructions @deniseduma :)

deniseduma commented 3 years ago

Hi Christian,

Thank you very much for getting back to me!

I'm actually not familiar with Docker and I'd much rather use the pip installation, which unfortunately doesn't work for me either!

I went back to trying to install bio-embeddings on the cluster I'm using (I'm in Munich as well, I work at Helmholtz Zentrum), but I'm having some annoying dependency problems: bio-embeddings downgrades PyTorch from 1.7.1 to 1.5.1, then complains that it's too old for torchvision, and also tries to upgrade NumPy system-wide, which obviously fails. So I'm currently stuck...

deniseduma commented 3 years ago

Ok, I've managed to install the package with pip finally!

I won't need Docker after all but thanks for getting back to me, really appreciated!

Denise

deniseduma commented 3 years ago

Me again!

Sorry about this, but now I have another issue. I managed to install the package on a Slurm cluster and submitted the job to one of the GPU machines in the cluster, but now I get the following error:

Traceback (most recent call last):
  File "embed_seqs.py", line 8, in <module>
    embedder = ProtTransBertBFDEmbedder()
  File "/home/dd/.conda/envs/bio_embed/lib/python3.8/site-packages/bio_embeddings/embed/prottrans_bert_bfd_embedder.py", line 29, in __init__
    super().__init__(**kwargs)
  File "/home/dd/.conda/envs/bio_embed/lib/python3.8/site-packages/bio_embeddings/embed/embedder_interfaces.py", line 59, in __init__
    self._options[directory] = get_model_directories_from_zip(
  File "/home/dd/.conda/envs/bio_embed/lib/python3.8/site-packages/bio_embeddings/utilities/remote_file_retriever.py", line 74, in get_model_directories_from_zip
    request.urlretrieve(url, filename=file_name, reporthook=t.update_to)
  File "/home/dd/.conda/envs/bio_embed/lib/python3.8/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/home/dd/.conda/envs/bio_embed/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/home/dd/.conda/envs/bio_embed/lib/python3.8/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/home/dd/.conda/envs/bio_embed/lib/python3.8/urllib/request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/home/dd/.conda/envs/bio_embed/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/home/dd/.conda/envs/bio_embed/lib/python3.8/urllib/request.py", line 1379, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/home/dd/.conda/envs/bio_embed/lib/python3.8/urllib/request.py", line 1353, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 101] Network is unreachable>

Any idea how to fix this?

Thanks, Denise

konstin commented 3 years ago

Sorry for all the trouble with the missing docker image, I totally missed this when making the 0.1.5 release.

I've now published ghcr.io/bioembeddings/bio_embeddings:v0.1.6 which can be used like this:

docker run --rm --gpus all \
    -v "$(pwd)/examples/docker":/mnt \
    -v bio_embeddings_weights_cache:/root/.cache/bio_embeddings \
    -u $(id -u ${USER}):$(id -g ${USER}) \
    ghcr.io/bioembeddings/bio_embeddings:v0.1.6 /mnt/config.yml

urllib.error.URLError: <urlopen error [Errno 101] Network is unreachable>

This sounds like the machine has no internet access, though I'm not sure why they'd block cluster nodes from making network requests.
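One way to confirm this diagnosis before a long job starts is a quick reachability probe on the compute node. The following is only a minimal sketch using the Python standard library; the helper name has_network, the opener parameter, and the probe URL are illustrative assumptions and not part of bio_embeddings.

```python
import urllib.error
import urllib.request


def has_network(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if `url` can be reached, False on a network-level failure.

    `opener` defaults to urllib.request.urlopen; it is a parameter only so the
    check can be exercised without real network access (an assumption made for
    this sketch).
    """
    try:
        with opener(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so the network itself works.
        return True
    except OSError:
        # URLError (e.g. "[Errno 101] Network is unreachable") and socket
        # timeouts are both subclasses of OSError.
        return False
```

Calling has_network("http://docs.bioembeddings.com") at the top of the job script and exiting early with a clear message when it returns False would surface the problem without the long urllib traceback.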

deniseduma commented 3 years ago

Thanks for updating the docker image, although I guess I'll happily pass on using it! :p I'm not a big fan of Docker!

Regarding [Errno 101] Network is unreachable: the cluster admins never got back to me, but yes, it seems the cluster nodes don't have Internet access, which is weird! Luckily the login nodes do, so I used those to download the weights!

Thanks, Denise
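The workaround described in this last comment (download on a login node, compute on an offline node) can be sketched generically as a download-once helper that writes to storage shared between the two. This is a hedged illustration only: fetch_once, the URL, and the destination path are hypothetical, and how bio_embeddings is then pointed at the local copy is not covered in this thread.

```python
from pathlib import Path
import urllib.request


def fetch_once(url: str, dest: Path) -> Path:
    """Download `url` to `dest` unless `dest` already exists; return `dest`."""
    dest = Path(dest)
    if dest.exists():
        # Already downloaded (e.g. earlier, on a login node): no network needed.
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Download to a temporary name first so an interrupted transfer never
    # leaves a partial file under the final name.
    tmp = dest.with_name(dest.name + ".part")
    urllib.request.urlretrieve(url, filename=str(tmp))
    tmp.replace(dest)
    return dest
```

Run on a login node, this fills the shared directory; the same call on a compute node then returns immediately because the file is already there.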