sacdallago / bio_embeddings

Get protein embeddings from protein sequences
http://docs.bioembeddings.com
MIT License

I stopped downloading the pre-trained model the first time I used SeqVecEmbedder #130

Closed: 1511878618 closed this issue 3 years ago

1511878618 commented 3 years ago

Run the code below:

from bio_embeddings import embed
embed.seqvec_embedder.SeqVecEmbedder()

Then I got this error:

OSError: Unable to open file (truncated file: eof = 90112, sblock->base_addr = 0, stored_eof = 374434776)

I'd rather not rebuild a new Conda environment.

Can anyone help me?

1511878618 commented 3 years ago

And by the way: when I first ran this code, I stupidly stopped it partway through because the download was only going at 10 kB/s. After I switched to a different network and reran it, I got this error.

konstin commented 3 years ago

It seems that you have a truncated (i.e. partially downloaded) weights file for SeqVec. You can clear the corrupted weights by removing .cache/bio_embeddings/seqvec in your home directory. I'll check whether we can catch these cases directly in bio_embeddings.
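
In the error above, eof = 90112 appears to be the number of bytes actually on disk, and stored_eof = 374434776 appears to be the file size recorded in the HDF5 header. On that reading, only about 88 KiB of roughly 357 MiB had been downloaded. A common way for a downloader to avoid leaving such a truncated file in a cache is to write to a temporary file first and rename it into place only once the download is complete. The following is a minimal sketch of that pattern in plain Python. `save_atomically` and its `expected_size` parameter are hypothetical names, and this is not bio_embeddings' actual download code.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import BinaryIO, Optional


def save_atomically(source: BinaryIO, dest: Path, expected_size: Optional[int] = None) -> Path:
    """Copy `source` to `dest` so that `dest` only appears once the copy is complete.

    Hypothetical helper (not part of bio_embeddings). If the copy is interrupted,
    `dest` is never created; on Python exceptions (including KeyboardInterrupt)
    the temporary file is removed as well.
    """
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Temporary file in the destination directory, so the final rename stays
    # on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, prefix=dest.name, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            shutil.copyfileobj(source, tmp)
            written = tmp.tell()
        if expected_size is not None and written != expected_size:
            raise OSError(f"incomplete download: got {written} of {expected_size} bytes")
        os.replace(tmp_name, dest)  # rename into place only after a complete copy
    except BaseException:
        Path(tmp_name).unlink(missing_ok=True)  # drop the partial file
        raise
    return dest
```

With this pattern, a file stopped at 10 kB/s would simply never appear under its final name, so the next run would download it again instead of failing on a truncated file.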

1511878618 commented 3 years ago


Hmm, my home directory doesn't have .cache/bio_embeddings/seqvec; in fact, there is no .cache directory at all. I'm on macOS.

sacdallago commented 3 years ago

Hi @1511878618, you won't see the .cache folder in Finder, because Finder hides files and folders whose names start with a dot. You have to use the Terminal app to view them.

On your Mac, open Spotlight (the 🔎 at the top right) and search for "Terminal". Open it, and a new Terminal window will appear.

If you now type exactly:

ls -lat .cache/bio_embeddings/seqvec

you should see a list of files. Now type exactly the following:

rm -rf .cache/bio_embeddings/seqvec

to remove those files. After that, running:

ls -lat .cache/bio_embeddings/seqvec

will no longer list any files. At this point, you can run the code from above again, i.e.:

from bio_embeddings import embed
embed.seqvec_embedder.SeqVecEmbedder()
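
If you'd rather stay in Python, the listing and removal steps can be sketched with pathlib and shutil. This also works on systems where the Terminal commands above don't apply. The sketch assumes the cache lives at the location given above (~/.cache/bio_embeddings/seqvec) and that you haven't configured a different one. `clear_seqvec_cache` is a hypothetical helper, not part of bio_embeddings.

```python
import shutil
from pathlib import Path
from typing import List, Optional


def clear_seqvec_cache(home: Optional[Path] = None) -> List[str]:
    """List, then delete, the cached SeqVec files under <home>/.cache/bio_embeddings/seqvec.

    Hypothetical helper. Returns the names of the removed entries
    (an empty list if the folder does not exist).
    """
    home = Path(home) if home is not None else Path.home()
    cache_dir = home / ".cache" / "bio_embeddings" / "seqvec"
    if not cache_dir.is_dir():
        return []
    # Roughly what `ls -lat` shows: newest entries first, with their sizes in bytes.
    entries = sorted(cache_dir.iterdir(), key=lambda p: p.stat().st_mtime, reverse=True)
    for entry in entries:
        print(f"{entry.stat().st_size:>12}  {entry.name}")
    removed = [entry.name for entry in entries]
    shutil.rmtree(cache_dir)  # the equivalent of `rm -rf`
    return removed
```

Calling `clear_seqvec_cache()` and then `embed.seqvec_embedder.SeqVecEmbedder()` again should trigger a fresh download.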

P.S.: This time, please don't interrupt the download. Alternatively, you can solve this problem by passing the model weights manually, in two steps:

  1. Download the weights for the model you are interested in from here (in your case, SeqVec).
  2. Initialize the model by passing the files:
    from bio_embeddings.embed import SeqVecEmbedder
    embedder = SeqVecEmbedder(weights_file="/path/to/weights_file", options_file="/path/to/options_file")
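
Before passing manually downloaded files, a quick sanity check can catch obvious problems, such as a missing file or an HTML error page saved in place of the weights. The sketch below assumes three things: the weights file is HDF5 without a user block, so it starts with the 8-byte HDF5 signature; the options file is JSON; and `seqvec_file_kwargs` is a hypothetical helper name. Note that a truncated HDF5 file like the one in this issue still starts with the signature, so this check does not detect truncation.

```python
import json
from pathlib import Path
from typing import Dict

# First 8 bytes of an HDF5 file (assuming no user block precedes the superblock).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def seqvec_file_kwargs(weights_file: str, options_file: str) -> Dict[str, str]:
    """Sanity-check manually downloaded SeqVec files and return them as keyword arguments.

    Hypothetical helper, not part of bio_embeddings.
    """
    weights = Path(weights_file).expanduser()
    options = Path(options_file).expanduser()
    for path in (weights, options):
        if not path.is_file():
            raise FileNotFoundError(f"not a file: {path}")
    # Reject files that are not HDF5 at all (e.g. a saved error page).
    with weights.open("rb") as fh:
        if fh.read(8) != HDF5_SIGNATURE:
            raise ValueError(f"{weights} does not start with the HDF5 signature")
    # json.load raises if the options file is cut off or malformed.
    with options.open() as fh:
        json.load(fh)
    return {"weights_file": str(weights), "options_file": str(options)}
```

Usage: `embedder = SeqVecEmbedder(**seqvec_file_kwargs("/path/to/weights_file", "/path/to/options_file"))`.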

If you use a model that is distributed as a zip file, first unzip the file downloaded from the link above and then pass the parameter model_directory="/path/to/unzipped/folder", e.g.:

from bio_embeddings.embed import ProtTransBertBFDEmbedder

embedder = ProtTransBertBFDEmbedder(model_directory="/path/to/unzipped/folder")
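
The unzip step can also be done from Python with the standard-library zipfile module. The sketch assumes one of two layouts: the archive either holds the model files directly, or it wraps them in a single top-level folder. In the second case, that folder is returned as the candidate model_directory. Check the extracted contents if your archive differs. `unzip_model` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def unzip_model(zip_path: str, target_dir: str) -> Path:
    """Extract a downloaded model archive and return the folder to pass as model_directory.

    Hypothetical helper, not part of bio_embeddings.
    """
    target = Path(target_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)  # creates target (and subfolders) as needed
    # If everything sits inside one top-level folder, use that folder instead.
    children = list(target.iterdir())
    if len(children) == 1 and children[0].is_dir():
        return children[0]
    return target
```

Usage: `embedder = ProtTransBertBFDEmbedder(model_directory=str(unzip_model("/path/to/model.zip", "/path/to/unzipped/folder")))`.
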

sacdallago commented 3 years ago

Feel free to re-open the issue if your problem is not solved!