sacdallago / bio_embeddings

Get protein embeddings from protein sequences
http://docs.bioembeddings.com
MIT License

[SeqVec2] Unable to open object (object 'char_embed' doesn't exist) #38

Closed by ptynecki 4 years ago

ptynecki commented 4 years ago

Hey,

I'm trying to use SeqVecEmbedder with the newest SeqVec v2 weights and options files.

from bio_embeddings import SeqVecEmbedder

embedder = SeqVecEmbedder(
    weights_file='models/seqvec2/weights.hdf5',
    options_file='models/seqvec2/options.json'
)

After that I received a KeyError exception:

/lib/python3.6/site-packages/h5py/_hl/group.py in __getitem__(self, name)
    262                 raise ValueError("Invalid HDF5 object reference")
    263         else:
--> 264             oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
    265 
    266         otype = h5i.get_type(oid)

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5o.pyx in h5py.h5o.open()

KeyError: "Unable to open object (object 'char_embed' doesn't exist)"

I found similar problems in the AllenNLP issues [1] and [2].

Thanks for taking the time to look into this issue.

sacdallago commented 4 years ago

Hi @ptynecki , thanks for reporting the issue.

SeqVec v2 is an experimental SeqVec version which I do not encourage you to use. I've kept it in the pipeline for the real SeqVec v2, which is still in the planning phase ;) I would encourage you to use the traditional seqvec or the newer bert instead. Mind that there are open issues being worked on for bert (#33 & #35).

ptynecki commented 4 years ago

Got it.

Thank you for your clarification.

I've been using traditional SeqVec for months and I'm very excited and happy about the results achieved in my research (a deeper understanding of bacteriophages).

BERT, ALBERT and XLNet pre-trained models are next on my agenda. I'm going to compare each of them on two classification problems in my domain.

mheinzinger commented 4 years ago

Short addendum to Chris' answer: for SeqVec "v2", we removed the CharCNN to see the impact on performance, so the error mostly highlights that we need to refine our description of the model.
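
For anyone hitting the same KeyError: one quick way to see what a weights file actually contains is to list its top-level HDF5 entries before handing it to an embedder. The sketch below assumes h5py is installed. The helper names `missing_entries` and `check_weights_file` are made up for illustration and are not part of bio_embeddings or AllenNLP. The only entry name it checks by default, `char_embed`, is taken from the error message above.

```python
def missing_entries(container, required=("char_embed",)):
    """Return the names in `required` that are absent from `container`.

    Works with anything that supports `name in container`,
    e.g. an open h5py.File or a plain dict.
    """
    return [name for name in required if name not in container]


def check_weights_file(path, required=("char_embed",)):
    """Open an HDF5 weights file read-only and report missing top-level entries."""
    # Imported lazily so that missing_entries() stays usable without h5py.
    import h5py

    with h5py.File(path, "r") as weights:
        # Show what the file does contain, which helps when comparing versions.
        print("Top-level entries:", sorted(weights.keys()))
        return missing_entries(weights, required)


# Example call, using the path from the report above (adjust to your setup):
# check_weights_file("models/seqvec2/weights.hdf5")
```

Given that the CharCNN was removed for "v2", such a check would presumably report `char_embed` as missing for those weights, although that has not been verified here.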