zjunlp / OntoProtein

[ICLR 2022] OntoProtein: Protein Pretraining With Gene Ontology Embedding
MIT License

What is the `OntoModel` referenced in `run_pretrain.sh` #23

Closed tomcobley closed 1 year ago

tomcobley commented 1 year ago

Hi!

I am trying to pretrain the model using the datasets and pretrained models mentioned in the README and the run_pretrain.sh script.

However, I have run into a few problems that seem to come from my choice of `TEXT_MODEL_PATH` in `run_pretrain.sh`.

What should this model be?

I assumed this meant the PubMedBERT model mentioned in the README, but execution fails if I use this - am I missing something?

Many thanks in advance!

Alexzhuan commented 1 year ago

Hi,

Sorry, `TEXT_MODEL_PATH` appears to be a redundant parameter in this script. By default, when `GO_ENCODER_CLS` is set to `embedding`, the parameters of `OntoModel` are randomly initialized for pretraining, so the path is not used. If you set `GO_ENCODER_CLS` to `bert`, you can change this path to load the parameters of PubMedBERT.
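
To make the two cases concrete, here is a minimal sketch of the decision described above. It is not code from the repository: the helper name `ontology_encoder_init` and its return strings are hypothetical, and the error raised for other values is an assumption made for illustration only. The only details taken from this thread are the two settings `GO_ENCODER_CLS` and `TEXT_MODEL_PATH` and the values `embedding` and `bert`.

```python
from typing import Optional


def ontology_encoder_init(go_encoder_cls: str, text_model_path: Optional[str]) -> str:
    """Describe how the ontology encoder would be initialized.

    Hypothetical helper for illustration; it mirrors the behaviour
    described in this thread, not the repository's actual code.
    """
    if go_encoder_cls == "embedding":
        # TEXT_MODEL_PATH is ignored here: parameters start from random values.
        return "random initialization"
    if go_encoder_cls == "bert":
        # Assumption: a path is needed in this mode so pretrained weights can be loaded.
        if not text_model_path:
            raise ValueError("TEXT_MODEL_PATH must be set when GO_ENCODER_CLS is 'bert'")
        return f"load pretrained weights from {text_model_path}"
    # Assumption: any other value is rejected in this sketch.
    raise ValueError(f"unsupported GO_ENCODER_CLS in this sketch: {go_encoder_cls!r}")


if __name__ == "__main__":
    # The path below is a placeholder, not a real checkpoint location.
    print(ontology_encoder_init("embedding", "path/to/pubmedbert"))
    print(ontology_encoder_init("bert", "path/to/pubmedbert"))
```

In other words, with `GO_ENCODER_CLS` set to `embedding`, whatever value `TEXT_MODEL_PATH` holds should not affect the run, which is consistent with the parameter being redundant in the default configuration.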