nadavbra / protein_bert


Improving accuracy while fine-tuning proteinBERT #72

Closed Chinjuj2017 closed 8 months ago

Chinjuj2017 commented 9 months ago

Hi, can anyone suggest ways to improve the accuracy of ProteinBERT while fine-tuning? Similar to the adapter methods used in PyTorch-based models, is there any such concept for TensorFlow LLMs?

ddofer commented 9 months ago

Hi, you can look at the example notebook (used for fine-tuning on the benchmarks): https://github.com/nadavbra/protein_bert/blob/master/ProteinBERT%20demo.ipynb

Optimal accuracy (at higher complexity) comes from fine-tuning the entire model on a task with a task-specific head (e.g. classification or regression), or from freezing most layers and training only the extra head.
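The freeze-most-layers variant can be sketched in Keras as below. Note this is a minimal illustration, not the ProteinBERT API: `pretrained_backbone` is a hypothetical stand-in for the loaded pretrained model (see the demo notebook for the actual loading and fine-tuning utilities).

```python
import tensorflow as tf

# Hypothetical stand-in for a pretrained backbone; with ProteinBERT you would
# load the real pretrained model via the library (see the demo notebook).
pretrained_backbone = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(128,)),
    tf.keras.layers.Dense(64, activation="relu"),
])

# Freeze the pretrained layers so their weights are not updated.
for layer in pretrained_backbone.layers:
    layer.trainable = False

# Attach a task-specific head (here: binary classification) and train only it.
model = tf.keras.Sequential([
    pretrained_backbone,
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Only the head's kernel and bias remain trainable.
print(len(model.trainable_weights))
```

Calling `model.fit(...)` on your task data then updates only the head; unfreezing a few top backbone layers afterwards (with a low learning rate) is a common middle ground.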

A much easier approach is to extract embeddings and train a separate model on them, e.g. take the global-layer embeddings and fit an XGBoost or scikit-learn random forest model on top.
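That embeddings-as-features approach can be sketched as follows. The random vectors below are a placeholder for ProteinBERT's global-layer embeddings of each sequence (which you would compute once with the pretrained model); the dimensions and labels are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_proteins, embed_dim = 200, 512  # hypothetical sizes

# Stand-ins for precomputed global-layer embeddings and binary task labels.
X = rng.normal(size=(n_proteins, embed_dim))
y = rng.integers(0, 2, size=n_proteins)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train a classical model on the frozen embeddings; no backprop through
# the language model is needed, so this is fast and simple to tune.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
preds = clf.predict(X_test)
```

Since the embeddings are fixed, you can cheaply cross-validate or swap in XGBoost without ever touching the pretrained network.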