instadeepai / nucleotide-transformer

🧬 Nucleotide Transformer: Building and Evaluating Robust Foundation Models for Human Genomics
https://www.biorxiv.org/content/10.1101/2023.01.11.523679v2

pretraining the nucleotide-transformer #81

Open mpage21 opened 5 days ago

mpage21 commented 5 days ago

Hi, I would like to pre-train the model myself to gain a better understanding of machine learning models.

Specifically, could you provide the code that was used to pre-train the v2 500m multi-species model?

I am new to this, so any and all help is appreciated.

Thank you!

mpage21 commented 3 days ago

Okay, I think I might have found what I'm looking for. It looks like the code in the nucleotide_transformer directory is what was used to build the models (maybe I just need to tweak it a bit?), and I think the configurations are here: https://huggingface.co/InstaDeepAI/nucleotide-transformer-v2-500m-multi-species/tree/main

Is this correct?

mpage21 commented 3 days ago

Actually, I'm not sure I see a training script. Is it named something else?