kuleshov-group / caduceus

Bi-Directional Equivariant Long-Range DNA Sequence Modeling

How to load pre-trained model for fine-tuning? #14

Closed zhan8855 closed 2 months ago

zhan8855 commented 3 months ago

Hi, thank you for your awesome work!

However, I am having difficulty loading the Hugging Face checkpoints for further fine-tuning with the slurm scripts. The Hugging Face configs do not seem to align with the slurm model configs. I am trying to fix this by adding "target": "caduceus.configuration_caduceus.CaduceusConfig" to the Hugging Face configs. Is that okay?
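Concretely, the edit I am attempting looks something like this (a rough sketch; the config path is a placeholder for wherever the checkpoint was downloaded):

```python
import json

# Rough sketch of the edit: add a "target" key to the downloaded HF
# config.json so the slurm configs can locate the Caduceus config class.
# "config.json" is a placeholder path for the downloaded checkpoint.
with open("config.json") as f:
    cfg = json.load(f)
cfg["target"] = "caduceus.configuration_caduceus.CaduceusConfig"
with open("config.json", "w") as f:
    json.dump(cfg, f, indent=2)
```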

Beyond that, I am still not sure how to load the model. Could you please offer some suggestions? Or could you release the pre-trained checkpoints produced by the slurm scripts? Thank you very much in advance!

zhan8855 commented 3 months ago

Correction: the "target" key above should be "_target_" (Hydra-style, with underscores).
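In terms of the sketch above, that means (same hypothetical setup):

```python
import json

with open("config.json") as f:
    cfg = json.load(f)
# Use "_target_" (Hydra's convention, with underscores), not "target":
cfg["_target_"] = "caduceus.configuration_caduceus.CaduceusConfig"
cfg.pop("target", None)  # drop the earlier, incorrect key if present
with open("config.json", "w") as f:
    json.dump(cfg, f, indent=2)
```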

leannmlindsey commented 2 months ago

I second releasing the pre-trained checkpoints. Pre-training is not too much work, but it does require 8 A100 or 8 A6000 GPUs, which can be hard to come by.

Also, I did not feel I could use my own checkpoints on my downstream task until I had reproduced all of your results and verified that I had the correct model. That is a lot of extra work for researchers who just want to make use of this wonderful resource.

yair-schiff commented 2 months ago

> However, I am having difficulty loading the Hugging Face checkpoints for further fine-tuning with the slurm scripts. The Hugging Face configs do not seem to align with the slurm model configs. I am trying to fix this by adding "target": "caduceus.configuration_caduceus.CaduceusConfig" to the Hugging Face configs. Is that okay?

Yes, currently the code is not compatible with the HF models, but I created a branch here: https://github.com/kuleshov-group/caduceus/tree/hf_finetune where you can now load HF Caduceus models for downstream tasks, e.g., Genomics Benchmark and Nucleotide Transformer. See the wrapper scripts in that branch for how to use them.
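For reference, loading a pre-trained Caduceus checkpoint from the Hub looks roughly like this (the model ID is illustrative; browse the kuleshov-group page on the Hub for the available checkpoints):

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Illustrative checkpoint name; see the kuleshov-group org on the
# HF Hub for the full list of pre-trained Caduceus models.
model_id = "kuleshov-group/caduceus-ph_seqlen-131k_d_model-256_n_layer-16"

# trust_remote_code is required because the Caduceus model and config
# classes ship inside the model repo, not in transformers itself.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained(model_id, trust_remote_code=True)

input_ids = tokenizer("ACGTACGT", return_tensors="pt")["input_ids"]
outputs = model(input_ids)  # masked-LM logits over the nucleotide vocab
```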

Please let me know if you still have questions.

yair-schiff commented 2 months ago

> I second releasing the pre-trained checkpoints. Pre-training is not too much work, but it does require 8 A100 or 8 A6000 GPUs, which can be hard to come by.

> Also, I did not feel I could use my own checkpoints on my downstream task until I had reproduced all of your results and verified that I had the correct model. That is a lot of extra work for researchers who just want to make use of this wonderful resource.

I pushed the models for the Genomics Benchmark and Nucleotide Transformer to the HF Hub. You can use them as described in the comment above:

Genomics Benchmark
Nucleotide Transformer
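As a rough sketch of fine-tuning one of these on a downstream classification task (placeholder model ID and label count; the wrappers on the hf_finetune branch handle this properly):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder ID; substitute the checkpoint you want from the
# kuleshov-group page on the HF Hub.
model_id = "kuleshov-group/caduceus-ph_seqlen-131k_d_model-256_n_layer-16"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
# Assumes the repo registers a sequence-classification head via its
# auto_map; if not, fall back to the hf_finetune wrapper scripts.
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, num_labels=2, trust_remote_code=True
)

input_ids = tokenizer(["ACGT" * 64], return_tensors="pt")["input_ids"]
logits = model(input_ids).logits
loss = torch.nn.functional.cross_entropy(logits, torch.tensor([1]))
loss.backward()  # plug in your optimizer of choice from here
```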

zhan8855 commented 2 months ago

Thank you so much!!

leannmlindsey commented 2 months ago

Thank you Yair!