westlake-repl / SaProt

[ICLR'24 spotlight] Saprot: Protein Language Model with Structural Alphabet
MIT License

How to reproduce the SaProt_35M_AF2? #34

Closed. AaranWang closed this issue 1 month ago.

AaranWang commented 1 month ago

As a newcomer to this area, I'm aiming to train the simplest DL model by following the training process of SaProt_35M_AF2. I only have access to the source data (data.mdb) of AF2_UniRef50. Can you provide the raw data of SaProt_35M_AF2? Thx

LTEnjoy commented 1 month ago

Hi,

If you want to reproduce the training process of SaProt_35M_AF2, you just need to use the LMDB file of AF2_UniRef50. Can you explain more about the raw data?
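If it helps, here is a minimal, unofficial sketch for peeking at the AF2_UniRef50 LMDB file from Python. It is not code from the SaProt repo, just the standard python-lmdb package; the directory path is a placeholder and the values are printed as raw bytes since the record encoding is not shown here.

```python
# Minimal sketch for inspecting the AF2_UniRef50 LMDB file (the directory
# containing data.mdb), using the standard python-lmdb package.
# The directory path below is a placeholder; adjust it to your server.
import lmdb

lmdb_dir = "/path/to/AF2_UniRef50"  # directory that holds data.mdb

env = lmdb.open(lmdb_dir, readonly=True, lock=False, readahead=False)
with env.begin() as txn:
    print("number of entries:", txn.stat()["entries"])
    cursor = txn.cursor()
    # Print the first few key/value pairs to see how records are stored.
    for i, (key, value) in enumerate(cursor):
        print(key, value[:100])
        if i >= 4:
            break
env.close()
```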

AaranWang commented 1 month ago

I originally thought that the raw data of AF2_UniRef50 was used to train SaProt_650M_AF2. Isn't that the case?

LTEnjoy commented 1 month ago

We used the LMDB file of AF2_UniRef50 to train both SaProt_650M_AF2 and SaProt_35M_AF2.

AaranWang commented 1 month ago

How do I specify whether to train the SaProt_650M_AF2 or the SaProt_35M_AF2 model? Additionally, is the cost of training SaProt_35M_AF2 relatively lower than training SaProt_650M_AF2? Thank you for your kind reply.

LTEnjoy commented 1 month ago

The model is loaded based on its config path. If you want to train SaProt_35M_AF2, you only have to switch the config path to point to SaProt_35M_AF2. Training SaProt_35M_AF2 is much cheaper than training the 650M model, but it still took two weeks on 8 A100 GPUs.
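For a rough sense of that budget, the figure above works out to:

```python
# Rough cost of the 35M pretraining run quoted above: 8 A100s for two weeks.
gpus, days = 8, 14
gpu_hours = gpus * days * 24
print(gpu_hours)  # 2688 A100 GPU-hours
```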

AaranWang commented 1 month ago

Is SaProt/config/pretrain/saprot.yaml the file I should modify, changing the SaProt_650M_AF2 item to SaProt_35M_AF2? Are there other options I should modify, or could you provide a modified YAML file? Thank you very much.

LTEnjoy commented 1 month ago

Yes. You have to change the 650M to 35M and keep the other settings at their defaults. Please make sure the LMDB paths (i.e. train_lmdb etc.) are correct on your server.
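If it helps, here is a rough sketch of that edit. The exact key names in saprot.yaml are not spelled out here, so instead of touching specific fields it just rewrites every occurrence of the 650M identifier and prints any setting whose name mentions "lmdb" so you can check the paths (train_lmdb etc.) by hand. The output filename is only an example.

```python
# Hedged sketch for adapting SaProt/config/pretrain/saprot.yaml to the 35M model.
# Requires PyYAML. The destination filename is hypothetical.
import yaml

SRC = "config/pretrain/saprot.yaml"
DST = "config/pretrain/saprot_35M.yaml"  # hypothetical output name


def swap(node):
    """Recursively replace the 650M identifier in all string values."""
    if isinstance(node, str):
        return node.replace("SaProt_650M_AF2", "SaProt_35M_AF2")
    if isinstance(node, dict):
        return {k: swap(v) for k, v in node.items()}
    if isinstance(node, list):
        return [swap(v) for v in node]
    return node


def show_lmdb_paths(node, prefix=""):
    """Print every key that looks like an LMDB path so it can be verified."""
    if isinstance(node, dict):
        for k, v in node.items():
            key = f"{prefix}.{k}" if prefix else str(k)
            if "lmdb" in str(k).lower():
                print(key, "=", v)
            show_lmdb_paths(v, key)
    elif isinstance(node, list):
        for i, v in enumerate(node):
            show_lmdb_paths(v, f"{prefix}[{i}]")


with open(SRC) as f:
    cfg = yaml.safe_load(f)

cfg = swap(cfg)
show_lmdb_paths(cfg)  # verify train_lmdb etc. point to real directories

with open(DST, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)
```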

AaranWang commented 1 month ago

Thank you. Best wishes to u. (^.^)

LTEnjoy commented 1 month ago

You are welcome! 😃😃