zhengkangjie / ESM-AA

MIT License

Sample finetuning/embedding code for molecular tasks #4

Status: Open. ddhostallero opened this issue 2 months ago.

ddhostallero commented 2 months ago

The scripts/extract.py code seems to work fine for protein-level embeddings. Would you be able to provide an equivalent for molecule-level embeddings? I would also like to request sample code for finetuning on molecular tasks. Thanks!

zhengkangjie commented 2 months ago

To run inference on molecular tasks, you need to install the Uni-Mol library and use its datasets to prepare the inputs. Specifically, replace Uni-Mol's built-in dictionary with the alphabet of the current model, and set aa_mask to all zeros during the forward pass. Other input parameters, such as src_distance and src_edge_type, should follow Uni-Mol's default settings.
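A minimal sketch of the input preparation described above, written with NumPy alone so it runs without Uni-Mol or ESM-AA installed. It assumes each molecule arrives as a list of atom token ids already mapped to the model's alphabet plus an (N, 3) coordinate array. It approximates what Uni-Mol's default pipeline is understood to compute (centered coordinates, zero coordinates for the BOS/EOS tokens, a pairwise Euclidean distance matrix for src_distance, and a pairwise edge id of token_i * num_types + token_j for src_edge_type); hydrogen removal, cropping, and batching are not covered. The helper name build_molecule_inputs, its arguments, and the per-token shape of aa_mask are hypothetical; only the all-zeros aa_mask comes from the reply above.

```python
import numpy as np


def build_molecule_inputs(atom_ids, coords, bos_id, eos_id, num_types):
    """Hypothetical helper: assemble Uni-Mol-style inputs for one molecule.

    atom_ids:  list of atom token ids, assumed to be in the model's alphabet.
    coords:    array-like of shape (N, 3) with one 3D position per atom.
    num_types: number of token types, assumed to be the alphabet size.
    """
    coords = np.asarray(coords, dtype=np.float32)
    if coords.shape != (len(atom_ids), 3):
        raise ValueError("coords must have shape (len(atom_ids), 3)")

    # Center the coordinates (assumed to mirror Uni-Mol's normalization).
    coords = coords - coords.mean(axis=0)

    # Wrap the atoms with BOS/EOS tokens, which get zero coordinates.
    src_tokens = np.array([bos_id, *atom_ids, eos_id], dtype=np.int64)
    src_coord = np.zeros((len(src_tokens), 3), dtype=np.float32)
    src_coord[1:-1] = coords

    # src_distance: pairwise Euclidean distances between all positions.
    diff = src_coord[:, None, :] - src_coord[None, :, :]
    src_distance = np.sqrt((diff ** 2).sum(axis=-1))

    # src_edge_type: an id for every ordered (token_i, token_j) pair.
    src_edge_type = src_tokens[:, None] * num_types + src_tokens[None, :]

    # aa_mask: all zeros for a molecule-only input (per-token shape assumed).
    aa_mask = np.zeros(len(src_tokens), dtype=np.int64)

    return src_tokens, src_distance, src_edge_type, aa_mask
```

For a toy two-atom molecule with made-up ids, build_molecule_inputs([5, 6], [[0, 0, 0], [1.5, 0, 0]], bos_id=0, eos_id=2, num_types=30) returns four tokens, a 4 x 4 distance matrix with 1.5 between the two atoms, and an all-zero mask. In practice these per-molecule arrays would still need to be batched and padded before a forward pass.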

ddhostallero commented 2 months ago

When you say replace Uni-Mol's built-in dictionary, do you mean adding an _a suffix to each entry in the atom list (i.e., C becomes C_a, N becomes N_a, etc.), or is there something I am missing?
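If the ESM-AA alphabet does store atom tokens with an _a suffix, as the question above supposes (the thread does not confirm this), remapping a Uni-Mol atom list is a simple lookup. The sketch below assumes a plain token-to-id dict, similar in spirit to the tok_to_idx mapping that fair-esm's Alphabet exposes; whether ESM-AA's alphabet uses the same attribute or token names is an assumption, and the toy vocabulary is hypothetical.

```python
def atoms_to_alphabet_ids(atoms, tok_to_idx, unk_token="<unk>", suffix="_a"):
    """Map element symbols (e.g. "C", "N") to ids of suffixed atom tokens.

    tok_to_idx: dict from token string to id (hypothetical stand-in for the
    model alphabet's token table). Atoms without a suffixed token fall back
    to the id of unk_token.
    """
    unk_id = tok_to_idx[unk_token]
    return [tok_to_idx.get(f"{atom}{suffix}", unk_id) for atom in atoms]


# Hypothetical toy vocabulary, NOT the real ESM-AA alphabet.
toy_vocab = {"<cls>": 0, "<pad>": 1, "<eos>": 2, "<unk>": 3, "C_a": 4, "N_a": 5, "O_a": 6}
print(atoms_to_alphabet_ids(["C", "C", "O", "N", "Br"], toy_vocab))  # prints [4, 4, 6, 5, 3]
```

Falling back to the unknown token keeps the lookup total for rare elements; raising an error instead would surface vocabulary gaps earlier, which may be preferable when validating a new dataset.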

DragonDescentZerotsu commented 1 month ago

Is there a planned timeline for releasing the molecule-level code?

zhengkangjie commented 4 weeks ago

Apologies for the delay; we will organize and release this part of the code as soon as possible in the coming weeks.