mims-harvard / TDC

Therapeutics Commons (TDC-2): Multimodal Foundation for Therapeutic Science
https://tdcommons.ai
MIT License

(new feature) large molecule multi-objective fluorescent protein oracle #150

Open samuelstanton opened 2 years ago

samuelstanton commented 2 years ago

Describe the problem

It would be great to add the large molecule fluorescent protein task introduced here as a molecular generation oracle. Code is available here.

Describe the solution you'd like

In brief, the oracle would return the folding stability (-dG) and solvent accessible surface area (SASA), given a primary amino acid sequence and a reference PDB structure. The oracle would first translate the input sequence into a list of substitution mutations relative to the reference, then use FoldX to generate a predicted structure and its predicted folding stability. SASA can then be computed from the predicted structure using BioPython.
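
As a rough illustration of the first step only (translating an input sequence into substitution mutations), here is a minimal Python sketch. The function name `substitution_mutations`, the `chain` and `start` parameters, and the mutation string layout (wild-type residue, chain ID, position, mutant residue) are assumptions made for illustration; the layout is meant to resemble the entries FoldX reads from a mutation list, but it should be checked against the FoldX documentation. Running FoldX and computing SASA are not shown.

```python
# Hypothetical helper: not part of TDC or FoldX.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def substitution_mutations(reference, candidate, chain="A", start=1):
    """Return the substitutions that turn `reference` into `candidate`.

    Each mutation is encoded as <wild-type><chain><position><mutant>,
    e.g. "EA5A" (assumed layout). Only equal-length sequences are
    supported, since insertions and deletions are not substitutions.
    """
    if len(reference) != len(candidate):
        raise ValueError("sequences must have the same length")
    mutations = []
    for offset, (wild_type, mutant) in enumerate(zip(reference, candidate)):
        if mutant not in AMINO_ACIDS:
            raise ValueError(f"unknown residue {mutant!r} at offset {offset}")
        if wild_type != mutant:
            # `start` is the residue number of the first position in the PDB.
            mutations.append(f"{wild_type}{chain}{start + offset}{mutant}")
    return mutations


reference = "MSKGEELFTG"
print(substitution_mutations(reference, "MSKGAELFTG"))  # ['EA5A']
print(substitution_mutations(reference, "MAKGEELFTW"))  # ['SA2A', 'GA10W']
```

Rejecting length mismatches keeps the sketch limited to substitutions, which matches the task as described above; a real oracle would also need to map sequence positions onto the residue numbering used in the reference PDB.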

Additional context

I would potentially be willing to make a PR if this is something the maintainers would approve.

kexinhuang12345 commented 2 years ago

Hi Samuel, this sounds like a great oracle to include in TDC! We usually implement individual oracle functions here, while the tdc.Oracle interface is the wrapper around all oracles. Let us know if you need further pointers; happy to discuss more!

kexinhuang12345 commented 2 years ago

Hi @samuelstanton, any thoughts on this? No rush!

samuelstanton commented 2 years ago

Hi Kevin, haven't had time to draft a PR bc of some impending deadlines. I'm hoping to defend my dissertation in July so realistically it'll probably be a while before I'll be able to give this proper attention. If you like we can close the issue for now, or leave it open until I have more time, up to you.

futianfan commented 2 years ago

Good luck with your defense, and no rush!