polaris-hub / polaris

Foster the development of impactful AI models in drug discovery.
https://polaris-hub.github.io/polaris/
Apache License 2.0

Add nucleic acid modalities #138

Open fteufel opened 3 months ago

fteufel commented 3 months ago

Modality right now only supports a very limited number of cases. It would be helpful to have biomolecules beyond proteins, starting with nucleic acids.

Describe the solution you'd like

Add dna and rna to polaris.dataset.Modality.
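A minimal sketch of what this request could look like. It is a standalone stand-in, not the actual polaris source. The MOLECULE and PROTEIN members follow what is mentioned in this thread. The UNKNOWN member, the string values, and the parse_modality helper are assumptions made for illustration only.

```python
from enum import Enum


class Modality(Enum):
    """Hypothetical stand-in for polaris.dataset.Modality (not the real class)."""

    UNKNOWN = "unknown"    # assumed fallback member
    MOLECULE = "molecule"  # mentioned in this thread
    PROTEIN = "protein"    # implied by "biomolecules beyond proteins"
    DNA = "dna"            # proposed addition
    RNA = "rna"            # proposed addition


def parse_modality(value: str) -> Modality:
    """Hypothetical helper: map a string to a Modality, falling back to UNKNOWN."""
    try:
        return Modality(value.lower())
    except ValueError:
        return Modality.UNKNOWN
```

Under these assumptions, adding the two members would be the whole change on the library side, since the field is only stored as metadata.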

Would this be sufficient to get the functionality?

cwognum commented 3 months ago

Thanks @fteufel. This shouldn't be all that difficult to add, but since it doesn't affect the functionality of a dataset it's a low priority for now. With the ICML launch around the corner, there are some other, more pressing feature requests and issues to work on first.

For context: No functionality is currently conditioned on the modality field in the ColumnAnnotation. It's just metadata. There are ideas on our roadmap to add modality-specific functionality (e.g. visualizations). I expect that working on such modality-specific features will require us to change the type and structure of the modality field. For example, right now we have Modality.MOLECULE, but we don't differentiate between molecular representations. Maybe we will also (or instead?) need to annotate that a column has SMILES or SELFIES or molecular graphs!
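To illustrate the kind of restructuring hinted at here, the sketch below separates the modality of a column from its representation. Everything in it is hypothetical: the Representation enum, the ColumnAnnotationSketch class, and the validation rule are assumptions for discussion and do not reflect the current ColumnAnnotation or any planned design.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Modality(Enum):
    """Hypothetical, reduced stand-in for the modality field."""

    MOLECULE = "molecule"
    PROTEIN = "protein"


class Representation(Enum):
    """Hypothetical: how a molecule column is encoded."""

    SMILES = "smiles"
    SELFIES = "selfies"
    GRAPH = "graph"


@dataclass(frozen=True)
class ColumnAnnotationSketch:
    """Hypothetical annotation that records both modality and representation."""

    modality: Modality
    representation: Optional[Representation] = None

    def __post_init__(self) -> None:
        # Assumed rule for this sketch: only molecule columns carry one
        # of the molecular representations listed above.
        if (
            self.representation is not None
            and self.modality is not Modality.MOLECULE
        ):
            raise ValueError(
                f"{self.representation.name} is only valid for molecule columns"
            )
```

A SMILES column would then be annotated as ColumnAnnotationSketch(Modality.MOLECULE, Representation.SMILES), while a column with no known representation simply leaves the second field unset.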

Of course, if you would really like to have more modalities supported in the meantime, please feel free to make a PR. It may take a while to make the necessary changes on the Hub's side as well, but I promise you we won't forget.