openkinome / kinoml

Structure-informed machine learning for kinase modeling
https://openkinome.org/kinoml/
MIT License

Datasets API (building on brainstormed dreamed example) #8

Closed jaimergp closed 3 years ago

jaimergp commented 4 years ago

Description

This PR will establish the needed Dataset objects (and supporting abstractions) as brainstormed in the included examples/api-brainstorming/kinoml_example.py (see diff). That example was only conceived as a draft, so we'll need to iterate on it and provide actual implementations that match it as closely as possible.

Datasets to include

Implementation progress

Questions

Implement modular log-likelihoods coming from different data sources.

Comes from #13

Key aspects:

Todos

Structural featurizers

I have merged all PRs here and will work on getting them in shape for the library.

To do

codecov-io commented 4 years ago

Codecov Report

:exclamation: No coverage uploaded for pull request base (master@58b7590). The diff coverage is n/a.

jaimergp commented 4 years ago

Wooohoo it's green!

jaimergp commented 4 years ago

Well, green no more... :)

@t-kimber, one of the reasons we have failing tests is that... we are now testing the featurizers :) However, our test examples no longer apply: once we enter OpenForceField territory, the SMILES get canonicalized, so what was initially C internally becomes [H][C]([H])([H])[H].

So, of course, Morgan fingerprints are different due to the explicit hydrogen atoms, OHE matrices have more to encode, etc. Can we adapt the pytest.mark.parametrize data points in kinoml/tests/features/test_ligand.py so they provide the right expected results?

You can get the OFFTK SMILES representation with kinoml.core.ligands.Ligand.from_smiles("C").to_smiles() (change "C" to whatever you need).
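To make the effect concrete, here is a minimal sketch with a toy character-level one-hot encoder. It is not KinoML's featurizer: `VOCAB` and `one_hot_encode` are hypothetical names invented for this illustration, and the vocabulary is an assumption. The only point is that the explicit-hydrogen SMILES produces a much wider matrix than "C", so expected values written for "C" can no longer match.

```python
# Illustrative sketch only: a toy character-level one-hot encoder,
# NOT KinoML's actual featurizer. The vocabulary is an assumption.
VOCAB = ["C", "H", "[", "]", "(", ")"]


def one_hot_encode(smiles, vocab=VOCAB):
    """Return a len(vocab) x len(smiles) matrix of 0/1 ints, one column per character."""
    index = {char: row for row, char in enumerate(vocab)}
    matrix = [[0] * len(smiles) for _ in vocab]
    for col, char in enumerate(smiles):
        # Raises KeyError for characters outside the toy vocabulary.
        matrix[index[char]][col] = 1
    return matrix


implicit = one_hot_encode("C")
explicit = one_hot_encode("[H][C]([H])([H])[H]")

# Shapes as (rows, columns)
implicit_shape = (len(implicit), len(implicit[0]))
explicit_shape = (len(explicit), len(explicit[0]))
print(implicit_shape, explicit_shape)  # (6, 1) (6, 19)
```

The same reasoning applies to the parametrized test data: the expected outputs would need to be regenerated from the representation the featurizers actually receive, not from the SMILES string as originally typed.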

jaimergp commented 4 years ago

Where can we find the Dream dataset, @AndreaVolkamer?

jaimergp commented 4 years ago

Tests run again now, and fail as expected :)

jaimergp commented 3 years ago

This is in good shape now, or at least good enough to merge. There are plenty of things to fix and improve, but from now on we'll address those in issues + smaller PRs.