Closed jaimergp closed 3 years ago
:exclamation: No coverage uploaded for pull request base (`master@58b7590`). The diff coverage is `n/a`.
Wooohoo it's green!
Well, green no more... :)
@t-kimber, one of the reasons we have failing tests is that... we are now testing the featurizers :) However, our test examples no longer apply: once we enter OpenForceField territory, the SMILES get canonicalized, so what initially was `C` internally becomes `[H][C]([H])([H])[H]`.
So, of course, Morgan fingerprints are different due to the explicit hydrogen atoms, OHE matrices deal with more stuff, etc. Can we adapt the `pytest.mark.parametrize` data points in `kinoml/tests/features/test_ligand.py` so they provide the right solutions?
You can get the OFFTK SMILES representation with `kinoml.core.ligands.Ligand.from_smiles("C").to_smiles()` (change `"C"` to whatever you need).
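A minimal sketch of how the expected values could be regenerated for the parametrized tests. The helper `build_cases` and the stand-in `fake_canonicalize` are hypothetical names used only for illustration; the stand-in returns the one mapping quoted above so the snippet runs without KinoML installed. In the real test module, the canonicalizer would presumably be `lambda s: Ligand.from_smiles(s).to_smiles()`.

```python
from typing import Callable, Iterable, List, Tuple


def build_cases(
    inputs: Iterable[str], canonicalize: Callable[[str], str]
) -> List[Tuple[str, str]]:
    """Pair each input SMILES with the SMILES the library stores internally."""
    return [(smiles, canonicalize(smiles)) for smiles in inputs]


# Stand-in canonicalizer (assumption): only knows the example from this thread.
KNOWN = {"C": "[H][C]([H])([H])[H]"}


def fake_canonicalize(smiles: str) -> str:
    return KNOWN[smiles]


cases = build_cases(["C"], fake_canonicalize)
print(cases)

# The resulting list could then feed the existing decorator, e.g.:
# @pytest.mark.parametrize("smiles, expected", cases)
# def test_ligand_smiles(smiles, expected): ...
```

Generating the expected values this way keeps the test data tied to whatever the toolkit actually emits, rather than to hand-written strings.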
Where can we find the Dream dataset, @AndreaVolkamer?
Tests run again now, and fail as expected :)
This is in good shape now, or at least good enough to merge. There are plenty of things to fix and improve, but from now on we'll address those in issues + smaller PRs.
Description
This PR will establish the needed `Dataset` objects (and supporting abstractions) as brainstormed in the included `examples/api-brainstorming/kinoml_example.py` (see diff). This was only conceived as a draft, so we'll need to iterate on it and provide the actual implementations to match it as closely as possible.
Datasets to include
Implementation progress
Questions
`Filter` objects? We could implement `numpy`-like slicing mechanisms to say `dataset_provider[dataset_provider['protein']['n_residues'] < 100]`, although I don't know how complex this would be. Maybe in a future release (we would need to inherit from `pandas` or provide views to our wrapper dataframes within the dataset).
Implement modular log-likelihoods coming from different data sources.
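As a rough illustration of the `numpy`-like slicing raised in the `Filter` question, the expression `dataset_provider[dataset_provider['protein']['n_residues'] < 100]` can be supported with a small amount of operator overloading. This is a pure-Python sketch, not the KinoML implementation: the `Column` and `DatasetProvider` classes, their attributes, and the toy entries are all assumptions made up for the example.

```python
from typing import Any, Iterable, List


class Column:
    """One field across all entries; comparisons return a boolean mask."""

    def __init__(self, values: Iterable[Any]):
        self.values = list(values)

    def __getitem__(self, key: str) -> "Column":
        # Drill into nested mappings, e.g. column["n_residues"].
        return Column(value[key] for value in self.values)

    def __lt__(self, other: Any) -> List[bool]:
        return [value < other for value in self.values]


class DatasetProvider:
    """Hypothetical wrapper holding one dict per measurement."""

    def __init__(self, entries: Iterable[dict]):
        self.entries = list(entries)

    def __len__(self) -> int:
        return len(self.entries)

    def __getitem__(self, key):
        if isinstance(key, str):
            # String key: select a field across all entries.
            return Column(entry[key] for entry in self.entries)
        # Otherwise treat the key as a boolean mask, one flag per entry.
        mask = list(key)
        if len(mask) != len(self.entries):
            raise ValueError("mask length must match the number of entries")
        return DatasetProvider(e for e, keep in zip(self.entries, mask) if keep)


# Toy data, invented for the example.
dataset_provider = DatasetProvider(
    [
        {"protein": {"name": "A", "n_residues": 80}},
        {"protein": {"name": "B", "n_residues": 250}},
    ]
)
small = dataset_provider[dataset_provider["protein"]["n_residues"] < 100]
print(len(small), small.entries[0]["protein"]["name"])
```

Returning a new `DatasetProvider` from the mask branch keeps filters chainable. A `pandas`-backed version would get the same behaviour from boolean indexing on the underlying dataframe.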
Comes from #13
Key aspects:
`dG/KT`, but we can discretize in inactives / actives later if needed.
`[substrate]`: we could estimate by cross-validation, use relative pIC50s, or add that as a nuisance parameter in the future
Todos
`DatasetCollection` or `MetaDataset` object
Structural featurizers
I have merged all PRs here and will work on getting them in shape for the library.
To do