openforcefield / protein-ligand-benchmark

Protein-Ligand Benchmark Dataset for Free Energy Calculations
MIT License
144 stars 15 forks source link

Adding new directory structure for edges #83

Closed IAlibay closed 1 year ago

IAlibay commented 1 year ago

Fixes #69

ijpulidos commented 1 year ago

While trying to run all the edges in the data set I detected the current issues in some of the targets

Target pfkfb3 references non-existing ligand in edge edge_lig_38_lig_36.
Target pfkfb3 references non-existing ligand in edge edge_lig_54_lig_38.
Target pfkfb3 references non-existing ligand in edge edge_lig_46_lig_24.
Target pfkfb3 references non-existing ligand in edge edge_lig_59_lig_30.
Target pfkfb3 references non-existing ligand in edge edge_lig_53_lig_38.
Target pfkfb3 references non-existing ligand in edge edge_lig_67_lig_24.
Target thrombin references non-existing ligand in edge edge_lig_7a_lig_5.
Target cdk8 references non-existing ligand in edge edge_lig_14_lig_23.
Target cdk8 references non-existing ligand in edge edge_lig_17_lig_23.
Target cdk8 references non-existing ligand in edge edge_lig_42_lig_23.
Target cdk8 references non-existing ligand in edge edge_lig_45_lig_23.
Target cdk8 references non-existing ligand in edge edge_lig_43_lig_23.
Target cdk8 references non-existing ligand in edge edge_lig_44_lig_23.
Target cdk8 references non-existing ligand in edge edge_lig_18_lig_23.
Target p38 references non-existing ligand in edge edge_lig_p38a_3fmk_lig_p38a_2m.
Target tyk2 references non-existing ligand in edge edge_lig_jmc_30_lig_ejm_48.
Target shp2 references non-existing ligand in edge edge_lig_E14_lig_E26.

It's not specially easy from my script to check which ligands exactly are missing or being referenced in the edges but not in the ligands.sdf. But by just inspection I could check that for tyk2 the ligand lig_jmc_30 doesn't exist in https://github.com/openforcefield/protein-ligand-benchmark/blob/issues_75_and_81/data/tyk2/02_ligands/ligands.sdf (this is the revision for the latest changes with fixes).

Hence it does seem that there are some edges referencing ligands that do not exist in the latest revision in https://github.com/openforcefield/protein-ligand-benchmark/pull/82 . Now, I see some ligands.sdf were changed in that PR, so it might be that the network of transformations just needs to be updated with the changes on that branch/revision.

IAlibay commented 1 year ago

@ijpulidos - Am I correct in understanding you're using the new graph edges from this PR but applying it to the branch from #82?

If that's the case then it's partly expected, once #82 is merged then I'll re-generate the edges here to match that (I didn't want to make this depend on the other branch if we're going to squash merge things).

ijpulidos commented 1 year ago

@ijpulidos - Am I correct in understanding you're using the new graph edges from this PR but applying it to the branch from #82?

Yes, I merged both branches because I needed the changes in both for my test to make sense.

If that's the case then it's partly expected, once #82 is merged then I'll re-generate the edges here to match that (I didn't want to make this depend on the other branch if we're going to squash merge things).

Alright! That sounds good to me. My test will still check things run on perses, so this is not a blocking issue. I just wanted to mention it in case we weren't expecting it. All good!

ijpulidos commented 1 year ago

Actually, on a second thought, I realize that we do need the branches to be merged for me to be able to test this correctly, since the data for the edges and the ligands is correlated.

codecov[bot] commented 1 year ago

Codecov Report

Merging #83 (23ca3e8) into main (e9b153e) will not change coverage. The diff coverage is n/a.

:exclamation: Current head 23ca3e8 differs from pull request most recent head d2f51ac. Consider uploading reports for the commit d2f51ac to get more accurate results

Additional details and impacted files
dotsdl commented 1 year ago

Ready for review!

dotsdl commented 1 year ago

@IAlibay please merge when satisfied! Thanks!

dotsdl commented 1 year ago

From @IAlibay : not getting minimum-spanning graphs from perses, but starmaps instead. Can't use latest perses since it doesn't have openff-toolkit 0.11.0 support.

IAlibay commented 1 year ago

From @IAlibay : not getting minimum-spanning graphs from perses, but starmaps instead. Can't use latest perses since it doesn't have openff-toolkit 0.11.0 support.

On my side of things I also need to work out what was happening with the 0.4 version of openfe (which works with off-tk 0.10.x), e.g. if it's an issue about how we're ingesting perses score.

dotsdl commented 1 year ago

Blocked by: https://github.com/choderalab/perses/pull/1128

ijpulidos commented 1 year ago

Just FYI. From today's meeting I would be trying to merge the directory structure from this PR to the main branch, such that we can continue working on the transformation networks with the correct directory structure.

ijpulidos commented 1 year ago

The idea is that we would be dropping the current edges yaml file in this PR and just leave the directories for @IAlibay and other to build on top of it from the main branch.

ijpulidos commented 1 year ago

Actually, I think I prefer to have some edges files merged, even if they are really bad ones. Because that way we can still run the simulations against this branch or the main one with he same scripts, and it would also give us placeholders that we can use for future PRs, whereas if we leave the directories empty they won't even appear in the repo. I am going to revert the latest commit for now, such that we can still use this branch for some benchmarks if needed. @IAlibay thoughts?

IAlibay commented 1 year ago

Actually, I think I prefer to have some edges files merged, even if they are really bad ones. Because that way we can still run the simulations against this branch or the main one with he same scripts, and it would also give us placeholders that we can use for future PRs, whereas if we leave the directories empty they won't even appear in the repo. I am going to revert the latest commit for now, such that we can still use this branch for some benchmarks if needed. @IAlibay thoughts?

Yeah sure, I think that makes sense.