uw-ipd / tmol

TMol
Apache License 2.0
30 stars 3 forks source link

Residue centric cartbonded #260

Closed jflat06 closed 11 months ago

jflat06 commented 1 year ago

This PR reworks cartbonded to use the new res-centric paradigm.

The overall flow of the calculation is as follows:

To accomplish the above, several systems were added:

Lastly, I changed the database format. I have changed the old database to have an "old" suffix. It should be trivial to remove when we remove non-res-centric code.

jflat06 commented 1 year ago

A design question for either of you -

Right now I kept the atom identifiers in the yaml file the same - they just use the generic atom names. This means that I have to do the work to make the unique (res-specific) atom IDs and the wildcard IDs. It also means that I have to deduce when a param crosses the residue boundary. (C-N)

One could imagine an alternative where we have a way of specifying the unique atom IDs directly (or wildcard ID, if you wanted it to remain a wildcard). This would be a bit more explicit, and you could essentially load the params directly from the DB into the hash table. I'm not sure whether this is actually better or worth the effort to re-write it, though.

jflat06 commented 1 year ago

@aleaverfay this is passing all tests now, if you want to take a look.

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.10% :tada:

Comparison is base (056044f) 95.03% compared to head (7375f70) 95.13%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #260 +/- ## ========================================== + Coverage 95.03% 95.13% +0.10% ========================================== Files 335 340 +5 Lines 21107 21546 +439 ========================================== + Hits 20058 20497 +439 Misses 1049 1049 ``` | [Flag](https://app.codecov.io/gh/uw-ipd/tmol/pull/260/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd) | Coverage Δ | | |---|---|---| | [_shrug_Testing_CPU](https://app.codecov.io/gh/uw-ipd/tmol/pull/260/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd) | `89.99% <99.55%> (+0.19%)` | :arrow_up: | | [_shrug_Testing_CPU_w_o_jit](https://app.codecov.io/gh/uw-ipd/tmol/pull/260/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd) | `91.82% <99.55%> (+0.16%)` | :arrow_up: | | [_shrug_Testing_CUDA](https://app.codecov.io/gh/uw-ipd/tmol/pull/260/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd) | `92.47% <100.00%> (+0.15%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd#carryforward-flags-in-the-pull-request-comment) to find out more. | [Files Changed](https://app.codecov.io/gh/uw-ipd/tmol/pull/260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd) | Coverage Δ | | |---|---|---| | [tmol/tests/score/cartbonded/test\_script\_modules.py](https://app.codecov.io/gh/uw-ipd/tmol/pull/260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd#diff-dG1vbC90ZXN0cy9zY29yZS9jYXJ0Ym9uZGVkL3Rlc3Rfc2NyaXB0X21vZHVsZXMucHk=) | `100.00% <ø> (ø)` | | | [tmol/chemical/constants.py](https://app.codecov.io/gh/uw-ipd/tmol/pull/260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd#diff-dG1vbC9jaGVtaWNhbC9jb25zdGFudHMucHk=) | `100.00% <100.00%> (ø)` | | | [tmol/chemical/restypes.py](https://app.codecov.io/gh/uw-ipd/tmol/pull/260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd#diff-dG1vbC9jaGVtaWNhbC9yZXN0eXBlcy5weQ==) | `94.05% <100.00%> (+0.83%)` | :arrow_up: | | [tmol/database/scoring/\_\_init\_\_.py](https://app.codecov.io/gh/uw-ipd/tmol/pull/260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd#diff-dG1vbC9kYXRhYmFzZS9zY29yaW5nL19faW5pdF9fLnB5) | `100.00% <100.00%> (ø)` | | | [tmol/database/scoring/cartbonded.py](https://app.codecov.io/gh/uw-ipd/tmol/pull/260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd#diff-dG1vbC9kYXRhYmFzZS9zY29yaW5nL2NhcnRib25kZWQucHk=) | `100.00% <100.00%> (ø)` | | | [tmol/pose/packed\_block\_types.py](https://app.codecov.io/gh/uw-ipd/tmol/pull/260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd#diff-dG1vbC9wb3NlL3BhY2tlZF9ibG9ja190eXBlcy5weQ==) | `99.10% <100.00%> (+0.10%)` | :arrow_up: | | [tmol/score/atom\_type\_dependent\_term.py](https://app.codecov.io/gh/uw-ipd/tmol/pull/260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd#diff-dG1vbC9zY29yZS9hdG9tX3R5cGVfZGVwZW5kZW50X3Rlcm0ucHk=) | `100.00% <100.00%> (ø)` | | | [tmol/score/cartbonded/cartbonded\_energy\_term.py](https://app.codecov.io/gh/uw-ipd/tmol/pull/260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd#diff-dG1vbC9zY29yZS9jYXJ0Ym9uZGVkL2NhcnRib25kZWRfZW5lcmd5X3Rlcm0ucHk=) | `100.00% <100.00%> (ø)` | | | [...l/score/cartbonded/cartbonded\_whole\_pose\_module.py](https://app.codecov.io/gh/uw-ipd/tmol/pull/260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd#diff-dG1vbC9zY29yZS9jYXJ0Ym9uZGVkL2NhcnRib25kZWRfd2hvbGVfcG9zZV9tb2R1bGUucHk=) | `100.00% <100.00%> (ø)` | | | [tmol/score/cartbonded/identification.py](https://app.codecov.io/gh/uw-ipd/tmol/pull/260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd#diff-dG1vbC9zY29yZS9jYXJ0Ym9uZGVkL2lkZW50aWZpY2F0aW9uLnB5) | `97.22% <100.00%> (ø)` | | | ... and [11 more](https://app.codecov.io/gh/uw-ipd/tmol/pull/260?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd) | |

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

aleaverfay commented 1 year ago

I think after you merge trunk into this branch and resolve the conflicts, that this PR is ready to be merged. 💪

jflat06 commented 1 year ago

I think after you merge trunk into this branch and resolve the conflicts, that this PR is ready to be merged. 💪

Yeah, I am having some issues after resolving the conflicts due to the patching system changes.

The first issue is that it was using the fullname to do lookups into the cartbonded DB, which was causing a keyerror since MET:nterm doesn't exist. I'm assuming we want to lookup from the basename on this? It should also probably not choke if that key doesn't exist.

The second issue is that the score is now wrong.

E        x: array([[ 37.7848],
E              [183.5777],
E              [ 50.5842],...
E        y: array([[310.2358  ],
E              [194.5125  ],
E              [ 50.5842  ],...

I'm not 100% sure what's going on with this yet, but I think it could be due to the atom ID tables doubling up on the terminal entries. https://github.com/uw-ipd/tmol/pull/260/files#r1322083779

aleaverfay commented 1 year ago

Yes, it took me a little while to hunt it down, but the logic in Rosetta3 is to use the base_name to do the parameter lookups. I think the only atoms you should insert into the table are for the un-patched residue types; i.e. name == base_name

jflat06 commented 1 year ago

Yes, it took me a little while to hunt it down, but the logic in Rosetta3 is to use the base_name to do the parameter lookups. I think the only atoms you should insert into the table are for the un-patched residue types; i.e. name == base_name

Ok, yeah. Frank and I talked about this and came up with a similar conclusion. However, my new baselines don't match the ones that Frank computed with his patching-aware cartbonded using the old implementation, so we weren't 100% sure which one was correct. Frank seemed to think it was his code that was at fault, though.

jflat06 commented 1 year ago

Yes, it took me a little while to hunt it down, but the logic in Rosetta3 is to use the base_name to do the parameter lookups. I think the only atoms you should insert into the table are for the un-patched residue types; i.e. name == base_name

@aleaverfay

Quick question -

Right now I am adding ALL atoms under the base type name as they come in. So even if you see a variant type first, it will add the atoms then, including any variant-added atoms. Any atoms removed in the variant would be added when you ran the base type.

This has the side effect that some of the variant-only atoms get tagged under the base name. This could cause collisions if you had multiple variant types that add an atom with the same name. But that shouldn't matter if you aren't adding params that reference these atoms.

This is what I have implemented currently.

If we want to ONLY tag the atoms in the base type, it gets a bit trickier.

One option is to somehow figure out which atoms are in the base type as you parse the variant, and add them under the basename.

If you don't do this, you need to ensure that:

  1. The base type is in the RTS.
  2. The base type is annotated first (before the variant).

I don't know if those are safe assumptions.

Any thoughts on this?