Residue centric cartbonded

jflat06 commented 1 year ago

This PR reworks cartbonded to use the new res-centric paradigm.

The overall flow of the calculation is as follows:

Generate a list of angles/lengths/torsions (subgraphs) for each block type.
Read the parameters from the cartbonded database and associate them with the block types.
Pack all of these together into tensors. The subgraphs are simply concatenated while the parameters are stuffed into a hash table with the keys being tuples of unique atom ids, and the values being the parameter values.
Pass this off to the C++.
For each residue, iterate over all subgraphs (angles/lengths/torsions) for that block type, get the unique atom ids of the atoms involved, and do a lookup in the hash table to see if there is a set of parameters for that group of atoms. If there is, accumulate it onto the result.
Also iterate over the inter-residue subgraphs and accumulate the results of any matching set of atoms.

To accomplish the above, several systems were added:

atom_unique_ids were added to the AtomTypeDependentTerm. AtomTypeDependentTerm iterates over all block types and generates a table with entries for the res-type specific atom id (ALA:C) and the wildcard variant (WILDCARD:C). This way every atom has a unique ID in a global indexing scheme across all atoms in the pose. These unique ids are used to form the keys in our hash table, which lets us do lookups in the C++ code based on the types of the atoms involved in a particular subgraph.
atom_paths_from_conn were added to RefinedResidueType. This is similar to the existing atom_downstream_of_conn, except that it provides the full graph coming from that connection rather than just the mainline atoms. This is needed because some of our cartbonded terms apply to residue-spanning torsions that involve non-mainline atoms.

Lastly, I changed the database format. I have changed the old database to have an "old" suffix. It should be trivial to remove when we remove non-res-centric code.

jflat06 commented 1 year ago

A design question for either of you -

Right now I kept the atom identifiers in the yaml file the same - they just use the generic atom names. This means that I have to do the work to make the unique (res-specific) atom IDs and the wildcard IDs. It also means that I have to deduce when a param crosses the residue boundary. (C-N)

One could imagine an alternative where we have a way of specifying the unique atom IDs directly (or wildcard ID, if you wanted it to remain a wildcard). This would be a bit more explicit, and you could essentially load the params directly from the DB into the hash table. I'm not sure whether this is actually better or worth the effort to re-write it, though.

jflat06 commented 1 year ago

@aleaverfay this is passing all tests now, if you want to take a look.

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.10% :tada:

Comparison is base (056044f) 95.03% compared to head (7375f70) 95.13%.

Additional details and impacted files

```diff @@ Coverage Diff @@ ## master #260 +/- ## ========================================== + Coverage 95.03% 95.13% +0.10% ========================================== Files 335 340 +5 Lines 21107 21546 +439 ========================================== + Hits 20058 20497 +439 Misses 1049 1049 ``` | [Flag](https://app.codecov.io/gh/uw-ipd/tmol/pull/260/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd) | Coverage Δ | | |---|---|---| | [_shrug_Testing_CPU](https://app.codecov.io/gh/uw-ipd/tmol/pull/260/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd) | `89.99% <99.55%> (+0.19%)` | :arrow_up: | | [_shrug_Testing_CPU_w_o_jit](https://app.codecov.io/gh/uw-ipd/tmol/pull/260/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd) | `91.82% <99.55%> (+0.16%)` | :arrow_up: | | [_shrug_Testing_CUDA](https://app.codecov.io/gh/uw-ipd/tmol/pull/260/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd) | `92.47% <100.00%> (+0.15%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd#carryforward-flags-in-the-pull-request-comment) to find out more. | [Files Changed](https://app.codecov.io/gh/uw-ipd/tmol/pull/260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd) | Coverage Δ | | |---|---|---| | [tmol/tests/score/cartbonded/test\_script\_modules.py](https://app.codecov.io/gh/uw-ipd/tmol/pull/260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd#diff-dG1vbC90ZXN0cy9zY29yZS9jYXJ0Ym9uZGVkL3Rlc3Rfc2NyaXB0X21vZHVsZXMucHk=) | `100.00% <ø> (ø)` | | | [tmol/chemical/constants.py](https://app.codecov.io/gh/uw-ipd/tmol/pull/260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd#diff-dG1vbC9jaGVtaWNhbC9jb25zdGFudHMucHk=) | `100.00% <100.00%> (ø)` | | | [tmol/chemical/restypes.py](https://app.codecov.io/gh/uw-ipd/tmol/pull/260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd#diff-dG1vbC9jaGVtaWNhbC9yZXN0eXBlcy5weQ==) | `94.05% <100.00%> (+0.83%)` | :arrow_up: | | [tmol/database/scoring/\_\_init\_\_.py](https://app.codecov.io/gh/uw-ipd/tmol/pull/260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd#diff-dG1vbC9kYXRhYmFzZS9zY29yaW5nL19faW5pdF9fLnB5) | `100.00% <100.00%> (ø)` | | | [tmol/database/scoring/cartbonded.py](https://app.codecov.io/gh/uw-ipd/tmol/pull/260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd#diff-dG1vbC9kYXRhYmFzZS9zY29yaW5nL2NhcnRib25kZWQucHk=) | `100.00% <100.00%> (ø)` | | | [tmol/pose/packed\_block\_types.py](https://app.codecov.io/gh/uw-ipd/tmol/pull/260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd#diff-dG1vbC9wb3NlL3BhY2tlZF9ibG9ja190eXBlcy5weQ==) | `99.10% <100.00%> (+0.10%)` | :arrow_up: | | [tmol/score/atom\_type\_dependent\_term.py](https://app.codecov.io/gh/uw-ipd/tmol/pull/260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd#diff-dG1vbC9zY29yZS9hdG9tX3R5cGVfZGVwZW5kZW50X3Rlcm0ucHk=) | `100.00% <100.00%> (ø)` | | | [tmol/score/cartbonded/cartbonded\_energy\_term.py](https://app.codecov.io/gh/uw-ipd/tmol/pull/260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd#diff-dG1vbC9zY29yZS9jYXJ0Ym9uZGVkL2NhcnRib25kZWRfZW5lcmd5X3Rlcm0ucHk=) | `100.00% <100.00%> (ø)` | | | [...l/score/cartbonded/cartbonded\_whole\_pose\_module.py](https://app.codecov.io/gh/uw-ipd/tmol/pull/260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd#diff-dG1vbC9zY29yZS9jYXJ0Ym9uZGVkL2NhcnRib25kZWRfd2hvbGVfcG9zZV9tb2R1bGUucHk=) | `100.00% <100.00%> (ø)` | | | [tmol/score/cartbonded/identification.py](https://app.codecov.io/gh/uw-ipd/tmol/pull/260?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd#diff-dG1vbC9zY29yZS9jYXJ0Ym9uZGVkL2lkZW50aWZpY2F0aW9uLnB5) | `97.22% <100.00%> (ø)` | | | ... and [11 more](https://app.codecov.io/gh/uw-ipd/tmol/pull/260?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uw-ipd) | |

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

aleaverfay commented 1 year ago

I think after you merge trunk into this branch and resolve the conflicts, that this PR is ready to be merged. 💪

jflat06 commented 1 year ago

I think after you merge trunk into this branch and resolve the conflicts, that this PR is ready to be merged. 💪

Yeah, I am having some issues after resolving the conflicts due to the patching system changes.

The first issue is that it was using the fullname to do lookups into the cartbonded DB, which was causing a keyerror since MET:nterm doesn't exist. I'm assuming we want to lookup from the basename on this? It should also probably not choke if that key doesn't exist.

The second issue is that the score is now wrong.

E        x: array([[ 37.7848],
E              [183.5777],
E              [ 50.5842],...
E        y: array([[310.2358  ],
E              [194.5125  ],
E              [ 50.5842  ],...

I'm not 100% sure what's going on with this yet, but I think it could be due to the atom ID tables doubling up on the terminal entries. https://github.com/uw-ipd/tmol/pull/260/files#r1322083779

aleaverfay commented 1 year ago

Yes, it took me a little while to hunt it down, but the logic in Rosetta3 is to use the base_name to do the parameter lookups. I think the only atoms you should insert into the table are for the un-patched residue types; i.e. name == base_name

jflat06 commented 1 year ago

Yes, it took me a little while to hunt it down, but the logic in Rosetta3 is to use the base_name to do the parameter lookups. I think the only atoms you should insert into the table are for the un-patched residue types; i.e. name == base_name

Ok, yeah. Frank and I talked about this and came up with a similar conclusion. However, my new baselines don't match the ones that Frank computed with his patching-aware cartbonded using the old implementation, so we weren't 100% sure which one was correct. Frank seemed to think it was his code that was at fault, though.

jflat06 commented 1 year ago

Yes, it took me a little while to hunt it down, but the logic in Rosetta3 is to use the base_name to do the parameter lookups. I think the only atoms you should insert into the table are for the un-patched residue types; i.e. name == base_name

@aleaverfay

Quick question -

Right now I am adding ALL atoms under the base type name as they come in. So even if you see a variant type first, it will add the atoms then, including any variant-added atoms. Any atoms removed in the variant would be added when you ran the base type.

This has the side effect that some of the variant-only atoms get tagged under the base name. This could cause collisions if you had multiple variant types that add an atom with the same name. But that shouldn't matter if you aren't adding params that reference these atoms.

This is what I have implemented currently.

If we want to ONLY tag the atoms in the base type, it gets a bit trickier.

One option is to somehow figure out which atoms are in the base type as you parse the variant, and add them under the basename.

If you don't do this, you need to ensure that:

The base type is in the RTS.
The base type is annotated first (before the variant).

I don't know if those are safe assumptions.

Any thoughts on this?

uw-ipd / tmol

Residue centric cartbonded #260

Codecov Report