uw-ipd / tmol

TMol
Apache License 2.0

Aleaverfay/pose building interface #257

Closed aleaverfay closed 11 months ago

aleaverfay commented 1 year ago

This PR creates an interface to tmol so that NN packages can output coordinates and residue-type information for run-of-the-mill proteins in a canonical form and then construct a PoseStack object needed for score function evaluation. There are a few features/drawbacks here worth mentioning:

1) Chemical type resolution for HIS and CYS is handled by this code.
2) If hydrogen atoms are not provided, their positions can be computed.
3) Termini variants are not yet handled.
4) The HIS and CYS chemical resolution steps are handled on the CPU and will later become faster.

Details:

HIS chemical type resolution: Histidine (at neutral pH) exists in one of two tautomerization states (90% of the time): 1) where the NE2 nitrogen is protonated and the ND1 nitrogen is not, and 2) where the ND1 nitrogen is protonated and the NE2 nitrogen is not. There are four possible ways that tmol will resolve this ambiguity:

1) The user provides either the HD1 or the HE2 hydrogen coordinates (but not both), and tmol selects the corresponding tautomerization state.
2) The user provides a coordinate for "HN," a stand-in for the hydrogen that is chemically bound to one of the HIS's two nitrogens; tmol compares the distance from HN to ND1 with the distance from HN to NE2 and declares the tautomerization state based on the shorter of the two.
3) The user provides the coordinates for three atoms, "HN", "NH" and "NN", which are stand-ins for the hydrogen attached to the nitrogen, the nitrogen attached to the hydrogen, and the nitrogen that is not attached to the hydrogen. The distances from NH and from NN to the ring's CG atom are used to decide which of the two tautomerization states the ring is in.
4) The user provides only the heavy-atom coordinates, in which case the code chooses the NE2-is-protonated tautomerization state.
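As a rough illustration of cases 1, 2 and 4 above, here is a minimal sketch in plain Python. The function name, the dict-of-tuples input, and the returned labels are all hypothetical and are not tmol's API; case 3 is left out because the description above does not spell out its decision rule.

```python
import math


def resolve_his_tautomer(atoms):
    """Pick a HIS tautomer label from whichever atoms are present.

    `atoms` maps atom names to (x, y, z) tuples. The name, the input
    layout, and the returned strings are hypothetical, not tmol's API.
    """
    has_hd1 = "HD1" in atoms
    has_he2 = "HE2" in atoms

    # Case 1: exactly one of the explicit ring hydrogens is given.
    # (What happens when both are given is not stated in the PR text;
    # this sketch simply falls through to the later cases.)
    if has_hd1 != has_he2:
        return "ND1-protonated" if has_hd1 else "NE2-protonated"

    # Case 2: a generic "HN" hydrogen is given; assign it to the
    # nitrogen it is closer to.
    if "HN" in atoms:
        d_nd1 = math.dist(atoms["HN"], atoms["ND1"])
        d_ne2 = math.dist(atoms["HN"], atoms["NE2"])
        return "ND1-protonated" if d_nd1 < d_ne2 else "NE2-protonated"

    # Case 4: heavy atoms only; default to the NE2-protonated state.
    return "NE2-protonated"


# Toy coordinates, for illustration only (not a real histidine geometry).
ring = {"ND1": (0.0, 0.0, 0.0), "NE2": (2.2, 0.0, 0.0)}
print(resolve_his_tautomer({**ring, "HN": (0.5, 0.9, 0.0)}))
print(resolve_his_tautomer(ring))
```

The first call places HN next to ND1, so the ND1-protonated label is chosen; the second has no hydrogens and falls back to the default.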

CYS disulfide detection: Disulfides are detected by the distance between SG atoms, with a maximum detection distance of 2.5 Å. The algorithm is greedy: cysteine residues are iterated over in order in an outer loop (i) and again in an inner loop (j). For each CYS i, the closest as-yet-unpaired CYS j whose SG lies within 2.5 Å of the SG on i is found, and a disulfide bond is declared between i and j. This does not guarantee that the largest possible number of disulfides is found. For example, if residues 1 and 3 have a distance of 2.2 Å, 1 and 2 have a distance of 2.4 Å, and 3 and 4 have a distance of 2.4 Å, then the largest number of disulfides would be formed by pairing 1 with 2 and 3 with 4. This algorithm, however, will pair residues 1 and 3. Also of note: the result depends on residue order. If the order of the residues changes but their distances do not, so that 1 and 2 have a distance of 2.4 Å, 2 and 3 have a distance of 2.2 Å, cysteines 3 and 4 still have a distance of 2.4 Å, and, for the sake of argument, all other pairs have distances > 2.5 Å, then the 1-2 disulfide will be found first, and the 3-4 disulfide will therefore also be found later.
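The greedy pairing and its order dependence can be sketched as follows. This is a plain-Python illustration of the procedure as described, not the code in this PR; `greedy_disulfides` and `make_dist` are hypothetical names, indices are 0-based, and every pair not listed explicitly is assumed to lie well beyond the cutoff.

```python
def greedy_disulfides(dist, cutoff=2.5):
    """Greedily pair cysteines by SG-SG distance.

    `dist[i][j]` is the SG-SG distance between cysteines i and j, listed
    in residue order. Returns the (i, j) pairs in the order found.
    """
    n = len(dist)
    paired = [False] * n
    pairs = []
    for i in range(n):  # outer loop over cysteines
        if paired[i]:
            continue
        best_j, best_d = None, None
        for j in range(n):  # inner loop over candidate partners
            if j == i or paired[j]:
                continue
            d = dist[i][j]
            if d <= cutoff and (best_d is None or d < best_d):
                best_j, best_d = j, d
        if best_j is not None:
            paired[i] = paired[best_j] = True
            pairs.append((i, best_j))
    return pairs


def make_dist(n, close, far=9.0):
    """Build a symmetric distance table; unlisted pairs get `far`."""
    d = [[far] * n for _ in range(n)]
    for (i, j), value in close.items():
        d[i][j] = d[j][i] = value
    return d


# First ordering: residues 1-3 at 2.2, 1-2 at 2.4, 3-4 at 2.4 (0-based below).
first = make_dist(4, {(0, 2): 2.2, (0, 1): 2.4, (2, 3): 2.4})
print(greedy_disulfides(first))   # [(0, 2)]: only residues 1 and 3 pair

# Second ordering: residues 1-2 at 2.4, 2-3 at 2.2, 3-4 at 2.4.
second = make_dist(4, {(0, 1): 2.4, (1, 2): 2.2, (2, 3): 2.4})
print(greedy_disulfides(second))  # [(0, 1), (2, 3)]: two disulfides
```

Finding the largest possible set of pairs would instead require a maximum-matching algorithm on the graph of within-cutoff pairs, which the greedy loop trades away for simplicity.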

It is currently not possible to tell tmol which pairs of CYS residues should be disulfide bonded, but it will be in the future.

H-placement: If hydrogen atoms are missing, they will be built from ideal coordinates wrt the heavy atoms. This is a fully differentiable process, so derivatives of the score function wrt the hydrogen atoms get converted into derivatives for the atoms that define their coordinates. In the case of HIS, the NE2-protonated tautomerization state is chosen; in the case of SER, THR, CYS, and TYR, the dihedral angle of the hydroxyl hydrogen (the thiol hydrogen for CYS) is set to 180 degrees. Future versions of this code will make more intelligent decisions on how to choose the dihedral angle, e.g. by looking for nearby hydrogen bond acceptors.
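To make "built from ideal coordinates" concrete, here is a minimal sketch of placing one atom from three reference atoms plus an ideal bond length, bond angle, and dihedral. It is an assumption about the general technique (internal-to-Cartesian placement), not the code in this PR: `place_atom` and its helpers are hypothetical names, and because it uses plain Python floats it does not carry gradients the way a tensor-based version would.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _unit(a):
    norm = math.sqrt(sum(x * x for x in a))
    return tuple(x / norm for x in a)


def place_atom(a, b, c, bond, angle_deg, dihedral_deg):
    """Return coordinates of a new atom d bonded to c.

    |c-d| = bond, angle b-c-d = angle_deg, dihedral a-b-c-d = dihedral_deg.
    """
    theta = math.radians(angle_deg)
    phi = math.radians(dihedral_deg)
    bc = _unit(_sub(c, b))             # axis along the b->c bond
    n = _unit(_cross(_sub(b, a), bc))  # normal to the a-b-c plane
    m = _cross(n, bc)                  # in-plane axis perpendicular to bc
    # Displacement from c to d expressed in the local (bc, m, n) frame.
    dx = -bond * math.cos(theta)
    dy = bond * math.sin(theta) * math.cos(phi)
    dz = bond * math.sin(theta) * math.sin(phi)
    return tuple(c[k] + dx * bc[k] + dy * m[k] + dz * n[k] for k in range(3))


# Toy reference atoms and illustrative "ideal" values (not tmol's parameters).
a, b, c = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
h = place_atom(a, b, c, bond=1.0, angle_deg=90.0, dihedral_deg=180.0)
print(h)  # approximately (1.0, -1.0, 0.0): trans to atom a
```

With a dihedral of 180 degrees the new atom lands on the opposite side of the b-c bond from atom a, which is the default orientation described above for the SER, THR, CYS, and TYR hydrogens.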

Currently there is no way to build missing heavy atom coordinates, even if their position could be inferred from other heavy atom coordinates, e.g. for backbone oxygen atoms.

codecov[bot] commented 1 year ago

Codecov Report

Attention: 30 lines in your changes are missing coverage. Please review.

Comparison is base (b8940ef) 95.13% compared to head (bedc49a) 94.86%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #257      +/-   ##
==========================================
- Coverage   95.13%   94.86%   -0.28%
==========================================
  Files         340      365      +25
  Lines       21546    23486    +1940
==========================================
+ Hits        20497    22279    +1782
- Misses       1049     1207     +158
```

| Flag | Coverage Δ | |
|---|---|---|
| _shrug_Testing_CPU | `89.87% <95.08%> (-0.12%)` | :arrow_down: |
| _shrug_Testing_CPU_w_o_jit | `91.71% <96.92%> (-0.11%)` | :arrow_down: |
| _shrug_Testing_CUDA | `92.26% <96.67%> (-0.21%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.

| Files | Coverage Δ | |
|---|---|---|
| tmol/\_\_init\_\_.py | `100.00% <100.00%> (ø)` | |
| tmol/chemical/restypes.py | `94.44% <100.00%> (+0.39%)` | :arrow_up: |
| tmol/io/details/canonical\_packed\_block\_types.py | `100.00% <100.00%> (ø)` | |
| tmol/io/details/compiled/compiled.py | `100.00% <100.00%> (ø)` | |
| tmol/io/details/his\_taut\_resolution.py | `100.00% <100.00%> (ø)` | |
| tmol/io/details/left\_justify\_canonical\_form.py | `100.00% <100.00%> (ø)` | |
| tmol/pack/rotamer/build\_rotamers.py | `100.00% <ø> (ø)` | |
| tmol/pose/packed\_block\_types.py | `99.20% <100.00%> (+0.09%)` | :arrow_up: |
| tmol/pose/pose\_stack\_builder.py | `97.61% <100.00%> (+0.69%)` | :arrow_up: |
| tmol/score/\_\_init\_\_.py | `100.00% <100.00%> (ø)` | |
| ... and [57 more](https://app.codecov.io/gh/uw-ipd/tmol/pull/257?src=pr&el=tree-more) | | |

... and [6 files with indirect coverage changes](https://app.codecov.io/gh/uw-ipd/tmol/pull/257/indirect-changes?src=pr&el=tree-more)


aleaverfay commented 11 months ago

FWIW, the "x" that's on codecov is from my disabling the unit testing of the packer / simulated annealing code. That code is not exactly dead, but it needs a lot of work before it's useful. The other code in the PR is pretty well covered.