Closed aleaverfay closed 8 months ago
Attention: 17 lines in your changes are missing coverage. Please review.
Comparison is base (ace29a8) 94.93% compared to head (5e6c2e5) 95.26%. Report is 7 commits behind head on master.
This PR introduces an interface layer between tmol and other molecular modeling packages so that coordinates generated by one can be translated into a meaningful representation for the other. The principal mediator of this interface is the `CanonicalOrdering` class. This class is constructed with a set of allowable `RefinedResidueType` (RRT) objects and groups them together based on their "name3." The CanonicalOrdering object then collects the names of all atoms for all of the RRTs with the same name3 and gives an order for those atoms (e.g., for "ALA", "N" might be atom 0 and "CA" might be atom 1). This allows the user to create a `coords` tensor of [n-poses x max-n-residues x max-n-canonical-atoms x 3] and to populate that tensor in a way that tmol will be able to interpret. If the user wants to provide an "HG" for SER (recommended!), then the CanonicalOrdering will tell the user where to put the HG's coordinate in that `coords` tensor.

With the CanonicalOrdering in hand, the user is able to construct an intermediate representation that tmol will use to construct its PoseStack object; this intermediate representation is called the "canonical form," a dictionary that will contain at least three things:
- a `[n-poses x max-n-residues]` tensor of int32s describing the name3 class for each residue in each Pose (with sentinel values of -1 designating place-holder residues); the integer value here is in reference to the index given by the CanonicalOrdering for the desired name3.
- a `[n-poses x max-n-residues]` tensor describing which chain each residue belongs to; if residue i and residue i+1 are labeled as part of the same chain, and they are both polymeric residue types, then a chemical bond between their "down" and "up" connection points will be included.
- a `[n-poses x max-n-residues x max-n-canonical-atoms x 3]` tensor describing the coordinate of each atom; any atom's coordinate that is not being provided should be given as NaN, and tmol will build the coordinates it can and complain loudly if a coordinate it requires has not been provided.

This "canonical form" representation is intended to be a useful, stable intermediate representation for structures so that they may be serialized to disk (using `torch.save`, e.g.). Clearly, stability here requires that a CanonicalOrdering constructed today, which gives meaning to the indices in the "res_types" tensor and to the positions within the "coords" tensor, be guaranteed identical to the CanonicalOrdering constructed 6 months from now, when tmol supports new exotic chemical types. Thus the purpose of the CanonicalOrdering is to allow the user to control exactly which residue types are in its purview.

From a "canonical form," the API function `pose_stack_from_canonical_form` can be invoked. This function's arguments change significantly in this PR. In particular, the argument "atom_is_present" is no longer accepted; if an atom is present, then its coordinate will be non-NaN, and if an atom is absent, then its coordinate will be NaN. This PR adds two required arguments: a `CanonicalOrdering` object and a `PackedBlockTypes` object. This PR also makes the other arguments to `pose_stack_from_canonical_form` keyword-only. The function to "deconstruct" a PoseStack back into its "canonical form" returns a dictionary describing the rest of the arguments to `pose_stack_from_canonical_form`, including the "don't add termini variants to certain residues / don't declare chemical bonds between certain residue pairs" argument "res_not_connected" and the "here's a list of the cysteine residues that are disulfide-bonded" argument "disulfides." That way a PoseStack can be deconstructed to a canonical form object and then restored to exactly that same PoseStack without having to provide extra arguments.

This PR introduces several API-level functions for converting between popular NN atom ordering conventions, in particular, OpenFold and RosettaFold2. These API functions allow direct creation of PoseStack objects from the outputs generated by these NNs, and also allow creation of "canonical form" dictionaries and the stable CanonicalOrderings that make these dictionaries interpretable.
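
To make the ordering and the "canonical form" layout concrete, here is a minimal dependency-free sketch. It is not tmol's API: `build_toy_ordering`, `empty_canonical_form`, the `"chain_id"` key, and the -1 chain value for place-holder residues are all hypothetical, and nested Python lists stand in for the tensors. Only the `"res_types"` and `"coords"` names, the -1 residue sentinel, and the NaN convention come from the description above.

```python
import math


def build_toy_ordering(residue_types):
    """Toy stand-in for the grouping described above (hypothetical, not tmol code).

    residue_types: iterable of (name3, atom_names) pairs, one per residue type.
    Returns (name3_index, atom_index), where atom_index[name3][atom] is the
    slot of that atom along the max-n-canonical-atoms dimension.
    """
    name3_index = {}
    atom_index = {}
    for name3, atom_names in residue_types:
        # Each distinct name3 gets the next free integer.
        name3_index.setdefault(name3, len(name3_index))
        slots = atom_index.setdefault(name3, {})
        for atom in atom_names:
            # Atoms from every type sharing this name3 are pooled;
            # an atom seen before keeps its existing slot.
            slots.setdefault(atom, len(slots))
    return name3_index, atom_index


def empty_canonical_form(n_poses, max_n_res, max_n_atoms):
    """Build an all-placeholder canonical form out of nested lists."""
    return {
        # -1 marks a place-holder residue.
        "res_types": [[-1] * max_n_res for _ in range(n_poses)],
        # Hypothetical key name and placeholder value for the chain tensor.
        "chain_id": [[-1] * max_n_res for _ in range(n_poses)],
        # NaN means "coordinate not provided".
        "coords": [
            [[[math.nan] * 3 for _ in range(max_n_atoms)] for _ in range(max_n_res)]
            for _ in range(n_poses)
        ],
    }


# Two SER variants share one name3, so their atom names are pooled.
residue_types = [
    ("ALA", ["N", "CA", "C", "O", "CB"]),
    ("SER", ["N", "CA", "C", "O", "CB", "OG"]),
    ("SER", ["N", "CA", "C", "O", "CB", "OG", "HG"]),
]
name3_index, atom_index = build_toy_ordering(residue_types)
max_n_atoms = max(len(slots) for slots in atom_index.values())

# One pose with room for three residues; the third stays a placeholder.
form = empty_canonical_form(n_poses=1, max_n_res=3, max_n_atoms=max_n_atoms)
form["res_types"][0][0] = name3_index["ALA"]
form["res_types"][0][1] = name3_index["SER"]
form["chain_id"][0][0] = 0
form["chain_id"][0][1] = 0

# The ordering tells us where SER's HG coordinate belongs.
hg_slot = atom_index["SER"]["HG"]
form["coords"][0][1][hg_slot] = [1.0, 2.0, 3.0]
```

Every coordinate that was never written stays NaN, which is how an absent atom is signalled now that "atom_is_present" is gone. In real use the three entries would be torch tensors rather than lists.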