rcsb / symmetry

:ferris_wheel: Detect, analyze, and visualize protein symmetry
GNU Lesser General Public License v2.1

UNK residues #114

Open AntoniyaAleksandrova opened 2 years ago

AntoniyaAleksandrova commented 2 years ago

When a protein structure contains a UNK residue (i.e. the residue appears as an "ATOM" record in the PDB file, in the same chain as the rest of the protein), CE-Symm seems to treat it as non-existent, and the FASTA alignments it produces do not reflect its presence in the sequence. This can be confusing from a user's perspective. From what I understand, only the position of the CA atom matters for symmetry detection, so is there a specific reason UNK residues are skipped, and could they be included?

sbliven commented 2 years ago

Do you have an example structure?

I don't know off-hand all the reasons UNK might be assigned, and whether there might be other cases where including them would be problematic.

AntoniyaAleksandrova commented 2 years ago

Hi Spencer, sorry for the delay in responding. For example, 6o84 has UNK residue entries in chain A and chain B. These entries start with "ATOM" and hence are considered part of the protein chain, even though the resolution might not be good enough to identify the specific amino acid. There are also cases where whole chains are represented with UNK residues, for example chain X in 2axt. In the PDB sequence returned by the API, as well as in other alignment software, these residues are often denoted with "X". The wwPDB format documentation mentions that UNK is usually reserved for unknown amino acids: https://www.wwpdb.org/documentation/file-format-content/format33/sect4.html. Does this help?
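
To make the request concrete, here is a minimal Python sketch of the behaviour described above: building a one-letter sequence from the CA atoms of "ATOM" records and writing "X" for UNK instead of dropping it. This is not CE-Symm or BioJava code. The function `chain_sequence`, its `keep_unknown` flag, and the helper `_ca_record` are hypothetical names for illustration, the three records are made-up test data rather than lines from 6o84 or 2axt, and the sketch assumes that any residue name outside the 20 standard amino acids should be treated as unknown.

```python
# Standard three-letter to one-letter codes. Assumption of this sketch:
# any other residue name (including UNK) is treated as unknown.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequence(pdb_lines, chain_id, keep_unknown=True):
    """Return the one-letter sequence of one chain from its CA ATOM records.

    With keep_unknown=True, unknown residues such as UNK appear as "X";
    with keep_unknown=False they are skipped entirely.
    """
    seq = []
    seen = set()
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        # Fixed PDB columns: atom name 13-16, residue name 18-20,
        # chain ID 22, residue number 23-26, insertion code 27.
        if line[12:16].strip() != "CA" or line[21:22] != chain_id:
            continue
        res_key = (line[22:26], line[26:27])
        if res_key in seen:  # ignore alternate locations of the same residue
            continue
        seen.add(res_key)
        code = THREE_TO_ONE.get(line[17:20].strip())
        if code is None:
            if not keep_unknown:
                continue
            code = "X"
        seq.append(code)
    return "".join(seq)


def _ca_record(serial, res_name, chain_id, res_num):
    """Hypothetical helper: format a CA ATOM record with dummy coordinates."""
    return (
        f"ATOM  {serial:5d}  CA  {res_name:>3} {chain_id}{res_num:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}"
    )


# Made-up records for illustration only.
records = [
    _ca_record(1, "ALA", "A", 1),
    _ca_record(2, "UNK", "A", 2),
    _ca_record(3, "GLY", "A", 3),
]

with_unknown = chain_sequence(records, "A")                         # "AXG"
without_unknown = chain_sequence(records, "A", keep_unknown=False)  # "AG"
```

With `keep_unknown=True` the output keeps the same length as the chain of CA atoms, so alignment columns map back to residue positions; with `keep_unknown=False` the UNK position disappears, which matches the confusing output reported in this issue.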