mdtraj / mdtraj

An open library for the analysis of molecular dynamics trajectories
http://mdtraj.org
GNU Lesser General Public License v2.1

_AMINO_ACID_CODES lacks ASH #1854

Open hsbyeon1 opened 5 months ago

hsbyeon1 commented 5 months ago

My trajectory contains ASH residues. ASH is the AMBER name for protonated ASP, albeit a non-standard one.

I guess Topology.select('protein') fails to treat atoms in such residues as protein because _AMINO_ACID_CODES in mdtraj/core/residue_names.py lacks 'ASH': 'D', even though it does contain other non-standard names such as 'GLH', 'HIH', etc.

So I suggest adding 'ASH': 'D', if it does not cause any problems.
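Until the table changes, one possible workaround is to name the residue explicitly in the selection string. The sketch below only builds that string. The helper name `protein_selection` is hypothetical, the file names in the comments are placeholders, and it assumes the MDTraj selection language accepts `resname NAME` clauses joined with `or`.

```python
def protein_selection(extra_resnames):
    """Build a selection string matching 'protein' plus extra residue names.

    Hypothetical helper, not part of MDTraj.
    """
    clauses = ["protein"]
    for name in extra_resnames:
        # Each extra residue name becomes its own 'resname' clause.
        clauses.append(f"resname {name}")
    return " or ".join(clauses)


selection = protein_selection(["ASH"])
print(selection)

# With MDTraj installed, the string could then be passed to a loaded topology
# (placeholder file names, untested against a real ASH-containing system):
#   import mdtraj as md
#   traj = md.load("traj.nc", top="system.prmtop")
#   protein_atoms = traj.topology.select(selection)
```

This leaves the residue names in the topology untouched, so the protonation state stays visible in later analysis.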

sukritsingh commented 1 month ago

The naming conventions in residue_names.py follow the three-letter codes defined by the PDB. The names used by Amber are nonstandard and conflict with the PDB definitions, so we use the PDB definitions in the Chemical Component Dictionary (CCD) for maximum compatibility with structural definitions.

I confess I'm not familiar with what ASH looks like myself, and I cannot find it in the Chemical Component Dictionary. Is it something like ASP_LFZW?

Generally, if you can define the residue name explicitly in the topology you provide during loading, then you may be able to load it fine. I haven't tested this much.
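One way to read this suggestion is to rename the residues to their PDB equivalents after loading. The following is a minimal sketch of that idea, not a tested recipe: `rename_residues` is a hypothetical helper, and it assumes the topology exposes an iterable `residues` whose items have a writable `name` attribute. A stand-in object is used here so the example runs without MDTraj.

```python
from types import SimpleNamespace


def rename_residues(topology, mapping):
    """Rename residues in place according to mapping; return how many changed.

    Hypothetical helper. Works on any object with an iterable `residues`
    attribute whose items have a writable `name`.
    """
    count = 0
    for residue in topology.residues:
        new_name = mapping.get(residue.name)
        if new_name is not None:
            residue.name = new_name
            count += 1
    return count


# Stand-in topology with one ASH residue (not a real MDTraj object).
fake_topology = SimpleNamespace(
    residues=[SimpleNamespace(name=n) for n in ("ALA", "ASH", "GLY")]
)
renamed = rename_residues(fake_topology, {"ASH": "ASP"})
print(renamed, [r.name for r in fake_topology.residues])
```

Renaming discards the protonation state from the residue name, although the extra hydrogen atom itself stays in the topology, so this only makes sense when the distinction is not needed downstream.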

mattwthompson commented 1 month ago

I'm +1 for keeping these internal lists on standard definitions

sukritsingh commented 1 month ago

If there is a residue that is in the CCD but not in the internal lists (because it was obscure enough), then I'm definitely happy to add it!