t7morgen / misato-dataset

GNU Lesser General Public License v2.1

Question about the dataset #1

Closed: YanjingLiLi closed this issue 1 year ago

YanjingLiLi commented 1 year ago

Hi, I have a question about the dataset. For the MD dataset, could you explain what each property means (e.g. ‘atoms_element’, ‘atoms_number’, ‘atoms_residue’, ‘atoms_type’, ‘molecules_begin_atom_index’)?

t7morgen commented 1 year ago

Hi, you can find more information on this in src/data/processing/Maps/. There you will find the dictionaries for converting the features to their respective AMBER names. molecules_begin_atom_index basically reflects the chains in the protein (the TER cards in the PDB file). The atom properties are converted from AMBER properties (the naming follows the AMBER ff14SB force field for the proteins and the gaff2 force field for the ligands). You can find the gaff2 specification here: https://github.com/choderalab/ambermini/blob/master/share/amber/dat/leap/parm/gaff2.dat
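To make the two ideas above concrete, here is a minimal sketch on toy data, not the repository's own code. It assumes that molecules_begin_atom_index holds the 0-based index of the first atom of each chain or molecule, and that atoms_type holds integer codes that a dictionary maps to AMBER atom-type names. The arrays, the integer codes in `toy_type_map`, and the helper `split_by_molecule` are all hypothetical; the real dictionaries live in src/data/processing/Maps/.

```python
import numpy as np


def split_by_molecule(values, begin_indices):
    """Split a per-atom array into one sub-array per chain/molecule.

    Assumes begin_indices are 0-based start positions in ascending order.
    """
    values = np.asarray(values)
    begin = np.asarray(begin_indices, dtype=int)
    # np.split wants only the interior cut points, so drop a leading 0.
    cuts = begin[begin > 0]
    return np.split(values, cuts)


# Toy stand-ins for the per-atom arrays of one structure (made-up values).
atoms_type = np.array([0, 1, 2, 3, 0, 1, 3, 1, 1, 3])
molecules_begin_atom_index = np.array([0, 4, 7])

# Hypothetical code-to-name map; the integer codes are invented here,
# only the names (N, CT, C, O) are ordinary AMBER atom types.
toy_type_map = {0: "N", 1: "CT", 2: "C", 3: "O"}

chains = split_by_molecule(atoms_type, molecules_begin_atom_index)
chain_lengths = [len(chain) for chain in chains]
first_chain_names = [toy_type_map[int(code)] for code in chains[0]]

print(chain_lengths)
print(first_chain_names)
```

With the real data you would replace the toy arrays with the ones read from the MD file and the toy dictionary with the matching map from the Maps directory.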