madani-sf closed this issue 4 years ago
Sure! The mapping from amino acid to index is given by PFAM_VOCAB:
https://github.com/songlab-cal/tape/blob/master/tape/data_utils/vocabs.py#L1
The one-letter codes follow the standard IUPAC convention: https://www.bioinformatics.org/sms2/iupac.html
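In case it helps, here is a minimal sketch of how an index could be turned back into the one-letter code, three-letter code, and full name. It rebuilds the ordering from the assumption stated in the question (alphabetical, 'A' -> 4, no 'J'), so please check it against PFAM_VOCAB in the linked file before relying on it. The names `FIRST_LETTER_INDEX`, `INDEX_TO_LETTER`, `IUPAC_NAMES`, and `describe` are made up for this example and are not part of tape.

```python
import string

# Assumption: indices below 4 are reserved (e.g. for special tokens),
# so the first letter, 'A', maps to 4. Verify against PFAM_VOCAB.
FIRST_LETTER_INDEX = 4

# Assumption: letters are in alphabetical order with 'J' skipped.
LETTERS = [c for c in string.ascii_uppercase if c != "J"]
INDEX_TO_LETTER = {i + FIRST_LETTER_INDEX: c for i, c in enumerate(LETTERS)}

# Standard IUPAC one-letter code -> (three-letter code, full name).
IUPAC_NAMES = {
    "A": ("Ala", "Alanine"),
    "B": ("Asx", "Aspartic acid or Asparagine"),
    "C": ("Cys", "Cysteine"),
    "D": ("Asp", "Aspartic acid"),
    "E": ("Glu", "Glutamic acid"),
    "F": ("Phe", "Phenylalanine"),
    "G": ("Gly", "Glycine"),
    "H": ("His", "Histidine"),
    "I": ("Ile", "Isoleucine"),
    "K": ("Lys", "Lysine"),
    "L": ("Leu", "Leucine"),
    "M": ("Met", "Methionine"),
    "N": ("Asn", "Asparagine"),
    "O": ("Pyl", "Pyrrolysine"),
    "P": ("Pro", "Proline"),
    "Q": ("Gln", "Glutamine"),
    "R": ("Arg", "Arginine"),
    "S": ("Ser", "Serine"),
    "T": ("Thr", "Threonine"),
    "U": ("Sec", "Selenocysteine"),
    "V": ("Val", "Valine"),
    "W": ("Trp", "Tryptophan"),
    "X": ("Xaa", "Any amino acid"),
    "Y": ("Tyr", "Tyrosine"),
    "Z": ("Glx", "Glutamic acid or Glutamine"),
}


def describe(index):
    """Return (one-letter, three-letter, full name) for an assumed index."""
    letter = INDEX_TO_LETTER[index]
    three_letter, full_name = IUPAC_NAMES[letter]
    return letter, three_letter, full_name


if __name__ == "__main__":
    for idx in sorted(INDEX_TO_LETTER):
        print(idx, *describe(idx))
```

Under these assumptions the letters cover indices 4 through 28. Note that B, X, and Z are ambiguity codes rather than single amino acids, and O and U are the non-standard residues pyrrolysine and selenocysteine.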
One thing we could add as a note: if you google IUPAC codes, the first result (for me at least) is https://www.bioinformatics.org/sms/iupac.html, which is version 1 of the site, not the updated version 2 (which has different codes). Version 1 only shows codes for the 20 standard amino acids.
Can the authors provide the mapping from the index numbers in the raw data to three-letter amino acid names?
I'm assuming it is alphabetical, starting from 'A' -> 4 and skipping the letter 'J'. In addition to the ordering, please clarify the full amino acid name for each code.