levitsky / pyteomics

Pyteomics is a collection of lightweight and handy tools for Python that help to handle various sorts of proteomics data. Pyteomics provides a growing set of modules to facilitate the most common tasks in proteomics data analysis.
http://pyteomics.readthedocs.io
Apache License 2.0

C-terminal modifications in calculate_mass #123

Closed by JB91451 9 months ago

JB91451 commented 9 months ago

Dear all,

I am currently trying to write .msp spectral library files from scratch, and I use the mass.calculate_mass() function to calculate precursor masses from peptide sequences. As the peptides contain variable methionine oxidation as well as fixed carbamidomethylation, I use the following lines to modify the amino acid compositions:

from pyteomics import mass

# m is oxidized methionine
mass.std_aa_comp['m'] = mass.Composition({'H': 9, 'C': 5, 'S': 1, 'O': 2, 'N': 1})
# C is always CAM (carbamidomethylated cysteine)
mass.std_aa_comp['C'] = mass.Composition({'H': 8, 'C': 5, 'S': 1, 'O': 2, 'N': 2})

# in_df is the DataFrame holding the peptide sequences in its 'seq' column
mass_list = []
for pep_seq in in_df['seq']:
    mass_list.append(mass.calculate_mass(sequence=pep_seq))

However, I noticed that this fails with the following error message if the modified amino acid is at the C-terminus:

Traceback (most recent call last):
  File "C:\Tmp_Data\Prosit_SpLib_Prediction.py", line 463, in <module>
    main()
  File "C:\Tmp_Data\Prosit_SpLib_Prediction.py", line 427, in main
    results_df = add_mz_information(results_df)
  File "C:\Tmp_Data\Prosit_SpLib_Prediction.py", line 41, in add_mz_information
    mass_list.append(mass.calculate_mass(sequence=pep_seq))
  File "C:\Programs\Python310\lib\site-packages\pyteomics\mass\mass.py", line 629, in calculate_mass
    composition = (Composition(kwargs['composition']) if 'composition' in kwargs else Composition(*args, **kwargs))
  File "C:\Programs\Python310\lib\site-packages\pyteomics\mass\mass.py", line 292, in __init__
    getattr(self, '_from_' + kwa)(kwargs[kwa], aa_comp)
  File "C:\Programs\Python310\lib\site-packages\pyteomics\mass\mass.py", line 203, in _from_sequence
    parsed_sequence = parser.parse(
  File "C:\Programs\Python310\lib\site-packages\pyteomics\parser.py", line 316, in parse
    raise PyteomicsError('Not a valid modX sequence: ' + sequence)
pyteomics.auxiliary.structures.PyteomicsError: Pyteomics error, message: 'Not a valid modX sequence: ALIVLAHSERTSFNYAm'

Is there an alternative way to specify the required modifications, or do you have any suggestions on how to solve this?

Best regards, Juergen

levitsky commented 9 months ago

Hi Juergen!

The modX notation that Pyteomics uses implies that lowercase letters work as a prefix to the uppercase amino acid letter. That is to say, a modified methionine could be designated as oxM, or any other lowercase prefix followed by M.
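To make the prefix rule concrete, here is a rough sketch of how a modX-style string splits into labels. The regular expressions and the split_modx helper are a simplified illustration I made up for this explanation, not the patterns Pyteomics uses internally.

```python
import re

# Simplified illustration only, not the actual Pyteomics parser:
# a residue label is an optional run of lowercase letters (the modification
# prefix) followed by exactly one uppercase letter (the amino acid).
RESIDUE = re.compile(r'[a-z]*[A-Z]')
SEQUENCE = re.compile(r'(?:[a-z]*[A-Z])+')


def split_modx(sequence):
    """Split a modX-style string into residue labels; return None if it does not fit."""
    if SEQUENCE.fullmatch(sequence) is None:
        return None
    return RESIDUE.findall(sequence)


# A trailing lowercase letter has no amino acid to attach to, so the string is rejected.
print(split_modx('ALIVLAHSERTSFNYAm'))
# With the 'ox' prefix in front of 'M', the last label is 'oxM'.
print(split_modx('ALIVLAHSERTSFNYAoxM'))
# In the middle of a sequence, a lowercase 'm' is read as a prefix of the next residue.
print(split_modx('AmK'))
```

This also suggests why the problem only shows up at the C-terminus: anywhere else, the lowercase m is followed by an uppercase letter and is read as a modification of that next residue.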

You can put an entry for "oxM" into your aa_comp dict, or just for "ox" (in the latter case the value would be just {'O': 1}).

TLDR: Right now Pyteomics thinks that m is a modification, not a modified amino acid. You can fix this by replacing m with oxM in all sequences and in aa_comp.
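If the sequences already exist with the lowercase m, a plain string replacement should be enough to convert them. This is only a sketch: to_modx is a hypothetical helper, the second sequence is a made-up example, and it assumes m is the only lowercase code in your data.

```python
def to_modx(sequence, old='m', new='oxM'):
    """Replace the single-letter code `old` with the modX label `new`."""
    return sequence.replace(old, new)


# The first sequence is the one from the error message; the second is made up.
sequences = ['ALIVLAHSERTSFNYAm', 'mPEPTIDEK']
converted = [to_modx(seq) for seq in sequences]
print(converted)
```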

Hope this helps, I'll be glad to assist if you have further questions.

Best, Lev

JB91451 commented 9 months ago

Hi Lev,

Thank you for your help. Changing m to oxM fixed the issue.

Best, Juergen