levitsky / pyteomics

Pyteomics is a collection of lightweight and handy tools for Python that help to handle various sorts of proteomics data. Pyteomics provides a growing set of modules to facilitate the most common tasks in proteomics data analysis.
http://pyteomics.readthedocs.io
Apache License 2.0
105 stars, 34 forks

isoforms - partially undefined order of reported modified peptides #15

Closed: lutzfischer closed this issue 3 years ago

lutzfischer commented 3 years ago

Hi,

I noticed a problem (at least for us) in how isoforms reports modified peptides. If two or more modifications are defined with the same or overlapping specificities, the order in which they are applied to a given residue is undefined. For example, given the peptide XXXKXXXX and two variable modifications, a and b, that can both occur on K, the isoforms function sometimes yields XXXaKXXXX, XXXbKXXXX and sometimes XXXbKXXXX, XXXaKXXXX. The order only changes when the Python interpreter is restarted in between; running it twice in a row in the same session yields the same order. By itself this is not a big problem, but we have to cap the total number of reported modified peptides, and an undefined order means that a different subset of modified peptides ends up being used.

I suspect the problem comes from the use of sets for encoding the possible modifications on each residue of a peptide.
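
The "changes only after a restart" symptom fits Python's string hash randomization: str hashes are salted once per process (controlled by the PYTHONHASHSEED environment variable), so the iteration order of a set of strings can differ between interpreter runs while staying stable within one run. The sketch below uses only the standard library and made-up modification labels (it is not pyteomics code); it runs the same set literal under several fixed seeds and compares the resulting orders with sorted(), which does not depend on the seed.

```python
import os
import subprocess
import sys

# Made-up modification labels; print their set iteration order in a new process.
SNIPPET = "print(list({'ox', 'ac', 'p', 'me', 'dm', 'tm'}))"


def set_order(seed):
    """Run SNIPPET in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run([sys.executable, "-c", SNIPPET], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Same seed -> same order; different seeds usually -> different orders.
orders = {set_order(seed) for seed in range(10)}
print(len(orders), "distinct set orders across 10 hash seeds")

# Sorting gives one order regardless of the hash seed.
print(sorted({'ox', 'ac', 'p', 'me', 'dm', 'tm'}))
```

Within a single process the seed is fixed, which is why two consecutive calls agreed while restarts did not.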

levitsky commented 3 years ago

Thanks for reporting. I have reproduced the issue and will shortly look into ensuring a fixed order of output.

levitsky commented 3 years ago

@lutzfischer I got rid of sets and also sorted the variable_mods dict just in case, so the order should now be more stable. Could you please check the latest version in master?
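
A minimal sketch of the idea behind that change, under simplifying assumptions (one variable modification per residue, no fixed or terminal modifications, single-letter residues): candidate modifications for each residue are kept in ordered lists built from sorted labels instead of sets, so capping the output always keeps the same subset. enumerate_isoforms is a hypothetical name; this is not the code that went into pyteomics master.

```python
from itertools import islice, product


def enumerate_isoforms(sequence, variable_mods):
    """Hypothetical sketch, not pyteomics' implementation.

    Yields every form of `sequence` in which each residue carries at most
    one variable modification, always in the same order.
    """
    # Sort labels so the order never depends on hash-based set ordering.
    labels = sorted(variable_mods)
    # One ordered list of options per residue: unmodified first, then mods.
    options = [
        [aa] + [label + aa for label in labels if aa in variable_mods[label]]
        for aa in sequence
    ]
    for combo in product(*options):
        yield "".join(combo)


# Capping with islice now keeps the same subset on every run.
capped = list(islice(enumerate_isoforms("XXXKXXXX", {"b": ["K"], "a": ["K"]}), 2))
print(capped)  # ['XXXKXXXX', 'XXXaKXXXX']
```

Lists preserve insertion order and product() iterates them deterministically, so the only remaining source of ordering is the explicit sort.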

lars-kolbowski commented 3 years ago

Hi, I checked out the current master and tried it. The output is now reproducible, but it looks like these changes introduced a bug with terminal modifications. For example, running parser.isoforms('AAA', fixed_mods={'n-': True, '-c': True}) now gives:

  File ".../pyteomics/parser.py", line 838, in isoforms
    parsed[i] = apply_mod(group, cmod)
  File ".../pyteomics/parser.py", line 795, in apply_mod
    group = list(label)
TypeError: 'NoneType' object is not iterable
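
The last line of the traceback is the generic error Python raises when list() receives None, which suggests that label reached apply_mod as None for the terminal groups. A plain-Python reproduction, independent of pyteomics:

```python
# list() needs an iterable; passing None raises the TypeError seen above.
try:
    list(None)
except TypeError as exc:
    message = str(exc)

print(message)  # 'NoneType' object is not iterable
```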

levitsky commented 3 years ago

Ah, thanks for the catch! Indeed, I broke the fixed modifications with my change. They should be fixed now (haha).

levitsky commented 3 years ago

Hopefully this is resolved now; please report any problems with the new isoforms.

lars-kolbowski commented 3 years ago

yep, works fine now :+1: Thank you!