Closed jmueller95 closed 3 years ago
Thank you for this very nice contribution! I'll be happy to merge it after some minor wrinkles are ironed out, which I am happy to assist with. I have pushed a commit where I addressed most of the things I noticed, the main one being that mgf.write
wasn't quite working with ions. I also added a test for that, and changed a print()
call to warn()
in _parse_ion
. Please take a look and let me know if you're OK with these changes.
Apart from that, I have just a couple of questions left:
pyteomics/auxiliary/file_helpers.py
(explicitly converting the element ID to str
). Why was it necessary? It's kind of at the heart of all the indexed parser functionality so I'm extra cautious about changing anything there.This is probably minor, but the changes do break previously pickled objects since it changes the signature of __init__
. To preserve this, just move the read_ions
parameter to the end of the parameter list of __init__
and get/set it in the __getstate__
/__setstate__
methods instead of in __reduce_ex__
. These objects probably shouldn't be serialized for long-term storage though.
Thank you for the comment @mobiusklein. I'm not sure I like the idea of permanently changing the code for a one-time reason, so it got me thinking if I could write a one-off script to convert the pickled data based on the latest version of the code: basically copy the old definitions into the script, patch them onto the actual module, read in the data, then dump it to a new file.
I figured I could set self._read_ions = False
in the patched class's __init__
and then change self.__class__
to mgf.IndexedMGF
(i.e. new class) in __reduce__ex__
. Do you think it's a viable approach?
I'm not sure what to do about plain MGF
though. Upon a closer look it doesn't appear to be picklable in any practical sense anyway.
Here's a version that only deals with IndexedMGF
in pickled data: https://pastebin.com/s0D40Na3
Thank you very much for your comments! I implemented the changes you suggested, including those by @mobiusklein (you can decide if you want to keep them). As you already figured @levitsky , pDeep has annotations for all its peaks, so masked arrays aren't necessary - I haven't seen any files where the annotations were only present partially.
Ah, and the change in pyteomics/auxiliary/file_helpers.py
turns out not to be necessary anymore, so I reverted it to the original version. Thanks for pointing that out!
Thank you, that fixes my concern.
OK, thank you all!
I've reverted another addition of str()
conversion in get_by_id
, this one inside mgf.py
. Hopefully it's not going to be a problem; at least in the tests, it is not needed.
Thanks, no this is not a problem for my use case either.
This PR extends the MGF module so that MGFs with annotated mass spectra can be parsed (for an example, see test_annotated.mgf ). These files are e.g. output by the MS/MS prediction tool pDeep (https://github.com/pFindStudio/pDeep). The new feature can be switched on by setting
read_ions=True
in themgf.read()
routine.