Closed: JeffEdge closed this issue.
Thank you for your interest in my work. I'm glad it's (partially) helping you.
If I'm following your post correctly, you started with an input MS data file of some sort (another mzML file?) with only MS2 spectra, and attempting to export the processed spectra did not work as expected because the `save_scan_bunch` method expected a precursor?
This sounds like a combination of things that weren't aligned. The mzML format supports MS2-only files, and the readers and writers in `ms_deisotope` should too.
When you iterated over your input file, were you getting `Scan` or `ScanBunch` instances? A `Scan` is a single scan/spectrum, while a `ScanBunch` is a collection of an MS1 scan (which may be `None`) and zero or more MSn scans. Since you have a snippet invoking `bunch.deconvolute`, and only `Scan` objects have a `deconvolute` method, I assume `bunch` is a `Scan`. `MzMLSerializer.save_scan_bunch` assumes its input is a `ScanBunch`. To save a single `Scan`, you should call `MzMLSerializer.save_scan`, or you can use the `MzMLSerializer.save` method, which does type checking to figure out which method to use. In the docs, I show an example using `save`.
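To make the distinction concrete, here is a minimal sketch of the kind of type-based dispatch described above. The classes `FakeScan`, `FakeScanBunch` and `RecordingWriter` are hypothetical stand-ins invented for illustration; they only record which method would be called and are not ms_deisotope's actual `MzMLSerializer.save` implementation:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeScan:
    """Hypothetical stand-in for a single scan/spectrum."""
    id: str


@dataclass
class FakeScanBunch:
    """Hypothetical stand-in for an MS1 scan (possibly None) plus its MSn scans."""
    precursor: Optional[FakeScan]
    products: List[FakeScan] = field(default_factory=list)


class RecordingWriter:
    """Records which save method would be used, to illustrate the dispatch."""

    def __init__(self):
        self.calls = []

    def save_scan(self, scan):
        self.calls.append(("save_scan", scan.id))

    def save_scan_bunch(self, bunch):
        self.calls.append(("save_scan_bunch", len(bunch.products)))

    def save(self, obj):
        # Choose the specific method based on the type of the object passed in.
        if isinstance(obj, FakeScanBunch):
            self.save_scan_bunch(obj)
        elif isinstance(obj, FakeScan):
            self.save_scan(obj)
        else:
            raise TypeError("Cannot save object of type %r" % type(obj))


writer = RecordingWriter()
writer.save(FakeScan("scan=1"))                         # a single spectrum
writer.save(FakeScanBunch(None, [FakeScan("scan=2")]))  # a bunch whose MS1 scan is None
print(writer.calls)  # [('save_scan', 'scan=1'), ('save_scan_bunch', 1)]
```

Dispatching on type means the caller does not have to know which iteration mode produced the object it is about to write.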
I probably need to document the iteration modes better, since it sounds like you've read some examples where iteration yields `ScanBunch` instances, but that didn't match what you had.
The mzML snippet you posted appears to be missing the `<precursorList>` element. Is it present in your input file? I can't tell if this is because you've changed the scan saving code, or because it is missing. If you can share the file, I might be able to better understand what's happening.
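When you save a deconvoluted scan with `ms_deisotope`'s `MzMLSerializer`, your `<spectrum>` elements contain the following `<binaryDataArray>`s:

- m/z array - The monoisotopic m/z of each deconvoluted peak, given by their neutral mass and charge state
- intensity array - The sum of the intensity of all isotopic peaks underlying each deconvoluted peak
- …

See `ms_deisotope.output.text_utils` for the implementation of the envelope encoding; decoding it looks like this: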
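```python
from ms_deisotope.peak_set import Envelope, EnvelopePair


def decode_envelopes(array):
    """Split a flat [mz, intensity, mz, intensity, ...] array into envelopes.

    A (0, 0) pair marks the boundary between two envelopes.
    """
    envelope_list = []
    current_envelope = []
    # Walk the array two values at a time: (m/z, intensity).
    for i in range(0, len(array) - 1, 2):
        a = array[i]
        b = array[i + 1]
        if a == 0 and b == 0:
            # Separator: close the envelope in progress, if any.
            if current_envelope:
                envelope_list.append(Envelope(current_envelope))
            current_envelope = []
        else:
            current_envelope.append(EnvelopePair(a, b))
    # Keep the last envelope if the array did not end with a separator.
    if current_envelope:
        envelope_list.append(Envelope(current_envelope))
    return envelope_list
```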
This lets me reconstruct `DeconvolutedPeak` instances, preserving their `envelope` attributes, which is where the experimental isotopic patterns get stored. This hopefully takes care of encoding the isotopic patterns for you as well. Note that they are at whatever charge state the matching deconvoluted peak was fit at.
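The decoder implies a simple layout: a flat array of alternating m/z and intensity values in which a (0, 0) pair separates envelopes. A minimal round-trip sketch of that layout in plain Python, where `encode_envelopes` is a hypothetical helper written for illustration (not the library's encoder) and plain tuples stand in for `EnvelopePair`:

```python
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]  # (mz, intensity), standing in for EnvelopePair


def encode_envelopes(envelopes: Sequence[Sequence[Pair]]) -> List[float]:
    """Flatten envelopes into [mz, intensity, ..., 0, 0, mz, intensity, ..., 0, 0]."""
    flat: List[float] = []
    for envelope in envelopes:
        for mz, intensity in envelope:
            flat.extend((mz, intensity))
        flat.extend((0.0, 0.0))  # (0, 0) terminates each envelope
    return flat


def decode_envelopes(array: Sequence[float]) -> List[List[Pair]]:
    """Split the flat array back into envelopes at each (0, 0) pair."""
    envelopes: List[List[Pair]] = []
    current: List[Pair] = []
    for i in range(0, len(array) - 1, 2):
        mz, intensity = array[i], array[i + 1]
        if mz == 0 and intensity == 0:
            if current:
                envelopes.append(current)
            current = []
        else:
            current.append((mz, intensity))
    if current:  # tolerate a missing trailing separator
        envelopes.append(current)
    return envelopes


envs = [[(500.25, 100.0), (500.58, 80.0)], [(750.1, 40.0)]]
flat = encode_envelopes(envs)
assert decode_envelopes(flat) == envs
```

The design choice here is that a separator costs only one extra pair per envelope and no side table of lengths; the trade-off is that a genuine (0, 0) point cannot be stored, which is harmless for real spectra.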
You said you'd prefer for the m/z stored to be de-charged, so transformed as if `z = 1`. I don't have a method for doing that directly, but while I figure out if there is a better way, you could just do the following to achieve the desired effect:
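```python
import ms_deisotope

# `scan` is a Scan that has already been deconvoluted.
for peak in scan.deconvoluted_peak_set:
    # Recompute m/z from the neutral mass as if the peak carried a single charge.
    peak.mz = ms_deisotope.mass_charge_ratio(peak.neutral_mass, 1)
    peak.charge = 1
# Rebuild the peak set's index after changing the peaks' m/z values.
scan.deconvoluted_peak_set.reindex()
```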
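The arithmetic behind this is the usual relation for protonated ions, m/z = (M + z × m_p) / z, where M is the neutral mass, z the charge and m_p the proton mass. A minimal plain-Python sketch, assuming positive-mode protonated ions; the helper names are hypothetical and only meant to mirror what `ms_deisotope.neutral_mass` and `ms_deisotope.mass_charge_ratio` compute in that case:

```python
PROTON = 1.00727646677  # proton mass in Da


def neutral_mass_from_mz(mz: float, charge: int) -> float:
    """Neutral mass of a protonated ion observed at `mz` with positive `charge`."""
    return mz * charge - charge * PROTON


def mz_from_neutral_mass(mass: float, charge: int) -> float:
    """m/z of a neutral `mass` carrying `charge` protons."""
    return (mass + charge * PROTON) / charge


# A 3+ peak at m/z 500.0 re-expressed as its singly protonated (MH+) form.
mass = neutral_mass_from_mz(500.0, 3)
mh = mz_from_neutral_mass(mass, 1)
print(round(mass, 4), round(mh, 4))  # 1496.9782 1497.9854
```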
Saving Spectra
-> Yes, I start with an mzML data file with only one MS2 spectrum, and indeed attempting to export the processed spectra did not work as expected, presumably because the `save_scan_bunch` method expected a precursor.
“When you iterated over your input file, were you getting Scan or ScanBunch instances? A Scan is a single scan/spectrum, while a ScanBunch is a collection of an MS1 scan (which may be None) and zero or more MSn scans. Since you have a snippet invoking bunch.deconvolute, and only Scan objects have a deconvolute method, I assume bunch is a Scan.”
-> Yes, sorry for naming my scan that way; I cleaned this up and now use `MzMLSerializer.save_scan`.
“The mzML snippet you posted appears to be missing the `<precursorList>` element. Is it present in your input file?”
-> Yes, it is present and reads as follows (the file is attached):
“I can't tell if this is because you've changed the scan saving code, or because it is missing. If you can share the file, I might be able to better understand what's happening.”
-> I am attaching the file. I also went back to the “original” methods.
Using:
```python
""" output to file """
with open("Output_deconvoluted_b.mzML", 'wb') as fh:
    writer = MzMLSerializer(fh, n_spectra=len(reader))
    writer.copy_metadata_from(reader)
    writer.save_scan(scan)
    writer.close()
```
I get:
```
c:...\ms-deisotope\psims\psims\document.py:734: AmbiguousTermWarning: Multiple unit options are possible for parameter 'accuracy' but none were specified
  self.write_params(xml_file)
Traceback (most recent call last):
  File "C:/…/ms-deisotope/test04b.py", line 75, in <module>
KeyError: 'selectedIonList'
```
The (graphic) output of your library is really great, and I am tremendously enjoying using it, as it can enhance our workflow a lot. Thanks a lot for the feedback.
Thank you for the response. Unfortunately, GitHub strips attachments from emails. I'll get a file request link set up later today.
So your mzML file does not list a `<selectedIonList>` element, just an isolation window and an activation. That makes sense given how wide your isolation window is, but the way I wrote things assumes a selected ion is always present. I'll have to go about breaking this up downstream.
In the meantime, you can create a fake precursor:
```python
import ms_deisotope
from ms_deisotope.data_source import PrecursorInformation, ChargeNotProvided


def make_fake_precursor(scan):
    # Attach a placeholder precursor built from the isolation window target m/z
    # (intensity 0, charge unknown), since the file lists no selected ion.
    source = scan.source
    isolation_window = scan.isolation_window
    pinfo = PrecursorInformation(
        isolation_window.target, 0, ChargeNotProvided, source=source, product_scan_id=scan.id)
    scan.precursor_information = pinfo
    return scan
```
Please upload the problematic mzML file here: https://www.dropbox.com/request/b36TB6MmPnb7lHcJhaQv
I've received your file. I'll get to work fixing the problem. I assumed that all precursors would have selected ions, but this is not true. I treated `precursor_information` as if it were synonymous with selected ion, so I'll have to decouple that idea, which may mean changing `psims` too. Shouldn't take more than a day or two to make all the required changes.
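The `KeyError: 'selectedIonList'` comes from reading the selected ion unconditionally. A minimal sketch of a decoupled lookup that falls back to the isolation window target when no selected ion is listed; the nested-dict layout (keys named after the mzML elements, plus `'selected ion m/z'` and `'isolation window target m/z'` keys) is an assumption made for illustration, and `precursor_mz` is a hypothetical helper, not ms_deisotope's code:

```python
import warnings
from typing import Any, Dict, Optional


def precursor_mz(spectrum: Dict[str, Any]) -> Optional[float]:
    """Return the selected ion m/z, else the isolation window target, else None."""
    precursors = spectrum.get("precursorList", {}).get("precursor", [])
    if not precursors:
        return None  # no precursor at all, e.g. an MS1 spectrum
    precursor = precursors[0]
    selected = precursor.get("selectedIonList", {}).get("selectedIon", [])
    if selected:
        return selected[0].get("selected ion m/z")
    # Precursor exists but lists no selected ion: warn instead of raising KeyError.
    warnings.warn("No selected ions were found for precursor")
    window = precursor.get("isolationWindow", {})
    return window.get("isolation window target m/z")


# An MS2 spectrum with an isolation window and activation, but no <selectedIonList>.
ms2 = {"precursorList": {"precursor": [
    {"isolationWindow": {"isolation window target m/z": 3000.0},
     "activation": {}}]}}
print(precursor_mz(ms2))  # 3000.0 (after a UserWarning)
```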
Did the fake precursor workaround get you past the problem?
Nice, yes, this works: I do not get error messages. Thanks a lot for the changes; this will be really helpful!
I've pushed commits to `ms_deisotope` and `psims` that should have fixed the problem. I was able to read the file back in using both `ms_deisotope` and `pyopenms`, though I think I need to upgrade `pyopenms` to be able to read arrays other than m/z and intensity.
Using the updated modules, the error is:
```
Windows fatal exception: access violation

Current thread 0x00000100 (most recent call first):
  File "C:\...\ms-deisotope\ms_deisotope\deconvolution\exhaustive.py", line 904 in deconvolute
  File "C:\...\ms-deisotope\ms_deisotope\deconvolution\api.py", line 139 in deconvolute_peaks
  File "C:\...\ms-deisotope\ms_deisotope\data_source\scan\scan.py", line 664 in deconvolute
  File "C:/.../ms-deisotope/test04c.py", line 70 in <module>
```
Am I doing something wrong?
In the meantime I'll continue using `make_fake_precursor(scan)` to move forward with your `decode_envelopes(array)`.
If you find a way to generate a deconvoluted spectrum in mzML format (MH+), I would be delighted! Thank you
That error message looks like a segfault is happening in the C extensions related to solution graph construction and traversal in `populate_graph`. Unfortunately, that function is large and complex, so I can't immediately tell where in there the error is occurring.
Are you using the same mzML file you sent me? Did you change any of the parameters, or adjust the scan's `peak_set` attribute in any way? If the file is different, I will probably need to see it to debug the problem; I think the file upload link is still good.
By MH+, do you mean you want all charge states to be converted to 1+, but not merged? Do you want the envelopes array to be left as-is, or also have its charge adjusted?
Yes, I am using exactly the same input file, no parameter changed. I just commented out the "def make_fake_precursor(scan):" part in my script, that "solved" the issue with the previous version.
Leaving it in generates the same error.
By MH+ I mean all charge states converted to 1+, with their envelopes' charge adjusted (converted to 1+) as well, and the envelopes merged into a single mzML file. The goal would be to have an mzML file corresponding to a spectrum with all the ions being 1+. Basically, the information that the signal is split into envelopes would be lost.
In other words, a single mzML file with only MH+ ions, the spectrum being described only by these two arrays:
- m/z array - The monoisotopic m/z of each deconvoluted peak, given by their neutral mass and charge state
- intensity array - The sum of the intensity of all isotopic peaks underlying each deconvoluted peak
Okay. Sounds like there might actually be a linking/loading issue then. Could you please completely uninstall `ms_deisotope` (`pip uninstall ms_deisotope` until it can't find it) and re-install it from source?
I'll add a convenience function to transform a `DeconvolutedPeak` with charge state `z` and neutral mass `m` into a `DeconvolutedPeak` with charge state `1` and neutral mass `m`, updating derived attributes, and a separate function to create a new `PeakSet` which contains only isotopic peaks matched from `DeconvolutedPeakSet`.
Sorry, the last two lines should read:
- m/z array - All the isotopic m/z values of each charge-deconvoluted peak, given as the m/z corresponding to charge state 1
- intensity array - The intensities corresponding to each isotopic peak underlying each charge-deconvoluted distribution
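Read literally, this asks for every isotopic point of every envelope to be re-expressed at charge 1 and written out as one flat, m/z-sorted peak list. A minimal plain-Python sketch of that transformation, assuming protonated ions; the `Peak` namedtuple (a charge plus an envelope of (mz, intensity) pairs) is a hypothetical stand-in for ms_deisotope's deconvoluted peaks:

```python
from collections import namedtuple
from typing import List, Tuple

PROTON = 1.00727646677  # proton mass in Da

# Hypothetical stand-in: charge state plus the experimental isotopic envelope.
Peak = namedtuple("Peak", ["charge", "envelope"])  # envelope: [(mz, intensity), ...]


def to_singly_charged_points(peaks: List[Peak]) -> List[Tuple[float, float]]:
    """Re-express each envelope point at z = 1 and return an m/z-sorted list."""
    points = []
    for peak in peaks:
        for mz, intensity in peak.envelope:
            mass = mz * peak.charge - peak.charge * PROTON  # neutral mass of this point
            points.append((mass + PROTON, intensity))        # its MH+ m/z
    points.sort()
    return points


peaks = [Peak(2, [(500.5, 100.0), (501.0, 60.0)]), Peak(1, [(800.0, 30.0)])]
points = to_singly_charged_points(peaks)
print([(round(mz, 4), i) for mz, i in points])
# [(800.0, 30.0), (999.9927, 100.0), (1000.9927, 60.0)]
```

Note how the 2+ isotopic points, 0.5 apart in m/z, end up about 1 apart once expressed as MH+.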
After completely removing ms-deisotope, psims, and ms-peak-picker and reinstalling all three from source (see package list below), it works if I use def make_fake_precursor(scan): ...
If I omit this part I get the following error:
```
Traceback (most recent call last):
  File ".\test04c.py", line 88, in <module>
    writer.save_scan(scan)
  File "c:\...\ms_deisotope\ms_deisotope\output\mzml.py", line 828, in save_scan
    if scan.precursor_information:
  File "c:\...\ms_deisotope\ms_deisotope\data_source\scan\scan.py", line 351, in precursor_information
    self._data)
  File "c:\...\ms_deisotope\ms_deisotope\data_source\mzml.py", line 127, in _precursor_information
    pinfo_dict = self._get_selected_ion(scan)
  File "c:\...\ms_deisotope\ms_deisotope\data_source\mzml.py", line 108, in _get_selected_ion
    pinfo_dict = scan["precursorList"]['precursor'][0]["selectedIonList"]['selectedIon'][0]
KeyError: 'selectedIonList'
```
The package list used is:
Package Version Location
--------------------------- --------- ----------------------------------------------------------------
attrs 19.1.0
brain-isotopic-distribution 1.5.2
certifi 2019.6.16
chardet 3.0.4
comtypes 1.1.7
cycler 0.10.0
Cython 0.29.12
decorator 4.4.0
dill 0.3.0
idna 2.8
ipython-genutils 0.2.0
jsonschema 3.0.1
jupyter-core 4.5.0
kiwisolver 1.1.0
lxml 4.3.4
matplotlib 3.1.1
ms-deisotope 0.0.9 c:\...\ms_deisotope
ms-peak-picker 0.1.25 c:\...\ms_peak_picker
nbformat 4.4.0
numpy 1.16.4
pip 19.2.1
plotly 4.0.0
psims 0.1.28 c:\...\psims
pymzml 2.2.5
pynumpress 0.0.5
pyparsing 2.4.0
pyrsistent 0.15.3
pyteomics 4.1.2
python-dateutil 2.8.0
python-idzip 0.3.5
pythonnet 2.4.0
pytz 2019.1
regex 2019.6.8
requests 2.22.0
retrying 1.3.3
scipy 1.3.0
setuptools 41.0.1
six 1.12.0
SQLAlchemy 1.3.6
traitlets 4.3.2
urllib3 1.25.3
Hope this helps!
That does help, thank you. This means the bug is in a totally different part of the code and doesn't need to be recompiled every time I make a change. I should be able to get to this some time tonight.
Thank you, much appreciated!
It looks like the error you're encountering was fixed in b4a3312. Do you still have the problem if you pull in the latest commits from master?
It is improving, but I still get errors (without the fake-precursor approach), with ms-deisotope 0.0.9 at c:...\ms_deisotope:
```
c:\...\ms_deisotope\ms_deisotope\data_source\mzml.py:116: UserWarning: No selected ions were found for precursor
  warnings.warn("No selected ions were found for precursor")
Traceback (most recent call last):
  File ".\test04c.py", line 88, in <module>
    writer.save_scan(scan)
  File "c:\...\ms_deisotope\ms_deisotope\output\mzml.py", line 861, in save_scan
    encoding=self.data_encoding)
  File "c:\...\psims\psims\mzml\writer.py", line 548, in write_spectrum
    intensity_unit=intensity_unit)
  File "c:\....\psims\psims\mzml\writer.py", line 510, in spectrum
    precursor_information, intensity_unit=intensity_unit)
  File "c:\...\psims\psims\mzml\writer.py", line 657, in _prepare_precursor_list
    intensity_unit=intensity_unit, **precursors)])
TypeError: _prepare_precursor_information() missing 1 required positional argument: 'intensity'
```
Did you update `psims`? I made the `selectedIon`-related parameters optional in psims/b6e34a1bad
After updating psims it works! Thank you!
Just for info, the warnings are:
```
c:...\psims\psims\document.py:735: AmbiguousTermWarning: Multiple unit options are possible for parameter 'accuracy' but none were specified
  self.write_params(xml_file)
c:...\ms_deisotope\ms_deisotope\data_source\mzml.py:116: UserWarning: No selected ions were found for precursor
  warnings.warn("No selected ions were found for precursor")
```
Thank you for debugging that with me.
The warning about `accuracy` refers to the `<cvParam />` with the name "accuracy" in your `instrumentConfiguration`, which I don't think belongs there. Also, all of your detectors have a TOF path length parameter, which doesn't make sense for an Orbitrap instrument. I think one of your upstream tools added those, and when `ms_deisotope` copies them into the new mzML file, `psims` complains that it isn't complete.
To produce the isotopic pattern-reduced centroided scan you asked for, the following snippet should do it:
```python
import ms_deisotope
from ms_deisotope.output import MzMLSerializer
from ms_peak_picker import simple_peak, PeakSet

...
scan = get_scan()  # placeholder: however you obtain the Scan you want
duplicate = scan.copy()  # copy that will carry the new 1+ isotopic peak list
scan.pick_peaks().deconvolute(...)  # your deconvolution parameters go here

isotopic_peaks = []
for peak in scan.deconvoluted_peak_set:
    for point in peak.envelope:
        # Convert each isotopic point from the peak's charge state to 1+.
        mass = ms_deisotope.neutral_mass(point.mz, peak.charge)
        mz = ms_deisotope.mass_charge_ratio(mass, 1)
        isotopic_peaks.append(simple_peak(mz, point.intensity))
isotopic_peaks = PeakSet(isotopic_peaks)
isotopic_peaks.reindex()
duplicate.peak_set = isotopic_peaks
...
writer = MzMLSerializer(...)
with writer:
    ...
    writer.save(duplicate, deconvoluted=False)
```
The `deconvoluted=False` parameter of `writer.save` tells the `MzMLSerializer` that it should look at the `peak_set` and not the `deconvoluted_peak_set` attribute when getting the peak list to write.
Thank you for the snippet. A few questions: there is no need to update the modules, as the last commit was 3 days ago, correct? If I use:
```python
import ms_deisotope
from ms_deisotope.test.common import datafile
from ms_deisotope.output.mzml import MzMLSerializer
from ms_peak_picker import simple_peak, PeakSet
from ms_deisotope.data_source import PrecursorInformation, ChargeNotProvided

reader = ms_deisotope.MSFileLoader(datafile("../../../../20190116_EMR1_22h02_Com0-300_part3.mzML"))
scan = next(reader)
#scan = get_scan() # I have not found a method called get_scan (there is one by index though)
duplicate = scan.copy()
scan.pick_peaks().deconvolute(averagine=ms_deisotope.peptide, scorer=ms_deisotope.PenalizedMSDeconVFitter(20., 2.0),
                              truncate_after=0.9, ignore_below=0.0, charge_range=(1, 13))
isotopic_peaks = []
for peak in scan.deconvoluted_peak_set:
    for point in peak.envelope:
        mass = ms_deisotope.neutral_mass(point.mz, peak.charge)
        mz = ms_deisotope.mass_charge_ratio(mass, 1)
        isotopic_peaks.append(simple_peak(mz, point.intensity))
isotopic_peaks = PeakSet(isotopic_peaks)
isotopic_peaks.reindex()
duplicate.peak_set = isotopic_peaks
""" output to file """
with open("Output_deconvoluted_b.mzML", 'wb') as fh:
    writer = MzMLSerializer(fh, n_spectra=len(reader))
    writer.copy_metadata_from(reader)
    with writer:
        writer.save(duplicate, deconvoluted=False)
    writer.close()
```
I get:
```
Traceback (most recent call last):
  File ".\test06b.py", line 34, in <module>
```
Any suggestion about `get_scan`? And about `'NoneType' object is not iterable`?
Do I need to update? Is there an import missing?
Thank you!
Sorry. I apparently missed a code path controlled by the `deconvoluted` flag. To make it work for you right now, when instantiating `MzMLSerializer`, pass it `deconvoluted=False`. This will make it skip building those extra arrays.
The `deconvoluted` flag on the object controls other facets of how it would construct a `<dataProcessing>` element, but this snippet doesn't call the relevant functions to add that anyway. The keyword argument on `save` was supposed to make it possible to decouple the two parts.
Fantastic, the output file contains what I am looking for. I can work with that! There is still an issue with `close`, though:
```
Traceback (most recent call last):
  File ".\test06b.py", line 35, in <module>
    writer.close()
  File "c:\...\ms_deisotope\ms_deisotope\output\mzml.py", line 1004, in close
    self.complete()
  File "c:\...\ms_deisotope\ms_deisotope\output\mzml.py", line 974, in complete
    self._spectrum_list_tag.__exit__(None, None, None)
  File "c:\...\psims\psims\mzml\writer.py", line 72, in __exit__
    self.writer.flush()
  File "c:\...\psims\psims\xml.py", line 911, in flush
    self.writer.flush()
  File "src\lxml\serializer.pxi", line 1242, in lxml.etree._IncrementalFileWriter.flush
AssertionError
```
This is not critical, but if you have time to give it a look, that would help make the code more complex :-)
I must have read your earlier post in a hurry because I missed half of it.

`get_scan` was meant to be "do whatever it is you do to get the `Scan` object you want", so that I didn't have to type out opening an imaginary file and getting a particular scan.

I saw you calling `datafile`. That's not necessary. `datafile` is a convenience function for testing that automatically resolves the path to that file in `ms_deisotope/test/test_data/` given a file name. It has no practical use in real code.

You have a double-close in your mzML writing code. The `MzMLSerializer` class, like other file writers, can be used as a context manager that will close the file on `__exit__`. You shouldn't call `close` if you used `with`.
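The difference between the two patterns can be reproduced with a toy writer. A minimal sketch, where `ToyWriter` is a hypothetical class (not `MzMLSerializer`) whose `close` finalizes the document and refuses to run twice, loosely mimicking the failure in the traceback above:

```python
class ToyWriter:
    """Hypothetical writer whose document must be finalized exactly once."""

    def __init__(self):
        self.lines = ["<mzML>"]
        self.closed = False

    def close(self):
        if self.closed:
            # Finishing an already-finished document is an error.
            raise RuntimeError("writer already closed")
        self.lines.append("</mzML>")
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()  # leaving the `with` block closes the writer
        return False  # do not swallow exceptions raised inside the block


with ToyWriter() as writer:
    writer.lines.append("<spectrum/>")
# No explicit writer.close() here: __exit__ already called it.
print(writer.lines)  # ['<mzML>', '<spectrum/>', '</mzML>']
```

Calling `writer.close()` after the `with` block would raise, which is the same shape of problem as the double-close in the snippet.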
Thank you I will try that! It may take a week though as I am abroad.
From: Joshua Klein notifications@github.com Sent: Saturday, 27 July 2019 00:46:37 To: mobiusklein/ms_deisotope ms_deisotope@noreply.github.com Cc: Greisch, J. (Jean-Francois) j.greisch@uu.nl; Author author@noreply.github.com Subject: Re: [mobiusklein/ms_deisotope] Exporting deconvoluted peak lists from a standalone MS2 scan (#13)
Understood. Thank you for your patience during the debugging process.
Just checking in to see if this issue was resolved. Please let me know if you ran into any other issues or if we were able to solve your problem.
I am in the final stages of checking ☺ I will get back to you by the end of the week.
Great. Thank you. Hopefully it works.
As far as I can judge, it works perfectly! Thank you very much!
Is this in the deconvolution result or after the extra step to remove the charge state? An example would be helpful.
On Mon, Aug 26, 2019 at 12:30 PM JeffEdge notifications@github.com wrote:
Closed #13 https://github.com/mobiusklein/ms_deisotope/issues/13.
Sorry, I removed the comment as I now believe it to be related to noise or a misidentified charge state. Usually I perform a baseline subtraction. In a few cases this leaves me with a small area where the deconvolution identifies two overlapping isotopic distributions. Although I saw this in several spectra, I now believe it to be an artefact (possibly related to a higher charge state). Should I be wrong and such occurrences appear for intense peaks, I will open a new issue. My current solution is to exclude overlapping distributions from the output, but I am not sure this is the best way to proceed. Thank you.
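One way to implement the exclusion described in this message is to drop any envelope that shares an m/z point, within a tolerance, with another envelope. A minimal plain-Python sketch; representing envelopes as lists of m/z values and the 0.01 tolerance are assumptions for illustration, and whether excluding such envelopes is appropriate remains the open question raised above:

```python
from typing import List, Sequence


def non_overlapping(envelopes: Sequence[Sequence[float]], tol: float = 0.01) -> List[int]:
    """Return indices of envelopes that share no m/z point (within `tol`) with any other."""
    keep = []
    for i, env in enumerate(envelopes):
        overlaps = any(
            abs(a - b) <= tol
            for j, other in enumerate(envelopes) if j != i
            for a in env
            for b in other
        )
        if not overlaps:
            keep.append(i)
    return keep


envs = [[500.25, 500.58, 500.92], [500.58, 501.08], [750.10, 750.60]]
print(non_overlapping(envs))  # [2]
```

The pairwise comparison is quadratic in the number of points, which is acceptable for a single spectrum; both members of an overlapping pair are dropped, so no attempt is made to decide which assignment is correct.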
From: Joshua Klein notifications@github.com Sent: Tuesday, August 27, 2019 12:15 AM To: mobiusklein/ms_deisotope ms_deisotope@noreply.github.com Cc: Greisch, J. (Jean-Francois) j.greisch@uu.nl; State change state_change@noreply.github.com Subject: Re: [mobiusklein/ms_deisotope] Exporting deconvoluted peak lists from a standalone MS2 scan (#13)
Thank you for this fantastic library. Briefly, what works:
What I have issues with is exporting an mzML file of the deconvoluted data. While my knowledge of mzML is limited, I tried the following to address the fact that I have no MS1 scan and no precursor list (the graphical output seems to work, so I focused on the writer part):
Where save_scan_bunch2() is a method where I edited "precursor" out:
In `save_scan2()`, I just commented out the block starting with `if scan.precursor_information:`
Now my problems really start: I edited `add_scan_bunch()` so that `add_scan_bunch2()` looks like:
All this yields an mzML file which displays in OpenMS as centroided deisotoped peaks (nothing else); see the file at the end.
First, I believe I might have an indexing problem, but I am not sure. Could it be that the issue stems from the `binaryDataArrayList` breaking the data into 5 parts?
Ultimately, what I would like is to export a charge-deconvoluted spectrum (everything converted to MH+ ions, no higher charge states) and, if possible, also a list of the extracted isotope distributions (ideally MH+). Perhaps it would be possible to create a new method to export an extracted isotope distribution and find a way to append them in a single mzML file?
Any help would be very much appreciated! Many thanks, With my best wishes, Jeff
If it is any help, I can provide an MS2 single-scan data file. The output written out is below.