levitsky / pyteomics

Pyteomics is a collection of lightweight and handy tools for Python that help to handle various sorts of proteomics data. Pyteomics provides a growing set of modules to facilitate the most common tasks in proteomics data analysis.
http://pyteomics.readthedocs.io
Apache License 2.0

can't get m/z array and intensity array #65

Closed morestart closed 2 years ago

morestart commented 2 years ago

This is my code:

for spectrum in mzml.read(self.data, use_index=True):
    self.process_time.append(spectrum.get('scanList').get('scan')[0].get('scan start time'))
    self.mz.append(spectrum.get('m/z array'))
    self.intensity.append(spectrum.get('intensity array'))

The m/z and intensity arrays are empty. Why?

XML version: 1.0; system: macOS; Python: 3.8

mobiusklein commented 2 years ago

If your mzML file was already centroided, it's not uncommon for some spectra, especially early in an acquisition, to contain no peaks above whatever noise threshold the peak picker sets. Are all the spectra empty?
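One way to answer that question is to count how many spectra carry no peaks at all. The following is a minimal sketch, not part of Pyteomics: `summarize_empty_spectra` is a hypothetical helper that accepts any iterable of spectrum dictionaries shaped like the ones `mzml.read` yields, and it treats a missing `'m/z array'` key the same as a zero-length array.

```python
def summarize_empty_spectra(spectra):
    """Return (total, empty_indices) for an iterable of spectrum dicts.

    Hypothetical helper, not a Pyteomics function. A spectrum counts as
    empty when its 'm/z array' entry is missing or has zero length.
    """
    total = 0
    empty_indices = []
    for position, spectrum in enumerate(spectra):
        total += 1
        mz = spectrum.get('m/z array')
        if mz is None or len(mz) == 0:
            # Prefer the 'index' stored in the spectrum, else the position.
            empty_indices.append(spectrum.get('index', position))
    return total, empty_indices


# Possible usage with a real file (the path is a placeholder):
# from pyteomics import mzml
# with mzml.read('test.mzML', use_index=True) as reader:
#     total, empty = summarize_empty_spectra(reader)
#     print(f'{len(empty)} of {total} spectra are empty')
```

If only a handful of indices come back, the empty arrays are likely just scans without peaks; if every index comes back, the problem is more likely in how the file is being read.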

morestart commented 2 years ago

If I use this code, I can get the data I need:

from pyteomics import mzml

data = mzml.read('/Users/cattree/PycharmProjects/BMIProject/data/test.mzML', use_index=True)
print(data.get_by_index(3890))

{'index': 3890, 'id': 'controllerType=0 controllerNumber=1 scan=3891', 'defaultArrayLength': 1763, 'scanList': {'count': 1, 'scan': [{'scanWindowList': {'count': 1, 'scanWindow': [{'scan window lower limit': 50.0, 'scan window upper limit': 2000.0}]}, 'scan start time': 14.626956666667, 'filter string': 'ITMS + c ESI Full ms [50.00-2000.00]', 'preset scan configuration': 0.0, 'ion injection time': 0.327597945929}], 'no combination': ''}, 'MS1 spectrum': '', 'ms level': 1, 'positive scan': '', 'centroid spectrum': '', 'base peak m/z': 1371.603881835938, 'base peak intensity': 767264.375, 'total ion current': 73670352.0, 'lowest observed m/z': 96.687423706055, 'highest observed m/z': 1999.616577148438, 'spectrum title': 'LAMS-POS-JUN-1064nm-1HZ-200MJ-NANOESI-05ULMIN_01.3891.3891. File:"LAMS-POS-JUN-1064nm-1HZ-200MJ-NANOESI-05ULMIN_01.raw", NativeID:"controllerType=0 controllerNumber=1 scan=3891"', 'count': 2, 'm/z array': array([ 96.68742371, 106.71133423, 110.47019958, ..., 1997.41137695, 1998.47338867, 1999.61657715]), 'intensity array': array([ 3213.50585938, 4366.23095703, 4368.14160156, ..., 12838.35742188, 3873.36865234, 4008.85546875])}

mobiusklein commented 2 years ago

You should be able to look at your data structure from the first note and look at the 3890th entry in self.mz and self.intensity and see the same arrays shown in the result of data.get_by_index(3890).
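That comparison could be sketched as follows. `same_values` is a hypothetical helper, and the sketch assumes the loop in the first post appended one entry per spectrum in file order, so that list position 3890 corresponds to `data.get_by_index(3890)`.

```python
def same_values(collected, reference):
    """Return True when two array-like objects hold the same values in order.

    Hypothetical helper for illustration. It works with plain lists and
    with NumPy arrays, and treats a missing (None) array as a mismatch.
    """
    if collected is None or reference is None:
        return False
    if len(collected) != len(reference):
        return False
    return all(a == b for a, b in zip(collected, reference))


# Possible usage, assuming self.mz and self.intensity were filled by the
# loop from the first post and `data` is the indexed reader:
# reference = data.get_by_index(3890)
# print(same_values(self.mz[3890], reference['m/z array']))
# print(same_values(self.intensity[3890], reference['intensity array']))
```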

Looking at the scan data you just showed, these are already centroided ('centroid spectrum' key is present).
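The same check can be done in code. In the dictionary printed above, valueless terms such as `'centroid spectrum'` appear as keys with an empty string as the value, so testing for the key is enough. This is a small sketch with a hypothetical function name; `'profile spectrum'` is the corresponding term for non-centroided data.

```python
def spectrum_representation(spectrum):
    """Classify a spectrum dict as 'centroid', 'profile', or 'unknown'.

    Hypothetical helper for illustration. It only checks whether the
    corresponding key is present in the spectrum dictionary.
    """
    if 'centroid spectrum' in spectrum:
        return 'centroid'
    if 'profile spectrum' in spectrum:
        return 'profile'
    return 'unknown'


# Possible usage with the indexed reader from the post above:
# print(spectrum_representation(data.get_by_index(3890)))
```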

morestart commented 2 years ago

How can I get all the indices? Every entry in self.intensity and self.mz is empty, like this: [array([], dtype=float64), array([], dtype=float64), array([], dtype=float64), array([], dtype=float64), array([], dtype=float64), array([], dtype=float64), array([], dtype=float64).....]

morestart commented 2 years ago

OK, I see why: we must pass use_index=True. Without this parameter, spectrum.get('m/z array') and spectrum.get('intensity array') return None.

mzml.read(file_path, use_index=True)

I hope this can be explained in the documentation 😊
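For reference, the collection loop from the first post can be written as a standalone function that is easy to test outside a class. This is only a sketch: `collect_scan_data` is a hypothetical name, and it assumes every spectrum has at least one entry under `'scanList'` / `'scan'`, as in the spectrum printed earlier in the thread.

```python
def collect_scan_data(spectra):
    """Gather scan start times, m/z arrays, and intensity arrays.

    Hypothetical helper mirroring the loop from the first post. Arrays
    that are absent from a spectrum are stored as None, so the three
    returned lists always have the same length.
    """
    times, mz_arrays, intensity_arrays = [], [], []
    for spectrum in spectra:
        first_scan = spectrum['scanList']['scan'][0]
        times.append(first_scan.get('scan start time'))
        mz_arrays.append(spectrum.get('m/z array'))
        intensity_arrays.append(spectrum.get('intensity array'))
    return times, mz_arrays, intensity_arrays


# Possible usage (the path is a placeholder):
# from pyteomics import mzml
# with mzml.read('test.mzML', use_index=True) as reader:
#     times, mz, intensity = collect_scan_data(reader)
```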