sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis

Ways to extract all MS2 peaks? #737

Open jzhou19 opened 2 months ago

jzhou19 commented 2 months ago

Hello,

I have QE data from which I want to extract MS2 information. I tried featureSpectra(xdata), where xdata is an XcmsExperiment object obtained from the LC-MS preprocessing steps (peak detection, alignment, and correspondence). However, the object returned by featureSpectra contains columns like "basePeakMZ", "lowMZ", and "highMZ". Is there a way to extract all MS2 peaks instead of only the base peak information?

Thank you!

jorainer commented 2 months ago

The object returned by featureSpectra() should be a Spectra. If that's the case, you can extract the individual peaks with the peaksData() function, or with the mz() and intensity() functions.
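As a minimal sketch of these accessors, the following builds a small toy Spectra object (the values are invented and only stand in for the result of featureSpectra()) and extracts the peaks with each of the three functions.

```r
library(Spectra)

# Toy data standing in for the result of featureSpectra(xdata);
# all values are made up for illustration.
spd <- S4Vectors::DataFrame(msLevel = c(2L, 2L), rtime = c(10.1, 12.3))
spd$mz <- list(c(78.34, 120.05, 530.06), c(85.02, 200.11))
spd$intensity <- list(c(250, 10921.4, 40), c(300, 1500))
s <- Spectra(spd)

# One two-column matrix (mz, intensity) per spectrum
pd <- peaksData(s)
pd[[1]]

# The same values, one variable at a time
mz(s)[[1]]
intensity(s)[[1]]

# Number of peaks in each spectrum
n_peaks <- lengths(s)
n_peaks
```

Each element of the peaksData() result corresponds to one spectrum, so it can be indexed the same way as the Spectra object itself.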

jzhou19 commented 2 months ago

Hi Johannes,

Thank you for your response. I should have stated this more clearly. Yes, the object returned by featureSpectra is a Spectra, but I extracted the information and converted it to a data frame with as.data.frame(ms2_spectra@backend@spectraData@listData). That's why I referred to columns, which might have been confusing.

I tried peaksData on the Spectra object and got all the peak information in a SimpleList object. However, I noticed some discrepancies between the data frame I created and the SimpleList. I attached two screenshots below. For example, in the data frame, the first spectrum has 80 peaks with a base peak intensity of 11350.406, a low mz of 70.29, and a high mz of 545.44. When I inspected the peaks data, the lowest mz was 78.34 and the highest was 530.06. The most abundant peak is the same, but its intensity is slightly different: 10921.4326. Could you explain why I am observing these differences?

[Screenshot: df_firstrow_JZ]

[Screenshot: peaksdata_JZ]

Thanks a lot!

jorainer commented 2 months ago

First, please use the dedicated functions to extract data from a Spectra and don't access slots (@) directly! To explain: Spectra can use different backends (MsBackend classes) to hold the data. Each one stores the data in a different way, so the code you used will only work for one type of backend. More importantly, Spectra uses a lazy processing queue for many data manipulation operations, which means that the original data (m/z and intensity) is not modified. The modifications are applied only when you access the data (using mz, intensity, spectraData or peaksData). With the way you accessed the data, you will always get the original, unmodified, unfiltered data.
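To illustrate the lazy processing queue, here is a small sketch on invented toy data. It assumes filterIntensity() and reset() behave as described in the Spectra documentation: the filter is only queued, the accessor functions apply it on the fly, and reset() removes the queued steps again.

```r
library(Spectra)

# Toy Spectra object; all values are made up for illustration.
spd <- S4Vectors::DataFrame(msLevel = c(2L, 2L), rtime = c(10.1, 12.3))
spd$mz <- list(c(78.34, 120.05, 530.06), c(85.02, 200.11))
spd$intensity <- list(c(250, 10921.4, 40), c(300, 1500))
s <- Spectra(spd)

# Queue a filter that keeps only peaks with an intensity of at least 100.
s_filt <- filterIntensity(s, intensity = c(100, Inf))

# Accessor functions apply the queued filter when the data is requested.
n_filtered <- lengths(s_filt)
intensity(s_filt)[[1]]

# reset() drops the queued steps; the original peaks are still there.
n_restored <- lengths(reset(s_filt))
n_restored
```

Because the filter is only queued, reading the backend's slots directly would skip it, which is why the accessor functions are the safe route.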

To your question: "totIonCurrent", "basePeakMZ" and "basePeakIntensity" are all spectra variables that are extracted from the original data file (the mzML file). This is the information provided there as header info for each spectrum, and it is usually put there by the MS manufacturer's software. Even if you convert the raw data files to mzML (e.g. using ProteoWizard), this information usually does not get changed. So, if the data/file was processed in any way (e.g. centroiding, filtering), you will start seeing differences between these header values and the actual peaks. The spectra variables thus always represent what the original data files provide. The same holds for the m/z and intensity values, unless you did any data processing/filtering within Spectra/xcms.
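One way to see such differences is to recompute the summaries from the peaks and put them next to the header values. The sketch below uses a toy spectrum whose numbers loosely follow the ones quoted in the question; the header-style variables are added by hand here, whereas with a real mzML file they would be read from the file.

```r
library(Spectra)

# Toy spectrum. lowMZ, highMZ and basePeakIntensity are set by hand to
# imitate header values; the three peaks themselves are invented.
spd <- S4Vectors::DataFrame(msLevel = 2L, lowMZ = 70.29, highMZ = 545.44,
                            basePeakIntensity = 11350.406)
spd$mz <- list(c(78.34, 120.05, 530.06))
spd$intensity <- list(c(250, 10921.4326, 40))
s <- Spectra(spd)

# Recompute the same summaries from the peaks that are actually present.
pd <- as.list(peaksData(s))
cmp <- data.frame(
    lowMZ_header = s$lowMZ,
    lowMZ_peaks = vapply(pd, function(p) min(p[, "mz"]), numeric(1)),
    highMZ_header = s$highMZ,
    highMZ_peaks = vapply(pd, function(p) max(p[, "mz"]), numeric(1)),
    basePeak_header = s$basePeakIntensity,
    basePeak_peaks = vapply(pd, function(p) max(p[, "intensity"]), numeric(1))
)
cmp
```

The "_header" columns show what the file reported, and the "_peaks" columns show what the m/z and intensity values currently contain.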

Now, some lines of code that may be useful if you want to extract the peaks data and work with a single data.frame:

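library(Spectra)
# `s` is the Spectra object, e.g. the result of featureSpectra(xdata)
pd <- peaksData(s)
# Repeat each spectrum's index once per peak it contains
s_index <- rep(seq_along(s), vapply(pd, nrow, integer(1)))
pd_df <- data.frame(spectrum_index = s_index, do.call(rbind, as.list(pd)))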

That way you'll have one (very long) data.frame with the m/z and intensity values, plus one additional column that tells you which spectrum each peak comes from.
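The spectrum index can also be used to carry per-spectrum variables over to the long data.frame. The following is a sketch on toy data: the feature_id column imitates the spectra variable that featureSpectra() is documented to add (an assumption worth checking with spectraVariables() on your own object), and the IDs themselves are invented.

```r
library(Spectra)

# Toy data; all values, including the feature IDs, are made up.
spd <- S4Vectors::DataFrame(msLevel = c(2L, 2L), rtime = c(10.1, 12.3),
                            feature_id = c("FT001", "FT002"))
spd$mz <- list(c(78.34, 120.05, 530.06), c(85.02, 200.11))
spd$intensity <- list(c(250, 10921.4, 40), c(300, 1500))
s <- Spectra(spd)

# Long data.frame with one row per peak
pd <- as.list(peaksData(s))
s_index <- rep(seq_along(s), vapply(pd, nrow, integer(1)))
pd_df <- data.frame(spectrum_index = s_index, do.call(rbind, pd))

# Look up per-spectrum variables by spectrum index, one value per peak row.
pd_df$rtime <- rtime(s)[pd_df$spectrum_index]
pd_df$feature_id <- s$feature_id[pd_df$spectrum_index]
pd_df
```

Indexing by spectrum_index repeats each spectrum's value once per peak, so every row is self-describing.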