lgatto / MSnbase

Base Classes and Functions for Mass Spectrometry and Proteomics
http://lgatto.github.io/MSnbase/

Unable to import MRM mzML files #592

Closed yeesherman closed 1 year ago

yeesherman commented 1 year ago

Hi,

I recently started attempting to use R to analyze some MRM data acquired from a QQQ instrument.

The data was generated on a PerkinElmer QSight 420 and exported to mzML format. When I try to import the data, I get the error below:

readSRMData("PCC_Test.mzML")
Error in readSRMData("PCC_Test.mzML") :
  file(s) '~mzml test\PCC_Test.mzML' do not contain SRM chromatogram data

R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_Singapore.1252  LC_CTYPE=English_Singapore.1252  LC_MONETARY=English_Singapore.1252
[4] LC_NUMERIC=C  LC_TIME=English_Singapore.1252

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] MSnbase_2.24.2 ProtGenerics_1.30.0 S4Vectors_0.36.2 mzR_2.32.0 Rcpp_1.0.10
[6] Biobase_2.58.0 BiocGenerics_0.44.0

loaded via a namespace (and not attached):
 [1] plyr_1.8.8 compiler_4.2.1 pillar_1.9.0 BiocManager_1.30.20 iterators_1.0.14
 [6] zlibbioc_1.44.0 tools_4.2.1 digest_0.6.31 MALDIquant_1.22.1 ncdf4_1.21
[11] preprocessCore_1.60.2 lifecycle_1.0.3 tibble_3.2.1 gtable_0.3.3 lattice_0.21-8
[16] clue_0.3-64 pkgconfig_2.0.3 rlang_1.1.1 foreach_1.5.2 DBI_1.1.3
[21] cli_3.6.1 rstudioapi_0.14 parallel_4.2.1 dplyr_1.1.2 cluster_2.1.4
[26] IRanges_2.32.0 generics_0.1.3 vctrs_0.6.2 MsCoreUtils_1.10.0 tidyselect_1.2.0
[31] grid_4.2.1 glue_1.6.2 impute_1.72.3 R6_2.5.1 fansi_1.0.4
[36] XML_3.99-0.14 BiocParallel_1.32.6 limma_3.54.2 ggplot2_3.4.2 magrittr_2.0.3
[41] pcaMethods_1.90.0 scales_1.2.1 codetools_0.2-19 MASS_7.3-60 mzID_1.36.0
[46] colorspace_2.1-0 utf8_1.2.3 affy_1.76.0 doParallel_1.0.17 munsell_0.5.0
[51] vsn_3.66.0 affyio_1.68.0

lgatto commented 1 year ago

I have no experience with PE and how they export to mzML. I would suggest opening the mzML file and manually verifying whether it contains chromatographic data.

yeesherman commented 1 year ago

Hi Laurent,

Thanks for the quick reply. What is the best way to open the mzML file? I am not entirely familiar with the file format.

lgatto commented 1 year ago

It's an XML file, so an XML viewer/editor or text editor will do. You should be looking for a Chromatogram tag (or something along these lines) in the file.
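If the file is too large to browse comfortably in an editor, the same check can be sketched in base R. The helper below (`count_mzml_entries` is a made-up name, not part of MSnbase or mzR) scans the file as plain text and counts opening `<spectrum>` and `<chromatogram>` elements. It is a rough text scan rather than a real XML parse, and it assumes the file fits in memory.

```r
# Rough check: count opening <spectrum ...> and <chromatogram ...> elements
# by scanning the file as text. The patterns require whitespace or '>' after
# the element name, so <spectrumList> and <chromatogramList> are not counted.
count_mzml_entries <- function(path) {
  lines <- readLines(path, warn = FALSE)
  count <- function(pattern) {
    hits <- gregexpr(pattern, lines, perl = TRUE)
    # gregexpr() returns -1 for lines without a match
    sum(vapply(hits, function(h) sum(h > 0), integer(1)))
  }
  c(spectrum = count("<spectrum[\\s>]"),
    chromatogram = count("<chromatogram[\\s>]"))
}

# Usage (file name taken from this thread):
# count_mzml_entries("PCC_Test.mzML")
```

A large `spectrum` count together with a `chromatogram` count of zero or one would suggest that the measurements were written as spectra rather than as SRM chromatograms.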

yeesherman commented 1 year ago

This is what I see in the file:

MzMzMzPzcEAzMzMzM/NwQDMzMzMzI3FAMzMzMzMjcUA= AAAAAACAdkAAAAAAAABUQAAAAAAAAAAAAAAAAAAAAAA=

It appears that the data has been extracted, but it might not be in the correct format?
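The two strings quoted above look like base64-encoded binary arrays, which is how mzML stores numeric data. A minimal base R sketch for decoding one of them follows. It assumes uncompressed 64-bit little-endian floats; the `cvParam` entries next to each array in the file say whether that holds (32-bit precision and zlib compression are also possible). `decode_mzml_array` is a hypothetical helper written for this example, not an MSnbase or mzR function.

```r
# Decode a base64 string into doubles, assuming uncompressed 64-bit
# little-endian floats (an assumption; check the cvParam entries in the file).
decode_mzml_array <- function(b64) {
  alphabet <- c(LETTERS, letters, 0:9, "+", "/")
  chars <- strsplit(gsub("=", "", b64, fixed = TRUE), "")[[1]]
  sextets <- match(chars, alphabet) - 1L
  # Expand every 6-bit value into its bits, most significant bit first
  bits <- as.vector(vapply(sextets,
                           function(s) (s %/% 2^(5:0)) %% 2,
                           numeric(6)))
  # Regroup the bit stream into whole bytes, dropping padding bits
  n_bytes <- length(bits) %/% 8L
  bit_matrix <- matrix(bits[seq_len(n_bytes * 8L)], nrow = 8L)
  bytes <- as.raw(colSums(bit_matrix * 2^(7:0)))
  readBin(bytes, what = "double", n = n_bytes %/% 8L,
          size = 8L, endian = "little")
}

# First string from the comment above
decode_mzml_array("MzMzMzPzcEAzMzMzM/NwQDMzMzMzI3FAMzMzMzMjcUA=")
# 271.2 271.2 274.2 274.2
```

Under that assumption the first string decodes to four m/z values, and the second string would be the matching intensity array. That would mean each entry is a small spectrum (an m/z array plus an intensity array) rather than a chromatogram.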

lgatto commented 1 year ago

Indeed, it seems to be there. You could try the latest version of mzR (version 2.34.0), as I believe the pwiz code shipped with it has been updated. @jorainer, any experience with these data?

jorainer commented 1 year ago

From the content above, it seems to me that the data is stored as spectrum entries and not chromatogram entries (i.e. you have 11385 entries in the spectrumList, and the data for each entry is an m/z array and an intensity array). This type of data can be read with the readMSData function instead. If you want to use readSRMData, you need to store (convert?) the data as an mzML with a chromatogramList instead of a spectrumList.
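One way to confirm this from R is to ask mzR how many spectra and chromatograms it finds in the file. This is only a sketch: `summarise_mzml` is a made-up wrapper, and it assumes mzR is installed and can open the file. Note that, per the mzR documentation, `nChrom()` includes the total ion chromatogram in its count, so a value of 1 does not by itself indicate SRM chromatograms.

```r
# Report how many spectra and chromatograms mzR sees in an mzML file.
# summarise_mzml is a hypothetical helper; the mzR calls are real.
summarise_mzml <- function(path) {
  ms <- mzR::openMSfile(path)
  c(spectra = nrow(mzR::header(ms)),   # one header row per spectrum
    chromatograms = mzR::nChrom(ms))   # count includes the TIC, if present
}

# Usage (file name taken from this thread):
# summarise_mzml("PCC_Test.mzML")
```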

yeesherman commented 1 year ago

When I tried to use the readMSData function, this is the output I get:

mzML <- readMSData("PCC_Test.mzML")
Error in readInMemMSData(files, pdata = pdata, msLevel. = msLevel., verbose = verbose, :
  No MS(n>1) spectra in file ~\mzml test\PCC_Test.mzML

yeesherman commented 1 year ago

I am able to load the data into R using

sp <- Spectra("PCC_Test.mzML")
sp
MSn data (Spectra) with 11385 spectra in a MsBackendMzR backend:
        msLevel     rtime scanIndex
1             1     0.000         1
2             1     0.222         2
3             1     0.656         3
4             1     0.869         4
5             1     1.091         5
...         ...       ...       ...
11381         1    46.286     11381
11382         1    46.720     11382
11383         1    46.933     11383
11384         1    47.155     11384
11385         1    47.589     11385
 ... 33 more variables/columns.

To follow up on this, how should I go about trying to analyze and quantify the peak?
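One possible starting point, sketched in base R under a strong assumption: that every spectrum lists the same transitions in the same order, so that the peak at position `i` of each spectrum belongs to one transition. This has not been verified for this file. Both helpers (`transition_trace`, `trapezoid_area`) are hypothetical names introduced for illustration, and trapezoidal integration is just one simple way to estimate a peak area.

```r
# Build a chromatogram-like trace from the i-th peak of every spectrum.
# `intensities` is a list with one numeric intensity vector per spectrum.
transition_trace <- function(intensities, i) {
  vapply(intensities,
         function(x) if (length(x) >= i) as.numeric(x[[i]]) else NA_real_,
         numeric(1))
}

# Trapezoidal area under the trace between retention times `from` and `to`.
trapezoid_area <- function(rt, intensity, from = min(rt), to = max(rt)) {
  keep <- !is.na(intensity) & rt >= from & rt <= to
  rt <- rt[keep]
  intensity <- intensity[keep]
  if (length(rt) < 2L) return(0)
  sum(diff(rt) * (head(intensity, -1L) + tail(intensity, -1L)) / 2)
}

# Possible usage with the Spectra object from above (not run here):
# rt   <- Spectra::rtime(sp)
# ints <- as.list(Spectra::intensity(sp))
# trace <- transition_trace(ints, 1)
# plot(rt, trace, type = "l")
# trapezoid_area(rt, trace, from = 10, to = 12)  # window chosen by eye
```

The integration window (`from`, `to`) is a placeholder; it would have to be picked from the plotted trace, in whatever unit `rtime()` reports.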

lgatto commented 1 year ago
  1. The readMSData() function failed because it looks (by default, based on your arguments) for MS levels > 1. You could parametrise it to return the MS1 spectra.
  2. I would suggest using Spectra(), which is the new and improved infrastructure for MS data.
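The two options can be sketched as follows. The `msLevel.` argument name is the one visible in the error message earlier in this thread; the wrapper names (`read_ms1`, `load_spectra`) are made up for this example, and the code assumes MSnbase and Spectra are installed.

```r
# Option 1: ask readMSData() for MS level 1 explicitly.
read_ms1 <- function(path) {
  MSnbase::readMSData(path, msLevel. = 1)
}

# Option 2: load the file with the Spectra package and its mzR backend.
load_spectra <- function(path) {
  Spectra::Spectra(path, source = Spectra::MsBackendMzR())
}

# Usage (file name taken from this thread):
# ms1 <- read_ms1("PCC_Test.mzML")
# sp  <- load_spectra("PCC_Test.mzML")
```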

I'm closing this issue, given that the original issue has been solved.