lgatto / MSnbase

Base Classes and Functions for Mass Spectrometry and Proteomics
http://lgatto.github.io/MSnbase/

Can't write new MSnExp object as mzxml #545

Closed avcarr2 closed 3 years ago

avcarr2 commented 3 years ago

Hello,

I have a situation where I need to take spectra from two different files and combine them. To do this I use MSnbase to read the two files into R, reorder the spectra, create a new MSnExp object and fill it with the re-ordered spectra. I can do this successfully, but I cannot seem to write the new MSnExp object with the writeMSData function.

Here is a reproducible example of my code and the session info:

library("MSnbase")
library("mzR")
library("rlang")
library("tidyverse")

# Read MS Data files
input_ms1 = readMSData(file.choose(), msLevel = 1)
input_ms2 = readMSData(file.choose(), msLevel = 2)

# Create new MSnExp Object
combined_data_object <- new("MSnExp")

# Return all elements from the environment @assayData for each of the input objects
list_ms1 <- eapply(input_ms1@assayData, function(x) return(x))
list_ms2 <- eapply(input_ms2@assayData, function(x) return(x))

rt_ms1 <- lapply(list_ms1, function(x) x@rt)
rt_ms2 <- lapply(list_ms2, function(x) x@rt)

# Sort based on retention time
sort_function <- function(x){
  rts <- lapply(x, function(x) x@rt)
  results <- x[names(sort(unlist(rts)))]
  results
}

# Concatenate both lists
combined_list <- c(list_ms1, list_ms2)

# Apply sorting function to list
sorted_list <- sort_function(combined_list)

# Add the new, sorted list to the new MSnExp object
list2env(sorted_list, envir = combined_data_object@assayData)

writeMSData(combined_data_object, file = "test.mzxml")

Session info:
R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] rlang_0.4.10 MSnbase_2.15.7 ProtGenerics_1.22.0 S4Vectors_0.28.1 mzR_2.24.1 Rcpp_1.0.6
[7] Biobase_2.50.0 BiocGenerics_0.36.1 htmlwidgets_1.5.3 pheatmap_1.0.12 gprofiler2_0.2.0 ggpubr_0.4.0
[13] WGCNA_1.70-3 fastcluster_1.1.25 dynamicTreeCut_1.63-1 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.6
[19] purrr_0.3.4 readr_1.4.0 tidyr_1.1.3 tibble_3.1.1 ggplot2_3.3.3 tidyverse_1.3.1

loaded via a namespace (and not attached):
[1] readxl_1.3.1 backports_1.2.1 Hmisc_4.5-0 plyr_1.8.6 lazyeval_0.2.2 splines_4.0.5
[7] BiocParallel_1.24.1 crosstalk_1.1.1 digest_0.6.27 foreach_1.5.1 htmltools_0.5.1.1 GO.db_3.12.1
[13] fansi_0.4.2 ggfortify_0.4.11 magrittr_2.0.1 checkmate_2.0.0 memoise_2.0.0 cluster_2.1.1
[19] doParallel_1.0.16 limma_3.46.0 openxlsx_4.2.3 modelr_0.1.8 matrixStats_0.58.0 jpeg_0.1-8.1
[25] colorspace_2.0-0 blob_1.2.1 rvest_1.0.0 haven_2.3.1 xfun_0.23 crayon_1.4.1
[31] RCurl_1.98-1.3 jsonlite_1.7.2 impute_1.64.0 survival_3.2-10 iterators_1.0.13 glue_1.4.2
[37] gtable_0.3.0 zlibbioc_1.36.0 car_3.0-10 abind_1.4-5 scales_1.1.1 vsn_3.58.0
[43] DBI_1.1.1 rstatix_0.7.0 viridisLite_0.4.0 xtable_1.8-4 htmlTable_2.2.1 foreign_0.8-81
[49] bit_4.0.4 preprocessCore_1.52.1 Formula_1.2-4 httr_1.4.2 RColorBrewer_1.1-2 ellipsis_0.3.2
[55] pkgconfig_2.0.3 XML_3.99-0.6 farver_2.1.0 nnet_7.3-15 dbplyr_2.1.1 utf8_1.2.1
[61] tidyselect_1.1.1 labeling_0.4.2 later_1.1.0.1 AnnotationDbi_1.52.0 munsell_0.5.0 cellranger_1.1.0
[67] tools_4.0.5 cachem_1.0.4 cli_2.5.0 generics_0.1.0 RSQLite_2.2.3 broom_0.7.6
[73] fastmap_1.1.0 mzID_1.28.0 yaml_2.2.1 knitr_1.33 bit64_4.0.5 fs_1.5.0
[79] zip_2.1.1 ncdf4_1.17 mime_0.10 xml2_1.3.2 compiler_4.0.5 rstudioapi_0.13
[85] plotly_4.9.3 curl_4.3.1 png_0.1-7 affyio_1.60.0 ggsignif_0.6.1 reprex_2.0.0
[91] stringi_1.5.3 lattice_0.20-41 Matrix_1.3-2 vctrs_0.3.8 pillar_1.6.1 lifecycle_1.0.0
[97] BiocManager_1.30.15 MALDIquant_1.19.3 data.table_1.14.0 bitops_1.0-7 httpuv_1.5.5 affy_1.68.0
[103] pcaMethods_1.82.0 R6_2.5.0 latticeExtra_0.6-29 promises_1.1.1 gridExtra_2.3 rio_0.5.26
[109] IRanges_2.24.1 codetools_0.2-18 MASS_7.3-53.1 assertthat_0.2.1 withr_2.4.2 hms_1.1.0
[115] grid_4.0.5 rpart_4.1-15 carData_3.0-4 shiny_1.6.0 lubridate_1.7.10 base64enc_0.1-3
[121] tinytex_0.31

jorainer commented 3 years ago

It seems you forgot to post the actual error message. In any case, I think the problem here is that combined_data_object is not a valid MSnExp object: it contains only the spectra and none of the other information. Also, I did not quite understand your workflow: do you want to merge the content of the two mzML files into a single mzML file?

Note that this might be easier with the new Spectra package:

library(Spectra)
fls <- dir(system.file("sciex", package = "msdata"), full.names = TRUE)  # two files
sps <- Spectra(fls)
## order the spectra from the two files based on retention times
sps <- sps[order(rtime(sps))]
export(sps, backend = MsBackendMzR(), file = "test.mzML")

The "test.mzML" file now contains all spectra from the two input files, ordered by their retention time.
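
As a side note on the ordering step itself: reordering by position with order(), as in the Spectra snippet above, may be safer than subsetting by name as in the original sort_function. The following is only a minimal base R sketch with made-up records (plain lists with a hypothetical rt field, not real MSnbase or Spectra objects), and it assumes the two inputs happen to share spectrum names, which has not been confirmed for the files in this issue.

```r
# Hypothetical stand-ins for spectra read from two files. Both lists use the
# same names, which is an assumption made here for illustration only.
ms1 <- list(F1.S1 = list(rt = 1.0, level = 1L),
            F1.S2 = list(rt = 3.0, level = 1L))
ms2 <- list(F1.S1 = list(rt = 2.0, level = 2L),
            F1.S2 = list(rt = 4.0, level = 2L))

combined <- c(ms1, ms2)
rts <- vapply(combined, function(s) s$rt, numeric(1))

# Name-based subsetting: a duplicated name resolves to its first match, so the
# second list's records are replaced by copies of the first list's records.
by_name <- combined[names(sort(rts))]

# Position-based subsetting with order(): every record is kept exactly once.
by_index <- combined[order(rts)]

rt_by_name  <- unname(vapply(by_name,  function(s) s$rt, numeric(1)))
rt_by_index <- unname(vapply(by_index, function(s) s$rt, numeric(1)))
```

In this toy case rt_by_index holds all four retention times in increasing order, while rt_by_name only repeats the values from the first list. Whether this affects real data depends on how the spectra are actually named.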

jorainer commented 3 years ago

To install the Spectra package: BiocManager::install("Spectra"). You can also find the documentation (including vignette) here