lgatto / MSnbase

Base Classes and Functions for Mass Spectrometry and Proteomics
http://lgatto.github.io/MSnbase/

Is it possible to split the chromatograms from multiple files #578

Closed MuyaoXi9271 closed 6 months ago

MuyaoXi9271 commented 1 year ago

Dear expert,

I tried to plot the extracted chromatograms from multiple files individually. I got an error when I tried to split the chromatograms, even though I checked that the lengths of the two inputs are equal, as shown in the image below.

image

Is it possible to split the chromatograms and plot them individually? Many thanks for considering my request.

Bests, Muyao

jorainer commented 1 year ago

An MChromatograms object is a two-dimensional data structure (rows are m/z-rt ranges, columns are files), so split is not the best way to get the data by file.

If it's just for plotting purposes, I would suggest looping over the columns (i.e. files) instead:

chrs <- chromatogram(aa, aggregationFun = "sum")
for (i in seq_len(ncol(chrs))) {
    plot(chrs[, i])
}
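
A minimal, self-contained sketch of the same loop, with one panel and one title per file. It assumes the faahKO test files used later in this thread are installed; the panel layout and the use of fileNames() for the titles are illustrative choices, not part of the original suggestion.

```r
library(MSnbase)

# Assumption: the faahKO data package is installed (it supplies the test files).
fls <- c(system.file("cdf/KO/ko15.CDF", package = "faahKO"),
         system.file("cdf/KO/ko16.CDF", package = "faahKO"),
         system.file("cdf/KO/ko18.CDF", package = "faahKO"))
data <- readMSData(fls, mode = "onDisk")

# One row (the full m/z range) and one column per file.
chrs <- chromatogram(data, aggregationFun = "sum")

# Draw the files side by side and restore the graphics settings afterwards.
op <- par(mfrow = c(1, ncol(chrs)))
for (i in seq_len(ncol(chrs))) {
    # Column i holds the chromatogram(s) of file i.
    plot(chrs[, i], main = basename(fileNames(data))[i])
}
par(op)
```

Subsetting by column keeps the link between each chromatogram and its file, which split() on a two-dimensional object does not.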

MuyaoXi9271 commented 1 year ago

Yes, it is only for plotting purposes. Thanks😉

MuyaoXi9271 commented 1 year ago

Hi Johannes,

Extracting multiple columns only works when the selection starts with the first column, as shown below. It is really weird.

image

Thanks in advance for looking into this issue.

Bests, Muyao

jorainer commented 1 year ago

This is strange. I cannot reproduce this error:

library(MSnbase)
fls <- c(system.file('cdf/KO/ko15.CDF', package = "faahKO"),
            system.file('cdf/KO/ko16.CDF', package = "faahKO"),
            system.file('cdf/KO/ko18.CDF', package = "faahKO"))
data <- readMSData(fls, mode = "onDisk")
chrs <- chromatogram(data, aggregationFun = "sum")
chrs[, 1:2]
MChromatograms with 1 row and 2 columns
           ko15.CDF       ko16.CDF
     <Chromatogram> <Chromatogram>
[1,]   length: 1278   length: 1278
phenoData with 1 variables
featureData with 1 variables

chrs[, 2:3]
MChromatograms with 1 row and 2 columns
           ko16.CDF       ko18.CDF
     <Chromatogram> <Chromatogram>
[1,]   length: 1278   length: 1278
phenoData with 1 variables
featureData with 1 variables

I've also tried with phenodata provided:

df <- data.frame(sample_id = 1:3, other_col = c("a", "b", "c"))
data <- readMSData(fls, pdata = as(df, "AnnotatedDataFrame"), mode = "onDisk")
chrs <- chromatogram(data, aggregationFun = "sum")
chrs[, 1:2]
MChromatograms with 1 row and 2 columns
                  1              2
     <Chromatogram> <Chromatogram>
[1,]   length: 1278   length: 1278
phenoData with 2 variables
featureData with 1 variables

chrs[, 2:3]
MChromatograms with 1 row and 2 columns
                  2              3
     <Chromatogram> <Chromatogram>
[1,]   length: 1278   length: 1278
phenoData with 2 variables
featureData with 1 variables

So, I guess this has to do with your specific data set or object? Can you provide some more information like how exactly you imported the data and also the output of your sessionInfo()?

MuyaoXi9271 commented 1 year ago

I imported the data with the following steps:

1. Import the data as an "OnDiskMSnExp" object using readMSData

raw <- sample_list_pos %>% {
    readMSData(
        pull(., filepath),
        pdata = new("NAnnotatedDataFrame", .),
        mode = "onDisk",
        msLevel. = 1
    )
}

2. Filter out the data eluting at the very beginning and the very end of the run

rt_range <- c(0.5, 13.5) * 60
raw_filt <- raw %>%
    filterRt(rt_range) %>%
    filterEmptySpectra()

cmp_test_chr_raw <- chromatogram(raw_filt, mz = c(130.1091, 130.2091), rt = c(223.2, 271.2))

cmp_test_chr_raw[, 2:3]

The session information is:

R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=Danish_Denmark.1252 LC_CTYPE=Danish_Denmark.1252 LC_MONETARY=Danish_Denmark.1252
[4] LC_NUMERIC=C LC_TIME=Danish_Denmark.1252

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] magrittr_2.0.3 purrr_0.3.4 dplyr_1.0.9 xcms_3.16.1
[5] BiocParallel_1.28.3 MSnbase_2.20.1 ProtGenerics_1.26.0 S4Vectors_0.32.4
[9] mzR_2.28.0 Rcpp_1.0.8.3 Biobase_2.54.0 BiocGenerics_0.40.0

loaded via a namespace (and not attached):
[1] MatrixGenerics_1.6.0 vsn_3.62.0 tidyr_1.2.0
[4] foreach_1.5.2 BiocManager_1.30.18 affy_1.72.0
[7] GenomeInfoDbData_1.2.7 robustbase_0.95-0 impute_1.68.0
[10] pillar_1.7.0 lattice_0.20-44 glue_1.6.2
[13] limma_3.50.3 digest_0.6.29 GenomicRanges_1.46.1
[16] RColorBrewer_1.1-3 XVector_0.34.0 colorspace_2.0-3
[19] preprocessCore_1.56.0 Matrix_1.3-4 plyr_1.8.7
[22] MALDIquant_1.21 XML_3.99-0.10 pkgconfig_2.0.3
[25] zlibbioc_1.40.0 scales_1.2.0 snow_0.4-4
[28] RANN_2.6.1 affyio_1.64.0 tibble_3.1.7
[31] generics_0.1.2 IRanges_2.28.0 ggplot2_3.3.6
[34] ellipsis_0.3.2 SummarizedExperiment_1.24.0 cli_3.3.0
[37] MassSpecWavelet_1.60.1 crayon_1.5.1 ncdf4_1.19
[40] fansi_1.0.3 doParallel_1.0.17 MASS_7.3-54
[43] MsFeatures_1.2.0 tools_4.1.1 lifecycle_1.0.1
[46] matrixStats_0.62.0 stringr_1.4.0 munsell_0.5.0
[49] cluster_2.1.2 DelayedArray_0.20.0 pcaMethods_1.86.0
[52] compiler_4.1.1 GenomeInfoDb_1.30.1 mzID_1.32.0
[55] rlang_1.0.2 grid_4.1.1 RCurl_1.98-1.7
[58] iterators_1.0.14 rstudioapi_0.13 MsCoreUtils_1.6.2
[61] bitops_1.0-7 gtable_0.3.0 codetools_0.2-18
[64] R6_2.5.1 gridExtra_2.3 utf8_1.2.2
[67] clue_0.3-61 stringi_1.7.6 vctrs_0.4.1
[70] png_0.1-7 DEoptimR_1.0-11 tidyselect_1.1.2

MuyaoXi9271 commented 1 year ago

It works when I do it like this:

MChromatograms(cmp_test_chr_raw[, c(2,3), drop = TRUE]) %>% t()

jorainer commented 1 year ago

Can you maybe drop the pdata = new("NAnnotatedDataFrame", .) argument from the readMSData call and try again?

MuyaoXi9271 commented 1 year ago

Hi Johannes,

It works as you suggested. However, only the file names are included in the phenodata when pdata = new("NAnnotatedDataFrame", .) is dropped.

Bests, Muyao

MuyaoXi9271 commented 1 year ago

Hi Johannes,

I found that the problem comes from the phenodata: the extracted phenodata always starts from the first file, as shown below.

image
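
A minimal sketch of how this could be checked on public data, comparing the phenodata of a column subset with the matching rows of the full phenodata. It assumes the faahKO test files are installed and reuses the example data frame from earlier in this thread; the variable names sub, expected and same are illustrative only, and no particular result is implied for any given MSnbase version.

```r
library(MSnbase)

# Assumption: the faahKO data package is installed (it supplies the test files).
fls <- c(system.file("cdf/KO/ko15.CDF", package = "faahKO"),
         system.file("cdf/KO/ko16.CDF", package = "faahKO"),
         system.file("cdf/KO/ko18.CDF", package = "faahKO"))

# Example phenodata, as used earlier in this thread.
df <- data.frame(sample_id = 1:3, other_col = c("a", "b", "c"))
data <- readMSData(fls, pdata = as(df, "AnnotatedDataFrame"), mode = "onDisk")
chrs <- chromatogram(data, aggregationFun = "sum")

# Subset the second and third files and look at the phenodata that comes along.
sub <- chrs[, 2:3]
expected <- pData(chrs)[2:3, , drop = FALSE]
print(pData(sub))
print(expected)

# TRUE when the subset's phenodata matches the selected columns.
same <- isTRUE(all.equal(pData(sub), expected))
print(same)
```

If same is FALSE, the phenodata rows no longer correspond to the selected columns, which is the symptom described in this comment.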

Bests, Muyao