lgatto / MSnbase

Base Classes and Functions for Mass Spectrometry and Proteomics
http://lgatto.github.io/MSnbase/

Ion mobility data set to "nan" per scan when using writeMSData() and makes file unreadable in mzMine 3 #594

Closed parasitetwin closed 1 year ago

parasitetwin commented 1 year ago

Hello, I've tried searching for this amongst the issues but haven't found anything. You have to excuse my ignorance since I'm working with someone else's code but a member of our group has recently developed code for recalibrating QTOF data on a scan-to-scan basis. When we write the new .mzML-files with corrected calibration a number of metadata strings are added to the mzML-files which seem to make it impossible to open the files in mzMine 3. One of the lines I have identified is:

This particular line causes an error as "nan" is not convertible to a number, making it impossible to import into mzMine 3. In the original mzML files this line is not present at all so it seems to be added as we write the new file using "writeMSData".

The actual call to "writeMSData" is:

writeMSData(MS, file = paste0(dirname(file), "/mzRecal/", basename(file)), copy = TRUE)

Is it possible to make an exact copy of all such metadata from the original file somehow that I'm unaware of? Have not found any information on this in tutorials or other descriptive webpages for the packages but I could have missed something I guess?

Cheers, Anton Ribbenstedt

jorainer commented 1 year ago

Hi Anton, actually, using copy = TRUE (like you did) all the metadata should be copied over from the original data files. Which line is added to the mzML and has NaN in it? Maybe it would be possible to manually set the value in the MS data object prior to exporting. In addition, it would be good to know which versions of R/MSnbase you are actually using. MSnbase uses mzR (and hence proteowizard) for mzML I/O, so maybe a newer version might work?

For package versions, it would be helpful if you could provide the output from sessionInfo() here.

parasitetwin commented 1 year ago

Hello Johannes! Thanks for the quick reply :)

Could be possible to change the header manually but I didn't manage to use the mzR writeMSData while MSnbase was loaded, even with a mzR::writeMSData()... not sure why actually. Package is also dependent on MSnbase so not possible to only load mzR unfortunately :/ but perhaps you have some idea of how that could be done? (Got an error with MSnbase trying to use the header argument since it's only part of the mzR function).

Apparently the code we used to read the file is:

MS <- readMSData(file, msLevel = 1, verbose = FALSE)

and later

writeMSData(MS, file, copy = TRUE)

Saw in example code on mzR/MSnbase tutorial sites that header() was used on an object created from openMSfile(). Could this be connected?

header() on the object from readMSData() and header() on the object from openMSfile() seem to generate distinctly different tables. The first one has the column names:

fileIdx retention.time precursor.mz precursor.intensity charge peaks.count tic ionCount ms.level acquisition.number collision.energy

and the latter:

seqNum acquisitionNum msLevel polarity peaksCount totIonCurrent retentionTime basePeakMZ basePeakIntensity collisionEnergy ionisationEnergy lowMZ highMZ precursorScanNum precursorMZ precursorCharge precursorIntensity mergedScan mergedResultScanNum mergedResultStartScanNum mergedResultEndScanNum injectionTime filterString spectrumId centroided ionMobilityDriftTime isolationWindowTargetMZ isolationWindowLowerOffset isolationWindowUpperOffset scanWindowLowerLimit scanWindowUpperLimit

Could this be connected to the issue?

Sorry for not posting the line immediately... was pretty tired when I wrote yesterday hehe. This is the line which is present in the copied files which was not present in the original:

<cvParam cvRef="MS" accession="MS:1002476" name="ion mobility drift time" value="nan" unitCvRef="UO" unitAccession="UO:0000028" unitName="millisecond"/>

image

Picture of the old file: image

R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=Swedish_Sweden.1252  LC_CTYPE=Swedish_Sweden.1252    LC_MONETARY=Swedish_Sweden.1252 LC_NUMERIC=C                    LC_TIME=Swedish_Sweden.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] doParallel_1.0.17    iterators_1.0.14     foreach_1.5.2        StatTools_0.0.916    lubridate_1.9.2      forcats_1.0.0        stringr_1.5.0        dplyr_1.1.0         
 [9] purrr_1.0.1          readr_2.1.4          tidyr_1.3.0          tibble_3.1.8         ggplot2_3.4.2        tidyverse_2.0.0      xcms_3.16.1          BiocParallel_1.28.3 
[17] MSnbase_2.20.4       ProtGenerics_1.26.0  S4Vectors_0.32.4     mzR_2.28.0           Rcpp_1.0.10          Biobase_2.54.0       BiocGenerics_0.40.0  mzRecalibrate_0.1.00

loaded via a namespace (and not attached):
 [1] MatrixGenerics_1.6.0        vsn_3.62.0                  BiocManager_1.30.21         affy_1.72.0                 GenomeInfoDbData_1.2.7      robustbase_0.95-0          
 [7] impute_1.68.0               pillar_1.9.0                lattice_0.20-44             glue_1.6.2                  limma_3.50.3                digest_0.6.31              
[13] GenomicRanges_1.46.1        RColorBrewer_1.1-3          XVector_0.34.0              colorspace_2.1-0            preprocessCore_1.56.0       Matrix_1.5-3               
[19] plyr_1.8.8                  MALDIquant_1.22             XML_3.99-0.13               pkgconfig_2.0.3             zlibbioc_1.40.0             scales_1.2.1               
[25] RANN_2.6.1                  affyio_1.64.0               tzdb_0.3.0                  timechange_0.2.0            generics_0.1.3              IRanges_2.28.0             
[31] withr_2.5.0                 SummarizedExperiment_1.24.0 cli_3.6.0                   MassSpecWavelet_1.60.1      magrittr_2.0.3              ncdf4_1.21                 
[37] fansi_1.0.4                 MASS_7.3-54                 graph_1.72.0                MsFeatures_1.2.0            tools_4.1.0                 hms_1.1.3                  
[43] lifecycle_1.0.3             matrixStats_0.63.0          munsell_0.5.0               cluster_2.1.2               DelayedArray_0.20.0         pcaMethods_1.86.0          
[49] compiler_4.1.0              GenomeInfoDb_1.30.1         mzID_1.32.0                 rlang_1.1.0                 grid_4.1.0                  RCurl_1.98-1.10            
[55] rstudioapi_0.14.0-9000      MsCoreUtils_1.6.2           bitops_1.0-7                gtable_0.3.3                codetools_0.2-18            DBI_1.1.3                  
[61] R6_2.5.1                    utf8_1.2.3                  clue_0.3-64                 stringi_1.7.12              vctrs_0.5.2                 DEoptimR_1.0-14            
[67] tidyselect_1.2.0

jorainer commented 1 year ago

Can you please check what value for ionMobilityDriftTime your file has?

unique(fData(MS)$ionMobilityDriftTime)

For my test file that was NA, so it did not get exported (this attribute is only exported if it is non-NA). To avoid exporting it at all:

fData(MS)$ionMobilityDriftTime <- NA_real_

If you do have ion mobility drift time values, make sure the column ("ionMobilityDriftTime") is of type real (e.g. convert it with as.numeric and replace any NaN with NA_real_).
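A minimal sketch of the conversion described above, using a plain data.frame as a stand-in for fData(MS) so that it runs without any MS data. The helper name clean_drift_time is hypothetical and not part of MSnbase or mzR.

```r
# Hypothetical helper (not an MSnbase function): coerce the drift time column
# to numeric and replace NaN with NA_real_. If the column is absent, the
# feature data is returned unchanged.
clean_drift_time <- function(fd, column = "ionMobilityDriftTime") {
  if (!column %in% colnames(fd)) {
    return(fd)
  }
  values <- suppressWarnings(as.numeric(fd[[column]]))
  values[is.nan(values)] <- NA_real_
  fd[[column]] <- values
  fd
}

# Stand-in for fData(MS); the column values here are made up for illustration.
fd <- data.frame(spIdx = 1:3, ionMobilityDriftTime = c(NaN, 1.5, NaN))
fd <- clean_drift_time(fd)

# With a real object the same helper would be applied as:
# fData(MS) <- clean_drift_time(fData(MS))
```

Whether this changes the exported file depends on the installed MSnbase/mzR versions, so it is worth checking the written mzML afterwards.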

parasitetwin commented 1 year ago

So I checked fData(MS) and it only has one column which is a range from 1 to the number of features.

image

Inputting your first suggestion thus gave NULL

After that I tried your second code-line suggestion, adding the column "ionMobilityDriftTime" (since it wasn't there prior). Having done that, I used the following code to write the new file:

writeMSData(MS, file = fileName, copy = TRUE)

Checking the file written (with and without copy) I found that both versions of it still had "nan" for ion mobility drift time and still can't be opened in mzMine.
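For reference, a check like this can be scripted rather than done by eye. The sketch below assumes each cvParam sits on a single line, as in the line quoted earlier; the helper name find_nan_drift_time is made up for illustration, and the temporary file is a tiny stand-in, not a valid mzML file.

```r
# Hypothetical helper: return the line numbers of "ion mobility drift time"
# cvParam entries (accession MS:1002476) whose value is "nan".
find_nan_drift_time <- function(path) {
  lines <- readLines(path, warn = FALSE)
  which(grepl('accession="MS:1002476"', lines, fixed = TRUE) &
          grepl('value="nan"', lines, fixed = TRUE))
}

# Tiny stand-in file containing the offending line from this thread.
tmp <- tempfile(fileext = ".mzML")
writeLines(c(
  "<scan>",
  '<cvParam cvRef="MS" accession="MS:1002476" name="ion mobility drift time" value="nan" unitCvRef="UO" unitAccession="UO:0000028" unitName="millisecond"/>',
  "</scan>"
), tmp)

hits <- find_nan_drift_time(tmp)
length(hits) > 0  # TRUE means the file still contains a "nan" drift time entry
```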

Here's a link to one of the files I've been using if that might help: https://chalmers-my.sharepoint.com/:u:/g/personal/antonri_chalmers_se/EWNPgCm1SaxKgbpwVLMi65kB2eSbTtba_-eks1lYjNnt0Q?e=vSjXAe

Thanks for taking your time to look into this!

jorainer commented 1 year ago

Your fData had only one column because you asked for only one column (with the [1:100, 1]). To get all columns you need to drop the 1 in your subsetting command (also, I would suggest extracting just the first 10 instead of the first 100 rows):

fData(MS)[1:10, ]

jorainer commented 1 year ago

Thanks for the file - the package versions I tried did not create this additional "ion mobility drift time" entry. I tried R version 4.2 with Bioconductor 3.16 (MSnbase version 2.24.2 and mzR 2.32) as well as the current stable versions 4.3 with Bioconductor 3.17 (MSnbase version 2.26.0 and mzR 2.34.1). Thus I guess you should be able to fix your problem by installing a more recent R/Bioconductor version (ideally the currently stable versions R 4.3 with Bioconductor 3.17).
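A small sketch of how the reported versions compare with the older of the two combinations tested above. The version strings are taken from this thread; utils::compareVersion is base R, so this runs without MSnbase installed.

```r
# Versions from the sessionInfo() posted earlier in this thread.
reported <- c(MSnbase = "2.20.4", mzR = "2.28.0")
# Older of the two combinations that did not produce the extra entry.
tested_ok <- c(MSnbase = "2.24.2", mzR = "2.32")

# TRUE where the reported version is older than the tested one.
older <- mapply(
  function(have, want) utils::compareVersion(have, want) < 0,
  reported, tested_ok
)
older

# In a live session the installed versions could be read instead with, e.g.:
# as.character(utils::packageVersion("MSnbase"))
```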

parasitetwin commented 1 year ago

My fData only has 1 column because I wanted to show a small subset of the numbers and by not specifying the column they were in a format which wasn't easy to screenshot ^^

image

But it only has one column when I read it as well.

Tried installing all the latest versions and it seems to work. Sorry for bothering you with this, true noob-mistake. Should have thought of that.

Thanks for all your help!