parasitetwin closed this issue 1 year ago.
Hi Anton, actually, using copy = TRUE (like you did), all the metadata should be copied over from the original data files. Which line is added to the mzML and has NaN in it? Maybe it would be possible to manually set the value in the MS data object prior to exporting... In addition, it would be good to know which versions of R/MSnbase you are actually using. MSnbase uses mzR (and hence ProteoWizard) for mzML I/O, so maybe a newer version might work?
For package versions, it would be helpful if you could provide the output from sessionInfo() here.
Hello Johannes! Thanks for the quick reply :)
It could be possible to change the header manually, but I didn't manage to use mzR's writeMSData while MSnbase was loaded, even when calling mzR::writeMSData() explicitly... not sure why, actually. Our package also depends on MSnbase, so unfortunately it's not possible to load only mzR :/ but perhaps you have some idea of how that could be done? (I got an error from MSnbase when trying to use the header argument, since that argument only exists in the mzR function.)
Apparently the code we used to read the file is:
MS <- readMSData(file, msLevel = 1, verbose=FALSE)
and later
writeMSData(MS, file, copy = TRUE)
I saw in example code on mzR/MSnbase tutorial sites that header() was used on an object created from openMSfile(). Could this be connected?
Calling header() on the object from readMSData() and header() on the object from openMSfile() seems to generate distinctly different tables, the first one having the column names:
fileIdx retention.time precursor.mz precursor.intensity charge peaks.count tic ionCount ms.level acquisition.number collision.energy
and the latter:
seqNum acquisitionNum msLevel polarity peaksCount totIonCurrent retentionTime basePeakMZ basePeakIntensity collisionEnergy ionisationEnergy lowMZ highMZ precursorScanNum precursorMZ precursorCharge precursorIntensity mergedScan mergedResultScanNum mergedResultStartScanNum mergedResultEndScanNum injectionTime filterString spectrumId centroided ionMobilityDriftTime isolationWindowTargetMZ isolationWindowLowerOffset isolationWindowUpperOffset scanWindowLowerLimit scanWindowUpperLimit
Could this be connected to the issue?
Sorry for not posting the line immediately... I was pretty tired when I wrote yesterday, hehe.
This is the line that is present in the copied files but was not present in the original:
<cvParam cvRef="MS" accession="MS:1002476" name="ion mobility drift time" value="nan" unitCvRef="UO" unitAccession="UO:0000028" unitName="millisecond"/>
Picture of the old file: [image not available]
Output of sessionInfo():
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=Swedish_Sweden.1252 LC_CTYPE=Swedish_Sweden.1252 LC_MONETARY=Swedish_Sweden.1252 LC_NUMERIC=C LC_TIME=Swedish_Sweden.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] doParallel_1.0.17 iterators_1.0.14 foreach_1.5.2 StatTools_0.0.916 lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0 dplyr_1.1.0
[9] purrr_1.0.1 readr_2.1.4 tidyr_1.3.0 tibble_3.1.8 ggplot2_3.4.2 tidyverse_2.0.0 xcms_3.16.1 BiocParallel_1.28.3
[17] MSnbase_2.20.4 ProtGenerics_1.26.0 S4Vectors_0.32.4 mzR_2.28.0 Rcpp_1.0.10 Biobase_2.54.0 BiocGenerics_0.40.0 mzRecalibrate_0.1.00
loaded via a namespace (and not attached):
[1] MatrixGenerics_1.6.0 vsn_3.62.0 BiocManager_1.30.21 affy_1.72.0 GenomeInfoDbData_1.2.7 robustbase_0.95-0
[7] impute_1.68.0 pillar_1.9.0 lattice_0.20-44 glue_1.6.2 limma_3.50.3 digest_0.6.31
[13] GenomicRanges_1.46.1 RColorBrewer_1.1-3 XVector_0.34.0 colorspace_2.1-0 preprocessCore_1.56.0 Matrix_1.5-3
[19] plyr_1.8.8 MALDIquant_1.22 XML_3.99-0.13 pkgconfig_2.0.3 zlibbioc_1.40.0 scales_1.2.1
[25] RANN_2.6.1 affyio_1.64.0 tzdb_0.3.0 timechange_0.2.0 generics_0.1.3 IRanges_2.28.0
[31] withr_2.5.0 SummarizedExperiment_1.24.0 cli_3.6.0 MassSpecWavelet_1.60.1 magrittr_2.0.3 ncdf4_1.21
[37] fansi_1.0.4 MASS_7.3-54 graph_1.72.0 MsFeatures_1.2.0 tools_4.1.0 hms_1.1.3
[43] lifecycle_1.0.3 matrixStats_0.63.0 munsell_0.5.0 cluster_2.1.2 DelayedArray_0.20.0 pcaMethods_1.86.0
[49] compiler_4.1.0 GenomeInfoDb_1.30.1 mzID_1.32.0 rlang_1.1.0 grid_4.1.0 RCurl_1.98-1.10
[55] rstudioapi_0.14.0-9000 MsCoreUtils_1.6.2 bitops_1.0-7 gtable_0.3.3 codetools_0.2-18 DBI_1.1.3
[61] R6_2.5.1 utf8_1.2.3 clue_0.3-64 stringi_1.7.12 vctrs_0.5.2 DEoptimR_1.0-14
[67] tidyselect_1.2.0
Can you please check which value for ionMobilityDriftTime your file has?
unique(fData(MS)$ionMobilityDriftTime)
For my test file that was NA, and hence it did not get exported (i.e., this attribute is only exported if it is non-NA). To avoid exporting it at all:
fData(MS)$ionMobilityDriftTime <- NA_real_
If you do have ion mobility drift times, make sure the values in that column ("ionMobilityDriftTime") are of type real (e.g., convert them with as.numeric) and replace any NaN with NA_real_.
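The cleanup described above (coerce to real, then turn NaN into NA_real_) can be sketched as a small helper. This is only an illustration under assumptions: clean_drift_time is a hypothetical name, not an MSnbase function, and the input vector is made-up data standing in for fData(MS)$ionMobilityDriftTime.

```r
# Hypothetical helper (not part of MSnbase): make a drift-time vector purely
# numeric and replace NaN with NA_real_, as suggested above.
clean_drift_time <- function(x) {
  x <- suppressWarnings(as.numeric(x))  # coerce to double; unparsable entries become NA
  x[is.nan(x)] <- NA_real_              # replace NaN with a proper missing value
  x
}

# Made-up example values standing in for fData(MS)$ionMobilityDriftTime
drift <- c("1.25", "NaN", NA, "0.80")
cleaned <- clean_drift_time(drift)
print(cleaned)

# With MSnbase loaded, the helper could be applied like this (assumption):
# fData(MS)$ionMobilityDriftTime <- clean_drift_time(fData(MS)$ionMobilityDriftTime)
```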
So I checked fData(MS), and it only has one column, which is a range from 1 to the number of features. Running your first suggestion therefore returned NULL.
After that I tried your second suggestion, adding the column "ionMobilityDriftTime" (since it wasn't there before). Having done that, I used the following code to write the new file:
writeMSData(MS, file = fileName, copy = TRUE)
Checking the written files (with and without copy = TRUE), I found that both versions still had "nan" for the ion mobility drift time and still can't be opened in mzMine.
Here's a link to one of the files I've been using if that might help: https://chalmers-my.sharepoint.com/:u:/g/personal/antonri_chalmers_se/EWNPgCm1SaxKgbpwVLMi65kB2eSbTtba_-eks1lYjNnt0Q?e=vSjXAe
Thanks for taking your time to look into this!
Your fData had only one column because you asked for only one column (with [1:100, 1]). To get all columns you need to drop the 1 in your subsetting command (also, I would suggest extracting just the first 10 instead of the first 100 rows):
fData(MS)[1:10, ]
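The effect of the column index can be illustrated with plain base R. This sketch assumes that fData(MS) behaves like an ordinary data.frame for subsetting; the data.frame below is a made-up stand-in with hypothetical values, not real feature data.

```r
# Made-up stand-in for fData(MS): a data.frame with several columns (assumption)
fd <- data.frame(
  spectrum = 1:20,
  retentionTime = seq(0.5, by = 0.5, length.out = 20),
  ionMobilityDriftTime = NA_real_
)

first_col <- fd[1:10, 1]  # row AND column index: a plain vector holding column 1 only
all_cols  <- fd[1:10, ]   # row index only: a data.frame with every column

str(first_col)
str(all_cols)
```

Because a single selected column is dropped to a vector by default, the result of [1:10, 1] never shows the other columns, even when they exist.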
Thanks for the file. The package versions I tried did not create this additional "ion mobility drift time" entry. I tried R version 4.2 with Bioconductor 3.16 (MSnbase version 2.24.2 and mzR 2.32) as well as the current stable versions, R 4.3 with Bioconductor 3.17 (MSnbase version 2.26.0 and mzR 2.34.1). Thus I guess you should be able to fix your problem by installing a more recent R/Bioconductor version (ideally the current stable versions, R 4.3 with Bioconductor 3.17).
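Comparing the installed versions against the ones mentioned above can be done with base R's package_version(). This is a sketch under assumptions: the "installed" strings are copied from the sessionInfo() output in this thread, and the "reported_ok" thresholds are simply the oldest versions mentioned here as not writing the extra entry, not a documented fix version.

```r
# Versions from the sessionInfo() output in this thread (the setup that wrote "nan")
installed <- c(MSnbase = "2.20.4", mzR = "2.28.0")
# Oldest versions mentioned in this thread as not writing the extra entry (assumption)
reported_ok <- c(MSnbase = "2.24.2", mzR = "2.32")

# TRUE means the installed version is older than the reported-working one
needs_update <- vapply(names(installed), function(pkg) {
  package_version(installed[[pkg]]) < package_version(reported_ok[[pkg]])
}, logical(1))
print(needs_update)

# In a live session the installed version can be queried directly, e.g.:
# packageVersion("MSnbase") < "2.24.2"
```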
The fData in my screenshot only showed 1 column because I wanted to show a small subset of the numbers; without specifying the column, they were in a format that wasn't easy to screenshot ^^
But it does have only one column when I read it, too.
I tried installing all the latest versions and it seems to work. Sorry for bothering you with this, a true noob mistake. I should have thought of that.
Thanks for all your help!
Hello, I've tried searching for this amongst the issues but haven't found anything. You have to excuse my ignorance, since I'm working with someone else's code: a member of our group has recently developed code for recalibrating QTOF data on a scan-to-scan basis. When we write the new .mzML files with corrected calibration, a number of metadata strings are added to the mzML files, which seem to make it impossible to open the files in mzMine 3. One of the lines I have identified is:
This particular line causes an error because "nan" is not convertible to a number, making it impossible to import the file into mzMine 3. In the original mzML files this line is not present at all, so it seems to be added when we write the new file using "writeMSData".
The actual call to "writeMSData" is: writeMSData(MS, file = paste0(dirname(file), "/mzRecal/", basename(file)), copy = TRUE)
Is there some way I'm unaware of to make an exact copy of all such metadata from the original file? I have not found any information on this in tutorials or other documentation for the packages, but I could have missed something.
Cheers, Anton Ribbenstedt