sneumann / mzR

This is the git repository matching the Bioconductor package mzR: parser for netCDF, mzXML, mzData and mzML files (mass spectrometry data)

writeMSData() drops <scanWindow> tag in mzml #202

Closed nbisliuk closed 4 years ago

nbisliuk commented 4 years ago

When creating an mzML file with writeMSData(), I found that the information between the scanWindow tags is missing from the output. This makes the mzML file unusable for some programs (e.g. Dinosaur). Is there a way to fix this problem?

lgatto commented 4 years ago

Thank you for your report. @jorainer, who's the best person to look at this, is currently out of office. In the meantime, it would be helpful if you could provide a short reproducible example.

nbisliuk commented 4 years ago

Thank you for the response! I uploaded a subset of my data to this repo: https://github.com/NickSign/mzml_creation.git The same problem appears when I use the copyWriteMSData() function and specify the original file. The point is that I am not using all scans from the original data, only a subset of interest.

jorainer commented 4 years ago

@NickSign, could you please provide the output of sessionInfo()? And which tags exactly are missing in the exported mzML file?

I'm not aware that we import any scanWindow header information from mzML files, which is also why it is not exported.

nbisliuk commented 4 years ago

@jorainer, sorry for the delayed response.

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mzR_2.18.0 Rcpp_1.0.1

loaded via a namespace (and not attached):
[1] compiler_3.6.0      ProtGenerics_1.16.0 parallel_3.6.0      tools_3.6.0         yaml_2.2.0          Biobase_2.44.0      codetools_0.2-16    ncdf4_1.16.1       
[9] BiocGenerics_0.30.0

The information missing from the mzML file should look like this:

<scanWindow>
  <cvParam cvRef="MS" accession="MS:1000501" name="scan window lower limit" value="400" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
  <cvParam cvRef="MS" accession="MS:1000500" name="scan window upper limit" value="1800" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
</scanWindow>

The scan window lower and upper limits are not present in the header. I can add them to the header myself, so I would like writeMSData() to write those values to the mzML file.

jorainer commented 4 years ago

The cleanest solution would be for header to report the scan window lower and upper limits. It is, however, not clear to me which parent node the scanWindow belongs to. Is it the spectrum? It would be nice if you could provide an mzML file with this tag so I can check how we could add this.

nbisliuk commented 4 years ago

@jorainer I uploaded a subset of the original mzML file: https://github.com/NickSign/mzml_creation.git. The <scanWindow> tag is written for each scan, inside the <scan ...> tag.
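
To check an exported file for this tag independently of mzR, something like the following Python sketch could be used. It is only an illustration: the embedded mzML fragment is a trimmed, hypothetical example (not taken from the uploaded data), the `<scanWindowList>` wrapper follows my understanding of the mzML schema, and `scan_windows` is a made-up helper name. The accession numbers are the ones shown in the snippet above.

```python
import xml.etree.ElementTree as ET

# Namespace used by mzML documents.
MZML_NS = "http://psi.hupo.org/ms/mzml"
NS = {"m": MZML_NS}

# Accessions taken from the <scanWindow> snippet in this thread.
LOWER = "MS:1000501"  # scan window lower limit
UPPER = "MS:1000500"  # scan window upper limit

# Hypothetical, heavily trimmed mzML fragment: the first spectrum has a
# scan window, the second one does not (as in the exported file).
SAMPLE = """<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run id="demo">
    <spectrumList count="2">
      <spectrum index="0" id="scan=1">
        <scanList count="1">
          <scan>
            <scanWindowList count="1">
              <scanWindow>
                <cvParam cvRef="MS" accession="MS:1000501" name="scan window lower limit" value="400"/>
                <cvParam cvRef="MS" accession="MS:1000500" name="scan window upper limit" value="1800"/>
              </scanWindow>
            </scanWindowList>
          </scan>
        </scanList>
      </spectrum>
      <spectrum index="1" id="scan=2">
        <scanList count="1">
          <scan/>
        </scanList>
      </spectrum>
    </spectrumList>
  </run>
</mzML>"""


def scan_windows(root):
    """Map each spectrum id to (lower, upper), or None if no scanWindow exists."""
    windows = {}
    for spectrum in root.iter(f"{{{MZML_NS}}}spectrum"):
        window = spectrum.find(".//m:scanWindow", NS)
        if window is None:
            windows[spectrum.get("id")] = None
            continue
        # Collect the cvParam values of this scan window by accession.
        values = {
            param.get("accession"): float(param.get("value"))
            for param in window.findall("m:cvParam", NS)
        }
        windows[spectrum.get("id")] = (values.get(LOWER), values.get(UPPER))
    return windows


root = ET.fromstring(SAMPLE)
windows = scan_windows(root)
missing = [sid for sid, limits in windows.items() if limits is None]
print(windows)
print("spectra without scanWindow:", missing)
```

For a real file, `ET.parse(path).getroot()` would replace `ET.fromstring(SAMPLE)`; running it on both the original and the exported file should show which spectra lost their scan window.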

jorainer commented 4 years ago

Thanks! I will now first add the columns to the header data.frame and then ensure that they are also exported to mzML.

jorainer commented 4 years ago

I've added the variables in my fork of mzR. Could you please check whether that works for you, @NickSign? You can install it with devtools::install_github("jorainer/mzR/").

nbisliuk commented 4 years ago

@jorainer I can't install the package. I got the following error:

> devtools::install_github("jorainer/mzR")
Downloading GitHub repo jorainer/mzR@master
Skipping 5 packages ahead of CRAN: Biobase, BiocGenerics, ProtGenerics, Rhdf5lib, zlibbioc
√  checking for file 'C:\Users\<temp_location>\RtmpIdzvEr\remotes15986ea26775\jorainer-mzR-52172a4/DESCRIPTION' ... 
-  preparing 'mzR': (13s)
√  checking DESCRIPTION meta-information ... 
-  cleaning src
-  checking for LF line-endings in source and make files and shell scripts (710ms)
-  checking for empty or unneeded directories (7.6s)
-  building 'mzR_2.19.4.tar.gz' (814ms)
   Warning: file 'mzR/cleanup' did not have execute permissions: corrected
   Warning: file 'mzR/configure' did not have execute permissions: corrected

* installing *source* package 'mzR' ...
** using staged installation
** libs
Error: (converted from warning) this package has a non-empty 'configure.win' file,
so building only the main architecture
* removing 'C:/Program Files/R/R-3.6.0/library/mzR'
Error in i.p(...) : 
  (converted from warning) installation of package ‘C:/Users/<temp_location>/RtmpIdzvEr/file159819934032/mzR_2.19.4.tar.gz’ had non-zero exit status

jorainer commented 4 years ago

That's strange. Are you on Windows?

nbisliuk commented 4 years ago

@jorainer yes, on Windows

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] remotes_2.0.4     usethis_1.5.0     yaml_2.2.0        rlang_0.3.4       pkgbuild_1.0.3    glue_1.3.1        withr_2.1.2      
 [8] sessioninfo_1.1.1 devtools_2.0.2    memoise_1.1.0     callr_3.2.0       ps_1.3.0          curl_3.3          Rcpp_1.0.2       
[15] backports_1.1.4   desc_1.2.0        pkgload_1.0.2     fs_1.3.1          digest_0.6.18     processx_3.3.1    rprojroot_1.3-2  
[22] cli_1.1.0         tools_3.6.0       magrittr_1.5      crayon_1.3.4      prettyunits_1.0.2 assertthat_0.2.1  rstudioapi_0.10  
[29] R6_2.4.0          compiler_3.6.0  

jorainer commented 4 years ago

OK, I'll try to compile it for Windows.

jorainer commented 4 years ago

Ah, no, sorry - doesn't work. My virtual machine failed. I'll make a PR instead.

nbisliuk commented 4 years ago

@jorainer Is this error caused by my local settings? Should I try Linux instead?

jorainer commented 4 years ago

I guess there might be some problem with the compilers on Windows. It should work on Linux. If you have the possibility, please try it there (note that you would also need a recent R and the Bioconductor development version, i.e. use BiocManager::install(version = "3.10") to get the development versions of all packages).

nbisliuk commented 4 years ago

@jorainer okay, I'll test on Linux and notify you if everything works

nbisliuk commented 4 years ago

@jorainer everything works fine! I successfully processed my data with the Dinosaur tool: https://github.com/fickludd/dinosaur Thank you very much!