lgatto / MSnbase

Base Classes and Functions for Mass Spectrometry and Proteomics
http://lgatto.github.io/MSnbase/

Error in .mzRBackendFromContent(x) : Could not determine file type #598

Closed MCK-sketch closed 7 months ago

MCK-sketch commented 7 months ago

Hi, I am having the following problem.

I run this:

setwd("C:/Scratch/Metabolomics/trial")
fls <- dir(path = ".", full.names = TRUE)
pd <- data.frame(sample_name = sub(basename(fls), pattern = ".mzdata.xml",
                                   replacement = "", fixed = TRUE),
                 stringsAsFactors = FALSE)
raw_data <- readMSData(files = fls, pdata = new("NAnnotatedDataFrame", pd),
                       mode = "onDisk")

And then I get this error:

Error in .mzRBackendFromContent(x) : Could not determine file type for C:\Scratch\Metabolomics\trial\s1.mzdata.xml

Can you let me know how I can fix this?

lgatto commented 7 months ago
MCK-sketch commented 7 months ago

I converted the Agilent .D files to mzData format using the export function in Agilent MassHunter; the resulting file is named "s1.mzdata.xml".

R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

locale:
[1] LC_COLLATE=English_Australia.utf8  LC_CTYPE=English_Australia.utf8
[3] LC_MONETARY=English_Australia.utf8 LC_NUMERIC=C
[5] LC_TIME=English_Australia.utf8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] tidyr_1.3.0         stringr_1.5.1       readxl_1.4.3        magrittr_2.0.3
 [5] xcms_3.18.0         BiocParallel_1.30.3 MSnbase_2.22.0      ProtGenerics_1.28.0
 [9] S4Vectors_0.34.0    mzR_2.30.0          Rcpp_1.0.11         Biobase_2.56.0
[13] BiocGenerics_0.42.0

lgatto commented 7 months ago
MCK-sketch commented 7 months ago

Thanks, it seems to be working now that I used MSConvert to convert the files from .D to mzML, but I don't know if I am using the right settings in the conversion - any tips would be appreciated, thanks.

ckeeling commented 4 months ago

Hello @lgatto ,

I am having the same problem with mzdata exported from Agilent's MassHunter D format.

raw_data <- readMSData(files=datafiles,pdata=new("NAnnotatedDataFrame",pd,mode="onDisk"))
Error in .mzRBackendFromContent(x) : 
  Could not determine file type for /Users/.../XCMS/mzData_Exports/sample_1.mzdata.xml
sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.7.1
...
attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] IPO_1.28.0                  CAMERA_1.58.0               rsm_2.10.4                  factoextra_1.0.7           
 [5] ggplot2_3.4.4               cluster_2.1.6               gplots_3.1.3.1              magrittr_2.0.3             
 [9] pander_0.6.5                RColorBrewer_1.1-3          xcms_4.0.2                  MSnbase_2.27.1             
[13] ProtGenerics_1.34.0         mzR_2.36.0                  Rcpp_1.0.12                 BiocParallel_1.36.0        
[17] QFeatures_1.12.0            MultiAssayExperiment_1.28.0 SummarizedExperiment_1.32.0 Biobase_2.62.0             
[21] GenomicRanges_1.54.1        GenomeInfoDb_1.38.5         IRanges_2.36.0              S4Vectors_0.40.2           
[25] BiocGenerics_0.48.1         MatrixGenerics_1.14.0       matrixStats_1.2.0           MsFeatures_1.10.0   

The first few lines of the sample_1.mzdata.xml files are:

<?xml version="1.0" encoding="utf-8"?>
<mzData version="1.05" accessionNumber="psi-ms:100" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <cvLookup cvLabel="psi" fullName="The PSI Ontology" version="1.00" address="http://psidev.sourceforge.net/ontology" />
  <description>
    <admin>
...

I'm on macOS, so using MSConvert to convert .D to .mzML isn't easily possible. Any suggestions on why the mzData format is problematic? Can it be fixed? I note that these mzdata.xml files load fine on xcms-online, but I would prefer to work in R with the more up-to-date packages for xcms etc.

Thanks! Chris

lgatto commented 4 months ago

Could you try the following:

mzR::openMSfile(datafiles[1], backend = "pwiz")

Note that only one file should be passed there, hence datafiles[1].
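If several files need checking, the same test can be repeated one file at a time. The following is only a sketch: `try_open_each` and its `opener` argument are hypothetical names introduced here, not part of mzR or MSnbase. The default opener is the `mzR::openMSfile()` call shown above, and any handles it opens successfully are not closed by this helper.

```r
# Hypothetical helper (not part of mzR/MSnbase): open each file on its own
# and record either "ok" or the error message that was raised.
try_open_each <- function(files,
                          opener = function(f) mzR::openMSfile(f, backend = "pwiz")) {
  vapply(files, function(f) {
    tryCatch({
      opener(f)   # one file per call, as required by openMSfile()
      "ok"
    }, error = function(e) conditionMessage(e))
  }, character(1))
}

# Usage, assuming `datafiles` is the character vector of paths used above:
# try_open_each(datafiles)
```

The result is a named character vector, one entry per file, so unsupported files can be picked out by their error message.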

ckeeling commented 4 months ago

Thanks for looking into this @lgatto. I get the following:

mzR::openMSfile(datafiles[1], backend = "pwiz")
Error: Can not open file mzData_Exports/sample_1.mzdata.xml! Original error was: Error: [MSDataFile::readFile()] Unsupported file format.

lgatto commented 4 months ago

@sneumann @jorainer - we didn't remove mzData support, did we? As far as I remember (and read in the man pages), mzData is supposed to be handled by the pwiz backend. Any idea?

ckeeling commented 4 months ago

It also seems that https://github.com/sneumann/mzR/blob/devel/R/io.R doesn't catch mzdata.xml files, with or without specifying the backend.

mzR::openMSfile(datafiles[1])
Error in .mzRBackendFromContent(x) : 
  Could not determine file type for mzData_Exports/sample_1.mzdata.xml

For reference:

pwiz.version()
[1] "3.0.21263"

Agilent MassHunter exports to only "ASR", "MGF", and "MzData" formats.
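To find out ahead of time which files in a folder are mzData exports, one option is to look at the XML root element near the top of each file, like the `<mzData ...>` line shown earlier. The following base R sketch is an assumption-laden illustration, not mzR's own detection code: `guess_ms_format` is a hypothetical name, and it only inspects the first few lines of text.

```r
# Hypothetical helper (not part of mzR): guess the format from the root
# element found in the first `n` lines of an XML-based MS file.
guess_ms_format <- function(path, n = 20L) {
  header <- paste(readLines(path, n = n, warn = FALSE), collapse = "\n")
  if (grepl("<mzML|<indexedmzML", header)) {
    "mzML"
  } else if (grepl("<mzXML", header)) {
    "mzXML"
  } else if (grepl("<mzData", header)) {
    "mzData"
  } else {
    NA_character_   # no known root element in the inspected lines
  }
}

# Usage, assuming `datafiles` is the character vector of paths used above:
# vapply(datafiles, guess_ms_format, character(1))
```

Files reported as "mzData" would need converting to another format before being read.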

lgatto commented 4 months ago

Yes, that's the same error you had initially, and why I wanted to test with an explicit backend.

sneumann commented 4 months ago

Hi, mzData was removed from mzR a while ago; any mentions of mzData are leftovers and should be removed. Can I have a link to where mzData is mentioned? Yours, Steffen

lgatto commented 4 months ago
ckeeling commented 4 months ago

OK, thanks for solving this for me. It's not the best news that mzData is no longer supported, but at least it saves further troubleshooting.

lgatto commented 4 months ago

Sorry about that, and for not fixing this in the documentation.

ckeeling commented 4 months ago

mzdata is also mentioned in the subtitle of the Bioconductor page: https://bioconductor.org/packages/release/bioc/html/mzR.html