sneumann / mzR

This is the git repository matching the Bioconductor package mzR: parser for netCDF, mzXML, mzData and mzML files (mass spectrometry data)

Error reading mzML file: "Failed to resolve reference" #281

Closed jorainer closed 1 year ago

jorainer commented 1 year ago

I stumbled across a problem reading mzML files from MassIVE:

library(curl)
# The file name contains a space, which is percent-encoded for the download
url <- "ftp://massive.ucsd.edu/MSV000087155/ccms_peak/New_mzMLFinal/20160603151123624-1576262 Batch5_SHP77_2a.mzML"
fl <- file.path(tempdir(), "test.mzML")
curl_download(sub(" ", "%20", url, fixed = TRUE), fl)

Now, mzR has an issue reading this file:

library(mzR)
o <- openMSfile(fl)
Error: Can not open file /tmp/RtmpC3pdCi/test.mzML! Original error was: Error: [References::resolve()] Failed to resolve reference.
  object type: N4pwiz6msdata23InstrumentConfigurationE
  reference id: IC1
  referent list: 0

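The issue is that the "defaultInstrumentConfigurationRef" of the run element points to "IC1" (line 39 below), but no instrument configuration with that ID is defined before it in the file, so the reference cannot be resolved: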

> readLines(fl, n = 40)
 [1] "<?xml version=\"1.0\" encoding=\"utf-8\"?>"                                                                     
 [2] "<indexedmzML xmlns=\"https://www.psidev.info/mzML\""                                                            
 [3] "xmlns:xsi=\"https://www.w3.org/2001/XMLSchema-instance\""                                                       
 [4] "xsi:schemaLocation=\"http://www.psidev.info/mzML http://psidev.info/files/ms/mzML/xsd/mzML1.1.2_idx.xsd\">"     
 [5] "<mzML xmlns=\"http://www.psidev.info/mzML\""                                                                    
 [6] "xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""                                                        
 [7] "id=\"20160603151123624-1576262 Batch5_SHP77_2\""                                                                
 [8] "version=\"1.1.0\""                                                                                              
 [9] "xsi:schemaLocation=\"http://www.psidev.info/mzML http://psidev.info/files/ms/mzML/xsd/mzML1.1.0.xsd\">"         
[10] "<cvList count=\"1\">"                                                                                           
[11] "<cv URI=\"http://psidev.cvs.sourceforge.net/*checkout*/psidev/psi/psi-ms/mzML/controlledVocabulary/psi-ms.obo\""
[12] "fullName=\"Proteomics Standards Initiative Mass Spectrometry Ontology\""                                        
[13] "id=\"MS\""                                                                                                      
[14] "version=\"3.79.0\"/>"                                                                                           
[15] "</cvList>"                                                                                                      
[16] "<fileDescription>"                                                                                              
[17] "<fileContent>"                                                                                                  
[18] "<cvParam accession=\"MS:1000579\" cvRef=\"MS\" name=\"MS1 spectrum\" value=\"\"/>"                              
[19] "<cvParam accession=\"MS:1000128\" cvRef=\"MS\" name=\"profile spectrum\" value=\"\"/>"                          
[20] "</fileContent>"                                                                                                 
[21] "</fileDescription>"                                                                                             
[22] "<referenceableParamGroupList count=\"1\">"                                                                      
[23] "<referenceableParamGroup id=\"CommonInstrumentParams\">"                                                        
[24] "<cvParam accession=\"MS:1000490\" cvRef=\"MS\" name=\"Agilent instrument model\" value=\"\"/>"                  
[25] "<userParam name=\"instrument model\" value=\"QTOF\"/>"                                                          
[26] "</referenceableParamGroup>"                                                                                     
[27] "</referenceableParamGroupList>"                                                                                 
[28] "<softwareList count=\"3\">"                                                                                     
[29] "<software id=\"MassHunter\" version=\"2.2\">"                                                                   
[30] "<cvParam accession=\"MS:1000678\" cvRef=\"MS\" name=\"MassHunter Data Acquisition\" value=\"\"/>"               
[31] "</software>"                                                                                                    
[32] "<software id=\"pwiz\" version=\"3.0.9248\">"                                                                    
[33] "<cvParam accession=\"MS:1000615\" cvRef=\"MS\" name=\"ProteoWizard software\" value=\"\"/>"                     
[34] "</software>"                                                                                                    
[35] "<software id=\"fiaMiner\" version=\"1819\">"                                                                    
[36] "<cvParam accession=\"MS:1000531\" cvRef=\"MS\" name=\"software\" value=\"\"/>"                                  
[37] "</software>"                                                                                                    
[38] "</softwareList>"                                                                                                
[39] "<run defaultInstrumentConfigurationRef=\"IC1\" defaultSourceFileRef=\"MSScan.bin\""                             
[40] "id=\"20160603151123624-1576262 Batch5_SHP77_2\">"              
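
To confirm this kind of dangling reference without going through mzR, a plain text check on the file header may be enough. The sketch below is only an illustration: `unresolved_ic_ref` is a hypothetical helper (not part of mzR or ProteoWizard), and it uses base R regular expressions rather than a real XML parser, so it assumes attributes are written with double quotes as in the header above.

```r
# Hypothetical helper (not part of mzR): returns the value of
# defaultInstrumentConfigurationRef if no <instrumentConfiguration>
# with that id is declared in `lines`, otherwise character(0).
unresolved_ic_ref <- function(lines) {
  txt <- paste(lines, collapse = "\n")

  # ID that the <run> element points to
  m <- regmatches(
    txt,
    regexec('defaultInstrumentConfigurationRef="([^"]*)"', txt)
  )[[1]]
  if (length(m) < 2) return(character(0))
  ref <- m[2]

  # Opening tags of all declared <instrumentConfiguration> elements
  # (the trailing whitespace excludes <instrumentConfigurationList>)
  decl <- regmatches(
    txt,
    gregexpr("<instrumentConfiguration[[:space:]][^>]*>", txt)
  )[[1]]

  # Extract the id attribute of each declaration
  ids <- vapply(
    regmatches(decl, regexec('[[:space:]]id="([^"]*)"', decl)),
    function(x) if (length(x) >= 2) x[2] else NA_character_,
    character(1)
  )

  if (ref %in% ids) character(0) else ref
}
```

Called as `unresolved_ic_ref(readLines(fl, n = 200))`, this should return "IC1" for a header like the one shown above. The `n = 200` cutoff is an arbitrary assumption that the header fits into the first 200 lines.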

What would be the best way to handle such files? Add a parameter that disables the reference check?

jorainer commented 1 year ago

I had a look at the ProteoWizard code, and it seems there is no option to disable these checks during file reading.
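
Without such an option, one conceivable workaround is to patch a copy of the file so that the reference resolves. The sketch below is an untested idea, not a confirmed fix: `add_placeholder_ic` is a hypothetical helper that inserts an empty placeholder instrument configuration ahead of the run element. It assumes `<run` starts its own line, as in the header above. Because the file is an indexed mzML, the stored index offsets no longer match after the insertion, and whether mzR then opens the patched file has not been verified here.

```r
# Hypothetical helper (not part of mzR): insert a placeholder
# <instrumentConfigurationList> immediately before the <run> element.
add_placeholder_ic <- function(lines, id) {
  # First line that opens the <run> element
  run_at <- grep("<run([[:space:]]|$)", lines)[1]
  if (is.na(run_at)) stop("no <run> element found")

  # Empty placeholder configuration carrying the missing id
  stub <- c(
    '<instrumentConfigurationList count="1">',
    sprintf('<instrumentConfiguration id="%s"/>', id),
    '</instrumentConfigurationList>'
  )

  append(lines, stub, after = run_at - 1)
}
```

A patched copy could then be written with something like `writeLines(add_placeholder_ic(readLines(fl), "IC1"), file.path(tempdir(), "patched.mzML"))`. Note that this reads the whole file into memory.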