pymzml / pymzML

pymzML - an interface between Python and mzML mass spectrometry files
https://pymzml.readthedocs.io/en/latest/
MIT License

cElementTree.ParseError: no element found #96

Closed: vidyavenkat4 closed this issue 5 years ago

vidyavenkat4 commented 6 years ago

I have been using pymzML to parse mzML files. I converted my Thermo raw file from an Elite instrument using ProteoWizard with 64-bit float binary encoding, and I encountered the following error, which does not occur when I rerun the script:

Traceback (most recent call last):
  File "/common/venkatramanv/Data/Ruining/4_TIC_parser/tic_parser_one_file.py", line 409, in <module>
    main(sys.argv[1:])
  File "/common/venkatramanv/Data/Ruining/4_TIC_parser/tic_parser_one_file.py", line 23, in main
    MS1TIC_list, contentDict1, contentDict2, runtime_str = msxtic_parser(filename)
  File "/common/venkatramanv/Data/Ruining/4_TIC_parser/tic_parser_one_file.py", line 340, in msxtic_parser
    for spectrum in msrun:
  File "/hpc/apps/python27/externals/msproteomicstools/0.3.3/lib/python2.7/site-packages/pymzml-0.7.5-py2.7.egg/pymzml/run.py", line 384, in next
    event, element = next(self.iter, ('END', 'END'))
  File "<string>", line 107, in next
cElementTree.ParseError: no element found: line 5208796, column 28656

My input mzML file has both the opening and closing tags, but I cannot reproduce this error when I rerun the script, so it seems to happen at random, perhaps because of some mismatch in data type or tags.
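The "no element found" message comes from the expat parser underneath cElementTree: it is raised when the input ends before the root element has been closed, for example when a file is cut off partway through. The minimal sketch below reproduces the message on a made-up two-spectrum document using the standard library's xml.etree.ElementTree (in Python 3 this module uses the C accelerator automatically; cElementTree was deprecated and then removed in Python 3.9). The function count_spectra is hypothetical and not part of pymzML.

```python
import io
import xml.etree.ElementTree as ET


def count_spectra(source):
    """Stream an mzML-like document and count <spectrum> elements.

    Returns (count, error): error is None on success, otherwise the
    ParseError raised when the input ended before the document was complete.
    """
    count = 0
    try:
        for _event, elem in ET.iterparse(source, events=("end",)):
            # Compare the local tag name, ignoring any "{namespace}" prefix.
            if elem.tag.rsplit("}", 1)[-1] == "spectrum":
                count += 1
                elem.clear()  # reduce memory use on large files
    except ET.ParseError as err:
        return count, err
    return count, None


# Toy documents, not real mzML.
complete = b"<mzML><run><spectrum id='1'/><spectrum id='2'/></run></mzML>"
# The same document cut off before the closing </run></mzML> tags.
truncated = b"<mzML><run><spectrum id='1'/><spectrum id='2'/>"

print(count_spectra(io.BytesIO(complete)))  # (2, None)
n, err = count_spectra(io.BytesIO(truncated))
print(n, err, err.position)
```

Both spectra are delivered before the parser reaches the end of the input, so the loop does its work and only then fails, much like the traceback above, where the error surfaces from inside `for spectrum in msrun`. If a rerun on the same file succeeds, one possibility (not confirmed in this thread) is that the file was incomplete when the failing run read it, for example while it was still being written or copied.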

I can also see that the line in run.py that raises the exception suggests checking cElementTree and says that converting to 32-bit float might help. Simply rerunning the script worked for me, but I would like to understand when this exception is raised and what is wrong with my input around that line.
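For context on the 32-bit suggestion: mzML stores peak arrays inside `<binary>` elements as base64 text that encodes little-endian IEEE 754 floats, optionally zlib-compressed. Switching from 64-bit to 32-bit halves the bytes per value, and so shrinks the file, but leaves the XML structure the parser walks unchanged. The sketch below shows the encoding round trip with the standard library. encode_array and decode_array are hypothetical helpers, not pymzML or ProteoWizard functions, and the m/z values are made up.

```python
import base64
import struct
import zlib

# struct format codes for little-endian IEEE 754 floats of each width.
FLOAT_CODES = {64: "d", 32: "f"}


def encode_array(values, bits=64, compress=False):
    """Pack floats little-endian at the given width, optionally zlib, then base64."""
    raw = struct.pack("<%d%s" % (len(values), FLOAT_CODES[bits]), *values)
    if compress:
        raw = zlib.compress(raw)
    return base64.b64encode(raw).decode("ascii")


def decode_array(text, bits=64, compress=False):
    """Reverse encode_array: base64-decode, optionally inflate, unpack floats."""
    raw = base64.b64decode(text)
    if compress:
        raw = zlib.decompress(raw)
    width = bits // 8
    if len(raw) % width:
        raise ValueError("%d bytes is not a multiple of %d" % (len(raw), width))
    return list(struct.unpack("<%d%s" % (len(raw) // width, FLOAT_CODES[bits]), raw))


mz = [100.5, 200.25, 300.125]  # made-up m/z values
for bits in (64, 32):
    text = encode_array(mz, bits=bits)
    print(bits, len(text), decode_array(text, bits=bits))
# 64 32 [100.5, 200.25, 300.125]
# 32 16 [100.5, 200.25, 300.125]
```

Values that are exactly representable in float32, like the ones above, survive the round trip unchanged. Others, such as 0.1, come back slightly rounded with `bits=32`, which is the precision traded for the smaller size.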

Please let me know if you need the complete mzML file as well. Any help will be highly appreciated.

Thanks Vidya

MKoesters commented 6 years ago

Hi Vidya,

Thanks for reporting this! To fix it, I need to take a look at the mzML file: at least the spectrum element that is failing, but ideally the whole file.

Best, Manuel
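To share only the failing spectrum instead of the whole file, the text around the position reported in the traceback (line 5208796, column 28656) can be pulled out without loading the file into memory. This is a minimal sketch, assuming the column is a 0-based character offset as expat reports it. context_at and the file name run.mzML are hypothetical.

```python
import io
import itertools


def context_at(lines, line_no, column, radius=40):
    """Return the text around a parser-reported (line, column) position.

    `lines` is any iterable of text lines, such as an open file, so a
    multi-GB mzML file is streamed rather than read into memory.
    `line_no` is 1-based; `column` is treated as a 0-based character
    offset. Returns None if the input has fewer than `line_no` lines.
    """
    line = next(itertools.islice(lines, line_no - 1, None), None)
    if line is None:
        return None
    start = max(0, column - radius)
    return line[start:column + radius]


# Hypothetical use on the real file:
# with open("run.mzML", encoding="utf-8") as fh:
#     print(context_at(fh, 5208796, 28656))
demo = io.StringIO("<mzML>\n<run>\n<spectrum id='1'><binary>AAAA\n")
print(repr(context_at(demo, 3, 25, radius=5)))  # 'nary>AAAA\n'
```

If the function returns None for the reported line, the file on disk has fewer lines than the parser saw, which would itself suggest the file differs from the one the failing run read.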

vidyavenkat4 commented 6 years ago

Please find the mzML file uploaded to Box at the link below: https://cedars.box.com/s/gln38avca2l3tskofon3e47byn7si6sv

MKoesters commented 6 years ago

I followed the link and it told me the file was either removed or is not available for me.

vidyavenkat4 commented 6 years ago

Can you try this link?: https://cedars.box.com/shared/static/ivqyvpkx852ykmdvgl7kj06zvsfkf0tr.mzml

fu commented 6 years ago

The link works now: 1.5 GB downloaded and still going. I'll have a look ASAP.

fu commented 6 years ago

Hi @vidyavenkat4

ok, so I checked the file ("just 4.9GB" :)) and the example script worked.

python simple_parser.py QF_180830_SWATH_6600_Plasma_2ug_03_profile.mzML
Parsed 130996 spectra from file QF_180830_SWATH_6600_Plasma_2ug_03_profile.mzML

However, I guess the error occurs when you extract something from a given spectrum. What are you extracting? Additionally, I noticed that the spectrum IDs in that file cycle between 1 and 101; is that intended?

Cheers

.c
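To check whether the `id` attributes in the file itself repeat (as opposed to a number that a tool may derive from them), the ids can be tallied while streaming. This is a minimal sketch using the standard library. spectrum_id_counts is hypothetical, and the toy document and its `scan=N` ids are made up.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def spectrum_id_counts(source):
    """Tally the id attribute of every <spectrum> element in an mzML stream."""
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Compare the local tag name, ignoring any "{namespace}" prefix.
        if elem.tag.rsplit("}", 1)[-1] == "spectrum":
            counts[elem.get("id")] += 1
            elem.clear()  # reduce memory use on large files
    return counts


doc = (b"<mzML><run>"
       b"<spectrum id='scan=1'/><spectrum id='scan=2'/><spectrum id='scan=1'/>"
       b"</run></mzML>")
counts = spectrum_id_counts(io.BytesIO(doc))
duplicates = {spec_id: n for spec_id, n in counts.items() if n > 1}
print(len(counts), duplicates)  # 2 {'scan=1': 2}
```

An empty `duplicates` dict on the real file would mean the raw ids are unique, so any cycling would come from how the ids are interpreted downstream.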