pymzml / pymzML

pymzML - an interface between Python and mzML Mass spectrometry Files
https://pymzml.readthedocs.io/en/latest/
MIT License

Not index found and raise Exception( Exception: Spectrum ID should be between 1 and 1) #357

Open mremachine1 opened 2 months ago

mremachine1 commented 2 months ago

Describe the bug The mzML file contains spectra and appears to be properly formatted, but when I try to open it with pymzml.run.Reader, it says the index isn't found. I've tried build_index_from_scratch=True, but that did not fix the problem. Both the reader object and the file itself indicate that the file does have scan IDs, but when I try to access the scan data by scan ID, the following exception is thrown: "Spectrum ID should be between 1 and 1".

To Reproduce Steps to reproduce the behavior:

import pymzml

# Backslashes are doubled so the trailing one does not escape the closing quote
new_path = "X:\\JS\\Adductomics\\BariatricStudy\\DIAumpire\\"
new_fh = new_path + "JS-CS_LI_221120_GroopmanJ_JS_PAA_3_P3_C_correct_Q2.mzML"
test_mzml = pymzml.run.Reader(new_fh)
# prints: [Warning] Not index found and build_index_from_scratch is False
test_mzml = pymzml.run.Reader(new_fh, build_index_from_scratch=True)
data = test_mzml[1944]
# raises: Exception: Spectrum ID should be between 1 and 1

Expected behavior With other files from the same study, we are able to access the corresponding scan via reader_object[Spectrum ID].

Desktop (please complete the following information): Windows 11 Pro

Additional context I would like to upload the problematic mzML file, but the size limit prevents it. error_pymzml.txt
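Since the file itself is too large to attach, one quick check that can be run locally is whether it carries an index at all. The sketch below is not part of pymzML; it assumes the indexed mzML layout, in which an `<indexListOffset>` element sits near the end of the file, and only looks at the last 64 KiB. The function name `has_index_list` and the `tail_bytes` parameter are made up for illustration.

```python
import os


def has_index_list(path, tail_bytes=65536):
    """Return True if the tail of the file contains an <indexListOffset> element.

    Assumption: an indexed mzML file ends with its index followed by an
    <indexListOffset> element, so reading the tail is enough.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        # Jump close to the end instead of reading a multi-gigabyte file.
        fh.seek(max(0, size - tail_bytes))
        tail = fh.read()
    return b"<indexListOffset>" in tail
```

If this returns False, the "Not index found" warning would be expected for the file, and the question becomes why rebuilding the index from scratch does not recover the spectra.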

Schulze-lab commented 2 months ago

Thanks for reporting the issue! @MKoesters might be able to help more with the specifics on the indexing, but it indeed should work for all files the same way.

As a potential workaround: did you try using the example script for creating indexed gzip files? https://github.com/pymzml/pymzML/blob/dev/example_scripts/gzip_mzml.py

mremachine1 commented 2 months ago

Instead of trying to use the reader's index access, I just enumerated through my reader object and picked the spectrum where spec.ID == scannum to access the data. Not as efficient, but it works.
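For anyone hitting the same problem, the linear-scan workaround described here can be sketched roughly as follows. The helper name `find_spectrum_by_id` is made up for illustration; it only assumes that the reader is iterable and that each yielded object exposes an `ID` attribute, as in the `spec.ID` comparison mentioned in this comment.

```python
def find_spectrum_by_id(reader, scan_id):
    """Return the first item in reader whose ID equals scan_id, else None.

    This walks the reader from the start, so it does not depend on the
    file index, at the cost of a linear scan.
    """
    for spec in reader:
        # getattr guards against yielded objects that carry no ID attribute.
        if getattr(spec, "ID", None) == scan_id:
            return spec
    return None
```

With the file from the report, this would be called as `find_spectrum_by_id(pymzml.run.Reader(new_fh), 1944)`. Each lookup rescans the file, so for many lookups it is probably cheaper to collect the wanted spectra in a single pass.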

MKoesters commented 2 months ago

Hi,

I'd need the mzML or at least an excerpt of it. Could you copy some of these elements for me to have a look at:

<spectrum index="5" id="controllerType=0 controllerNumber=1 scan=6" defaultArrayLength="1059">

Maybe something is off with the id.
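If the file is too large to open in an editor, a few of these opening tags can be pulled out with plain Python. This is only a sketch and not part of pymzML: the function name `first_spectrum_tags` is made up, and it assumes each `<spectrum ...>` opening tag sits on a single line, which is common but not guaranteed.

```python
import re

# Matches "<spectrum ...>" but not "<spectrumList ...>", because a
# whitespace character is required right after the tag name.
SPECTRUM_TAG = re.compile(r"<spectrum\s[^>]*>")


def first_spectrum_tags(path, limit=5):
    """Return up to `limit` <spectrum ...> opening tags found in the file."""
    tags = []
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        for line in fh:  # stream line by line; the file may be very large
            for match in SPECTRUM_TAG.finditer(line):
                tags.append(match.group(0))
                if len(tags) >= limit:
                    return tags
    return tags
```

Printing the result of `first_spectrum_tags(new_fh)` should give a handful of lines like the one above that can be pasted into the issue.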

Out of curiosity, do you get the same error when you set build_index_from_scratch to False?