Closed jblakele closed 4 years ago
Hi Alfredo,
Thank you very much for reporting this and your good description of the problem.
We saw this problem before, but we never got a response on whether the issue still exists, so it's good that you brought it up.
Could you check whether setting build_index_from_scratch to True when initializing the reader circumvents your problem for as long as the issue exists?
I'll check that issue on monday using your file, you should have gotten a request from the to download the file.
Best, Manuel
Hi Manuel,
I just tested your recommendation and it fixed it permanently. Now when I initialize even if I don't include the code build_index_from_scratch = True the spectra are visible. Does this suggest an issue with msconvert?
This was an intermittent issue. I'll let you know if this was a permanent fix for all cases.
Best Regards, Alfredo
Well, that sounds really strange, but I don't think this could be an issue with msconvert. As I said, I'll have a look at the mzML on Monday and test whether I can reproduce the error on my machine. Anyway, I'll mark this issue as a wiki entry, since this seems to happen for at least 2 users.
Best, Manuel
Ok I've found a new instance of the bug but in a different file and it is resistant to re-indexing. Here is the link to the file.
https://drive.google.com/open?id=1Y8YCUwNG5DpXPCFfmMxxloctksMRGntA
code pymzml.run.Reader("20180409_AB_CA01_Run2_12ug_MudPIT_QE1_05.mzML",build_index_from_scratch=True)[17601]["MS:1000016"] returns retention time 60.265408
pymzml.run.Reader("20180409_AB_CA01_Run2_12ug_MudPIT_QE1_05.mzML",build_index_from_scratch=True)[17602]["MS:1000016"] Returns ParseError: syntax error: line 1, column 0
I'm currently downloading and will look at both. Is the second mzML also indexed?
Hi Alfredo,
could you check if the Pull request #148 solves your issues?
Best, Manuel
Hi Manuel,
I am still seeing the error, but since this is the first time I've checked a pull request let me just make sure I installed it correctly.
Install:
Code test:
Best Regards, Alfredo
Hi Alfredo,
Could you try: pip install git+https://github.com/pymzml/pymzml@refs/pull/148/merge
What you are doing is installing the master branch of my fork, however the pull request is coming from the fix/#145 branch. The above command installs the actual pull request in this repo.
Best, Manuel
Hi Manuel,
Your update fixed that specific instance.
However, upon further testing I identified several more instances where a similar bug is continuing to occur. I've included a code example, a link to the mzML files, and specific affected spectra.
Link to mzML files
https://drive.google.com/open?id=15pJ3rpXzMcUf9uVB0w8TsL2GrxPauL38
Code Example:
Affected Spectra
Best Regards, Alfredo
Hi Alfredo,
I'll have a closer look now and report back as soon as I have found the issue.
Best, Manuel
Hi Alfredo,
I checked all your files and I could retrieve every spectrum by scan_id. I'm realizing now that you are trying to retrieve spectra by their index, which afaik was never implemented like that, although it has been planned for some time. It could be enabled by changing a regular expression to extract the index rather than the scan_id when building the offset_dict or looking for spectra. However, that would require specifying this when initializing the reader.
If you are interested in such a feature, please tell me and I'll see when I can find the time to implement it.
Best, Manuel
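To illustrate the scan_id vs. index distinction, here is a minimal sketch of building an offset dictionary with either regular expression. It is not pymzML's actual implementation: the trimmed mzML fragment, both patterns, and the helper name build_offsets are assumptions made up for this example.

```python
import re

# Hypothetical, heavily trimmed mzML fragment; real files carry many more attributes.
MZML = (
    b'<spectrumList count="2">\n'
    b'<spectrum index="0" id="controllerType=0 controllerNumber=1 scan=1" defaultArrayLength="3">\n'
    b'</spectrum>\n'
    b'<spectrum index="1" id="controllerType=0 controllerNumber=1 scan=2" defaultArrayLength="5">\n'
    b'</spectrum>\n'
    b'</spectrumList>\n'
)

# Assumed patterns: one captures the scan number from the native ID,
# the other captures the zero-based index attribute.
SCAN_ID_PATTERN = re.compile(rb'<spectrum\s[^>]*?id="[^"]*?scan=(\d+)"')
INDEX_PATTERN = re.compile(rb'<spectrum\s[^>]*?index="(\d+)"')


def build_offsets(data, pattern):
    """Map each captured integer key to the byte offset of its <spectrum> tag."""
    return {int(match.group(1)): match.start() for match in pattern.finditer(data)}


by_scan = build_offsets(MZML, SCAN_ID_PATTERN)   # keys are scan numbers (1, 2)
by_index = build_offsets(MZML, INDEX_PATTERN)    # keys are indices (0, 1)
```

Both dictionaries point at the same byte offsets; only the keys differ, which is why asking for "17602" can mean two different spectra depending on which key the reader uses.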
Hi,
just to add to the scan_id vs. index point, though I'm not sure if it helps: It had been implemented to retrieve the index using spectrum.index or through the spectrum.id_dict, so I guess that could be used to build the offset_dict
Best, Stefan
Actually, I prefer scan. When I was initially testing I thought the reader was using index, so I was offsetting by one, but now I see that it is using scan. My mistake. I think I've narrowed down the problem a little more. When you iterate through the file, all the spectra are successfully collected, but when you need to collect data from specific spectra, a very small percentage of them throw an error. For now, it might be better to collect data from all spectra by iterating through the file.
Best Regards, Alfredo
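The iterate-instead-of-random-access workaround could look roughly like the sketch below. The helper name collect_scan_times is hypothetical, and it assumes each spectrum exposes an ID attribute and the scan_time_in_minutes() method, as pymzML 2.x spectra are understood to do.

```python
def collect_scan_times(spectra, wanted_ids):
    """Iterate once over spectra and return {ID: scan time in minutes} for the wanted IDs."""
    wanted = set(wanted_ids)
    times = {}
    for spectrum in spectra:
        if spectrum.ID in wanted:
            times[spectrum.ID] = spectrum.scan_time_in_minutes()
            # Stop early once every requested spectrum has been seen.
            if len(times) == len(wanted):
                break
    return times


# Intended use with pymzML (assumes the package is installed and the file exists):
#   import pymzml
#   run = pymzml.run.Reader("20180409_AB_CA01_Run2_12ug_MudPIT_QE1_04.mzML")
#   times = collect_scan_times(run, [34979, 34980, 34981])
```

This trades speed for robustness: it never touches the offset lookup, so it reads the file sequentially even when only a few spectra are needed.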
@MKoesters @jblakele
I am experiencing the same issue.
Example file: https://www.dropbox.com/s/a6jk2pxjcxokssy/batch04_B02_rep01_301.mzML?dl=0
import os
import pymzml

path = os.path.join("tests", "batch04_B02_rep01_301.mzML")
run = pymzml.run.Reader(path)
print(20, run[20])
print(21, run[21])
The offset is somehow incorrectly calculated. For scan id 21
receives a 'spec_string' that contains two scan records instead of one.
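A plausible explanation, not confirmed in this thread, for why a miscalculated offset surfaces as a ParseError: an XML parser accepts exactly one root element, so a string holding two scan records, or one that starts in the middle of a tag, cannot be parsed. The sketch below uses only the standard library and made-up spectrum strings.

```python
import xml.etree.ElementTree as ET

# Made-up, minimal spectrum records for illustration only.
ONE_RECORD = '<spectrum index="20" id="scan=21"></spectrum>'
TWO_RECORDS = ONE_RECORD + '\n<spectrum index="21" id="scan=22"></spectrum>'
LEADING_DEBRIS = 'index>\n' + ONE_RECORD  # slice that starts mid-tag


def try_parse(spec_string):
    """Return the parsed element's id, or the ParseError message on failure."""
    try:
        return ET.fromstring(spec_string).get("id")
    except ET.ParseError as err:
        return "ParseError: {}".format(err)


single = try_parse(ONE_RECORD)           # parses cleanly
double = try_parse(TWO_RECORDS)          # second root element is rejected
shifted = try_parse(LEADING_DEBRIS)      # text before the root element is rejected
```

The two failing cases resemble the "junk after document element" and "syntax error" messages reported elsewhere in this issue, which is consistent with the offsets being slightly too wide or shifted.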
Hi @RJMW
I could not reproduce your error with your file; however, I implemented a workaround which hopefully avoids running into your problem. Could you install the pull request and see if it works for you? I'll merge to dev then and could push a hotfix to master if required.
Best, Manuel
@MKoesters many thanks for looking into this so quickly! The workaround you have implemented seems to work well. A hotfix would be great, thanks.
20 <__main__.Spectrum object with native ID 20 at 0x10e72c8d0>
21 <__main__.Spectrum object with native ID 21 at 0x10e739dd8>
22 <__main__.Spectrum object with native ID 22 at 0x10e743a58>
@RJMW the hotfix has been merged into master and dev now. A new release with that fix is already on PyPI :)
And on bioconda! :)
@MKoesters I ran into an "I/O operation on closed file" error that is somewhat related to the above. I can only access each scan once when I use a BytesIO object. Not sure where in the code you close the BytesIO object. See snippet below. Any idea how we can fix this?
from io import BytesIO
import pymzml
run = pymzml.run.Reader("tests/data/example.mzML")
print(run[3])
print(run[3])
with open("tests/data/example.mzML", "rb") as inp:
    in_memory = BytesIO(inp.read())
run = pymzml.run.Reader(in_memory)
print(run[3])
print(run[3])
in_memory.close()
Hi @RJMW ,
I'll look into this, however I did not implement the Bytes interface, so I have to see how quick I'll be able to help you.
Hi @RJMW ,
It took me some time, but I hope I fixed it.
The issue was that opening a new seeker within a with statement closed the underlying binary stream.
I removed the with statement, so the file_handler is only closed after calling `pymzml.run.Reader.close()` or when one of the file_objects in the hierarchy above the binary stream is closed.
Check out #182 and tell me if this also works for you.
Best, Manuel
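The same effect can be reproduced with the standard library alone. This is only an analogy, not the pymzML code: TextIOWrapper stands in for the seeker, and the detach() variant is one possible way to leave the caller's stream open, shown here as an illustration rather than as the fix that went into #182.

```python
from io import BytesIO, TextIOWrapper


def read_with_context_manager(stream):
    # Leaving the with-block closes the wrapper AND the stream it wraps.
    with TextIOWrapper(stream, encoding="utf-8") as text:
        return text.readline()


def read_and_detach(stream):
    text = TextIOWrapper(stream, encoding="utf-8")
    try:
        return text.readline()
    finally:
        text.detach()  # drop the wrapper without closing the caller's stream


reusable = BytesIO(b"<mzML>\n</mzML>\n")
first_line = read_and_detach(reusable)
reusable.seek(0)                      # still open, so it can be rewound
second_line = read_and_detach(reusable)

closed_early = BytesIO(b"<mzML>\n</mzML>\n")
read_with_context_manager(closed_early)
try:
    closed_early.seek(0)              # fails: the with-block closed the BytesIO
    error_message = ""
except ValueError as err:
    error_message = str(err)
```

With a file path the reader can simply reopen the file for every access, which may be why the problem only showed up with an in-memory BytesIO object that cannot be reopened once closed.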
Hi,
For some of my mzML files I'm unable to extract info for specific spectra. Either an error is raised (ParseError: junk after document element: line 62, column 8) or the call just hangs, for at least 12 hours in my last test. This works fine for most spectra, but fails reproducibly for specific ones. I've looked at the mzML file and those spectra are in the index, and I've looked at the spectra themselves and cannot see any discernible difference from the ones that work. I've attached a link to an mzML file. I've observed this behavior in other files as well.
https://drive.google.com/open?id=1ZyEMHqy3ndG7-U4oimt7_Z3DOSqXyBXL
I've tested it on my Ubuntu virtual machine and on the Ubuntu Linux kernel on my Windows 10 desktop. The pymzML version is 2.2.5, tested with both the bioconda and pip installs.
Code I am using:
import pymzml
run = pymzml.run.Reader("20180409_AB_CA01_Run2_12ug_MudPIT_QE1_04.mzML")
run[34980].scan_time_in_minutes()  # ParseError: junk after document element: line 62, column 8
run[34979].scan_time_in_minutes()  # returns retention time 102.45652
run[34981].scan_time_in_minutes()  # hangs forever
Best Regards, Alfredo