Closed chasemc closed 3 years ago
Oh this is super interesting, I've honestly never opened an mzML from real MALDI data. Let me take a look and worst case, we might have to pull in a different parser for this kind of data.
OK yeah, I think pymzml can't handle non-numerical data. I think we'll have to switch to pyteomics instead of pymzml. However, this does work when converting to mzXML. Could you give that a go?
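For reference, here is a rough sketch of what reading the mzML through pyteomics could look like. It is not the MassQL implementation: the helper names `summarize_spectrum` and `iter_scaninfo` are made up for illustration, and it assumes the `i` column is the summed intensity of each spectrum. The spectrum id is kept as a string rather than parsed into a number, in case the ids are the non-numerical data that trips up pymzml.

```python
from typing import Any, Dict, Iterator


def summarize_spectrum(index: int, spectrum: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce one pyteomics-style spectrum dict to a scaninfo-like row.

    Hypothetical helper: the row layout only mimics the scaninfo output,
    and "i" is assumed to be the summed intensity of the spectrum.
    """
    intensities = spectrum.get("intensity array", [])
    return {
        "scan": index + 1,                     # 1-based position in the file
        "mslevel": spectrum.get("ms level"),   # cvParam name used in mzML
        "i": float(sum(intensities)),          # assumed: total intensity
        "native_id": spectrum.get("id"),       # kept as a string, never cast to int
    }


def iter_scaninfo(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one summary row per spectrum in an mzML file."""
    # Imported lazily so the helper above can be used without pyteomics installed.
    from pyteomics import mzml

    with mzml.read(path) as reader:
        for index, spectrum in enumerate(reader):
            yield summarize_spectrum(index, spectrum)
```

Something like `for row in iter_scaninfo("test/Protein_Data.mzML"): print(row)` would then list the spectra; that path is a guess based on the mzXML file name used in this thread.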
I gave it a try with this command:
python ./msql_cmd.py test/Protein_Data.mzXML "QUERY scaninfo(MS1DATA)"
and got this
   scan   rt  mslevel            i  query_index
0     1  0.0        1  367336682.0            0
1     2  0.0        1  344296471.0            0
2     3  0.0        1  395128629.0            0
3     4  0.0        1  318670421.0            0
4     5  0.0        1  513163051.0            0
5     6  0.0        1  448719899.0            0
6     7  0.0        1  317714726.0            0
7     8  0.0        1  622034997.0            0
Yeah, with mzXML I got the same as you.
Awesome, yeah play around with some queries in mzXML then, hopefully it finds the right things that you're looking for!
I haven't dug into it, but it looks like the mzML parser has issues with MALDI data when the file contains data from more than one spot. A file with a single spectrum seems to have worked, but I haven't tested more yet.
To reproduce:
I took the raw data in https://massive.ucsd.edu/ProteoSAFe/dataset.jsp?task=7ce7c09a174545a4a7dfe80af25329b0 and converted it using the default settings in msconvert (fresh install) (Protein_Data.zip)
Environment Setup
Test (all relative to 'MassQueryLanguage' directory)
Error: