timosachsenberg closed this issue 1 year ago
Hi Timo, can you share the data with me? Sage will write all IDs regardless of FDR (assuming they have at least 6 matched peaks, based on the above config). Try running with the "SAGE_LOG" environment variable set to "trace" (e.g., on unix-based systems: SAGE_LOG=trace ./sage config.json), which will output how many spectra were successfully read from the file.
One of the CI tests run on this repo is an mzML with a single MS2 scan: https://github.com/lazear/sage/tree/master/tests
Thanks, that's awesome. I will check it. Maybe the (older) mzML files I used are non-standard. I will give it a closer look once I find time, but if you are interested you can try it yourself on this slightly bigger file: https://github.com/OpenMS/OpenMS/blob/develop/share/OpenMS/examples/FRACTIONS/BSA1_F1.mzML https://github.com/OpenMS/OpenMS/blob/develop/share/OpenMS/examples/TOPPAS/data/BSA_Identification/18Protein_SoCe_Tr_detergents_trace.fasta
[2023-07-19T09:03:07Z TRACE sage_core::database] digesting fasta
[2023-07-19T09:03:07Z TRACE sage_core::database] modifying peptides
[2023-07-19T09:03:07Z TRACE sage_core::database] sorting and deduplicating peptides
[2023-07-19T09:03:07Z TRACE sage_core::database] generating fragments
[2023-07-19T09:03:08Z TRACE sage_core::database] finalizing index
[2023-07-19T09:03:08Z INFO sage] generated 29679385 fragments, 1205863 peptides in 1305ms
[2023-07-19T09:03:08Z INFO sage] processing files 0 .. 1
[2023-07-19T09:03:08Z ERROR sage] error while processing share/OpenMS/examples/FRACTIONS/BSA1_F1.mzML: MzMLError: malformed cvParam
[2023-07-19T09:03:08Z INFO sage] - file IO: 17 ms
[2023-07-19T09:03:08Z INFO sage] - search: 0 ms (0 spectra)
[2023-07-19T09:03:08Z TRACE sage_core::ml::linear_discriminant] fitting linear discriminant model...
[2023-07-19T09:03:08Z DEBUG sage_core::ml::gauss] Finding solution to linear system failed: left side of matrix [0,0] = NaN
[2023-07-19T09:03:08Z DEBUG sage_core::ml::gauss] Finding solution to linear system failed: left side of matrix [0,0] = NaN
[2023-07-19T09:03:08Z DEBUG sage_core::ml::gauss] Finding solution to linear system failed: left side of matrix [0,0] = NaN
[2023-07-19T09:03:08Z DEBUG sage_core::ml::gauss] Finding solution to linear system failed: left side of matrix [0,0] = NaN
[2023-07-19T09:03:08Z DEBUG sage_core::ml::gauss] Finding solution to linear system failed: left side of matrix [0,0] = NaN
[2023-07-19T09:03:08Z DEBUG sage_core::ml::gauss] Finding solution to linear system failed: left side of matrix [0,0] = NaN
[2023-07-19T09:03:08Z DEBUG sage_core::ml::gauss] Finding solution to linear system failed: left side of matrix [0,0] = NaN
[2023-07-19T09:03:08Z DEBUG sage_core::ml::gauss] Finding solution to linear system failed: left side of matrix [0,0] = NaN
[2023-07-19T09:03:08Z DEBUG sage_core::ml::gauss] Finding solution to linear system failed: left side of matrix [0,0] = NaN
[2023-07-19T09:03:08Z WARN sage] linear model fitting failed, falling back to heuristic discriminant score
[2023-07-19T09:03:08Z INFO sage] discovered 0 target peptide-spectrum matches at 1% FDR
[2023-07-19T09:03:08Z INFO sage] discovered 0 target peptides at 1% FDR
[2023-07-19T09:03:08Z INFO sage] discovered 0 target proteins at 1% FDR
[2023-07-19T09:03:08Z TRACE sage] writing outputs
Thanks for sharing the files!
Looks like the "malformed cvParam" in question can be found in the precursor/activation chain. Sage's parser attempts to read both an accession and a value from any cvParam nested in a precursor item. Checking the spec, it appears that value is actually optional, but examples in the document (and all recent mzMLs I have seen) have empty values (e.g., "") rather than omitting the attribute.
"Malformed" (but actually compliant) item in question:
<cvParam cvRef="MS" accession="MS:1000133" name="collision-induced dissociation" />
Example from the test mzML in this repo:
<cvParam cvRef="MS" accession="MS:1000422" name="beam-type collision-induced dissociation" value=""/>
I will fix the parser and do some tests, then drop a new release.
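For anyone hitting the same error with their own tooling, here is a minimal sketch of the tolerant behaviour described above: read accession as mandatory and fall back to an empty string when value is absent. It is written in Python with the standard library purely for illustration and is not Sage's actual parser (Sage is written in Rust); the function name read_cv_params and the wrapping activation fragment are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def read_cv_params(precursor_xml: str) -> list[tuple[str, str]]:
    """Return (accession, value) pairs for every cvParam in an XML fragment.

    Hypothetical helper for illustration only, not part of Sage.
    """
    root = ET.fromstring(precursor_xml)
    params = []
    for elem in root.iter():
        # Strip an XML namespace prefix such as "{...}cvParam" if one is present.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag != "cvParam":
            continue
        accession = elem.get("accession")
        if accession is None:
            # This sketch treats a cvParam without an accession as an error.
            raise ValueError("cvParam without an accession attribute")
        # value is optional: treat a missing attribute like an empty one.
        params.append((accession, elem.get("value", "")))
    return params


# The two cvParam lines quoted in this thread, wrapped in an assumed parent element.
fragment = """
<activation>
  <cvParam cvRef="MS" accession="MS:1000133" name="collision-induced dissociation" />
  <cvParam cvRef="MS" accession="MS:1000422" name="beam-type collision-induced dissociation" value=""/>
</activation>
"""

print(read_cv_params(fragment))
# prints [('MS:1000133', ''), ('MS:1000422', '')]
```

Both forms end up with the same empty value, so a file that omits the attribute is read the same way as one that writes value="".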
OK, just put out a new release, which should fix this issue!
Hi, I ran a search on an unfiltered mzML. For CI/testing purposes, I extracted an identified spectrum, but now I don't get any IDs. Is this intended and a result of internal FDR filtering? If so, can the filtering be disabled? Best, Timo