lazear / sage

Proteomics search & quantification so fast that it feels like magic
https://sage-docs.vercel.app
MIT License

Question: no search results on small mzMLs #78

Closed by timosachsenberg 1 year ago

timosachsenberg commented 1 year ago

Hi, for CI/testing purposes I ran a search on an unfiltered mzML and extracted an identified spectrum, but now I don't get any IDs. Is this intended and a result of internal FDR filtering? If so, can the filtering be disabled? Best, Timo

{
  "version": "0.13.3",
  "database": {
    "bucket_size": 32768,
    "enzyme": {
      "missed_cleavages": 2,
      "min_len": 5,
      "max_len": 50,
      "cleave_at": "KR",
      "restrict": "P",
      "c_terminal": true
    },
    "fragment_min_mz": 200.0,
    "fragment_max_mz": 2000.0,
    "peptide_min_mass": 500.0,
    "peptide_max_mass": 5000.0,
    "ion_kinds": [
      "b",
      "y"
    ],
    "min_ion_index": 2,
    "static_mods": {
      "C": 57.021465
    },
    "variable_mods": {
      "M": [
        15.994915
      ]
    },
    "max_variable_mods": 2,
    "decoy_tag": "DECOY_",
    "generate_decoys": false,
    "fasta": "iPRG2015_decoy.fasta"
  },
  "quant": {
    "tmt": null,
    "tmt_settings": {
      "level": 3,
      "sn": false
    },
    "lfq": false,
    "lfq_settings": {
      "peak_scoring": "Hybrid",
      "integration": "Sum",
      "spectral_angle": 0.7,
      "ppm_tolerance": 5.0
    }
  },
  "precursor_tol": {
    "ppm": [
      -6.0,
      6.0
    ]
  },
  "fragment_tol": {
    "ppm": [
      -20.0,
      20.0
    ]
  },
  "isotope_errors": [
    -1,
    3
  ],
  "deisotope": false,
  "chimera": false,
  "wide_window": false,
  "min_peaks": 15,
  "max_peaks": 150,
  "max_fragment_charge": null,
  "min_matched_peaks": 6,
  "report_psms": 1,
  "predict_rt": false,
  "mzml_paths": [
    "SageAdapter_1.mzML"
  ],
  "output_paths": [
    "/home/sachsenb/results.sage.pin",
    "/home/sachsenb/results.sage.tsv",
    "/home/sachsenb/results.json"
  ]
}

lazear commented 1 year ago

Hi Timo, can you share the data with me? Sage will write all IDs regardless of FDR (assuming they have at least 6 matched peaks, based on the above config). Try running with the "SAGE_LOG" environment variable set to "trace" (e.g., on Unix-based systems: SAGE_LOG=trace ./sage config.json), which will output how many spectra were successfully read from the file.

One of the CI tests run on this repo uses an mzML with a single MS2 scan: https://github.com/lazear/sage/tree/master/tests
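
For a CI harness, the SAGE_LOG suggestion above could be wrapped roughly as follows. This is only a sketch: the helper names `trace_env` and `run_sage_with_trace` are made up for illustration, and the binary and config paths are whatever your setup uses. The only things taken from this thread are the `SAGE_LOG=trace` variable and the `sage config.json` invocation. Both output streams are captured because I am not assuming which one the log is written to.

```python
import os
import subprocess


def trace_env(base=None):
    """Return a copy of the environment with SAGE_LOG=trace added.

    `base` defaults to the current process environment; it is never mutated.
    """
    env = dict(os.environ if base is None else base)
    env["SAGE_LOG"] = "trace"
    return env


def run_sage_with_trace(sage_binary, config_path):
    """Run `<sage_binary> <config_path>` with trace logging enabled.

    Returns (exit code, combined stdout and stderr text).
    Hypothetical helper: both arguments are paths chosen by the caller.
    """
    proc = subprocess.run(
        [sage_binary, config_path],
        env=trace_env(),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout so the log is kept
        text=True,
        check=False,  # let the caller decide how to treat a non-zero exit
    )
    return proc.returncode, proc.stdout
```

Usage would be something like `code, log = run_sage_with_trace("./sage", "config.json")`, after which the captured log can be inspected or attached to the CI report.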

timosachsenberg commented 1 year ago

Thanks, that's awesome. I will check it. Maybe the (older) mzML files I used are non-standard. I will give it a closer look once I find time, but if you are interested you can try it yourself on this slightly bigger file:
https://github.com/OpenMS/OpenMS/blob/develop/share/OpenMS/examples/FRACTIONS/BSA1_F1.mzML
https://github.com/OpenMS/OpenMS/blob/develop/share/OpenMS/examples/TOPPAS/data/BSA_Identification/18Protein_SoCe_Tr_detergents_trace.fasta

timosachsenberg commented 1 year ago
[2023-07-19T09:03:07Z TRACE sage_core::database] digesting fasta
[2023-07-19T09:03:07Z TRACE sage_core::database] modifying peptides
[2023-07-19T09:03:07Z TRACE sage_core::database] sorting and deduplicating peptides
[2023-07-19T09:03:07Z TRACE sage_core::database] generating fragments
[2023-07-19T09:03:08Z TRACE sage_core::database] finalizing index
[2023-07-19T09:03:08Z INFO  sage] generated 29679385 fragments, 1205863 peptides in 1305ms
[2023-07-19T09:03:08Z INFO  sage] processing files 0 .. 1
[2023-07-19T09:03:08Z ERROR sage] error while processing share/OpenMS/examples/FRACTIONS/BSA1_F1.mzML: MzMLError: malformed cvParam
[2023-07-19T09:03:08Z INFO  sage]  - file IO:       17 ms
[2023-07-19T09:03:08Z INFO  sage]  - search:         0 ms (0 spectra)
[2023-07-19T09:03:08Z TRACE sage_core::ml::linear_discriminant] fitting linear discriminant model...
[2023-07-19T09:03:08Z DEBUG sage_core::ml::gauss] Finding solution to linear system failed: left side of matrix [0,0] = NaN
[2023-07-19T09:03:08Z DEBUG sage_core::ml::gauss] Finding solution to linear system failed: left side of matrix [0,0] = NaN
[2023-07-19T09:03:08Z DEBUG sage_core::ml::gauss] Finding solution to linear system failed: left side of matrix [0,0] = NaN
[2023-07-19T09:03:08Z DEBUG sage_core::ml::gauss] Finding solution to linear system failed: left side of matrix [0,0] = NaN
[2023-07-19T09:03:08Z DEBUG sage_core::ml::gauss] Finding solution to linear system failed: left side of matrix [0,0] = NaN
[2023-07-19T09:03:08Z DEBUG sage_core::ml::gauss] Finding solution to linear system failed: left side of matrix [0,0] = NaN
[2023-07-19T09:03:08Z DEBUG sage_core::ml::gauss] Finding solution to linear system failed: left side of matrix [0,0] = NaN
[2023-07-19T09:03:08Z DEBUG sage_core::ml::gauss] Finding solution to linear system failed: left side of matrix [0,0] = NaN
[2023-07-19T09:03:08Z DEBUG sage_core::ml::gauss] Finding solution to linear system failed: left side of matrix [0,0] = NaN
[2023-07-19T09:03:08Z WARN  sage] linear model fitting failed, falling back to heuristic discriminant score
[2023-07-19T09:03:08Z INFO  sage] discovered 0 target peptide-spectrum matches at 1% FDR
[2023-07-19T09:03:08Z INFO  sage] discovered 0 target peptides at 1% FDR
[2023-07-19T09:03:08Z INFO  sage] discovered 0 target proteins at 1% FDR
[2023-07-19T09:03:08Z TRACE sage] writing outputs
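
Note that in the log above the run carries on after the ERROR line and simply reports 0 spectra, so a CI job could pass without noticing the failure. A small check along these lines might help. It is a sketch that assumes the log keeps the `[timestamp LEVEL target] message` layout shown above; `find_errors` is a hypothetical helper, not part of Sage.

```python
import re

# Matches lines shaped like "[2023-07-19T09:03:08Z ERROR sage] message".
# The layout is inferred from the log in this thread, not from a documented format.
LOG_LINE = re.compile(
    r"^\[(?P<time>\S+)\s+(?P<level>[A-Z]+)\s+(?P<target>[^\]]+)\]\s?(?P<message>.*)$"
)


def find_errors(log_text):
    """Return the message of every ERROR-level line in the captured log text."""
    errors = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match and match.group("level") == "ERROR":
            errors.append(match.group("message"))
    return errors
```

In a test, `assert not find_errors(log)` would then fail on the "malformed cvParam" line instead of silently accepting an empty result.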

lazear commented 1 year ago

Thanks for sharing the files!

Looks like the "malformed cvParam" in question can be found in the precursor/activation chain. Sage's parser attempts to read both an accession and a value from any cvParam nested in a precursor item. Checking the spec, it appears that the value attribute is actually optional, but the examples in the spec (and all recent mzML files I have seen) use an empty value (e.g., value="") rather than omitting the attribute.

"Malformed" (but actually compliant) item in question:

<cvParam cvRef="MS" accession="MS:1000133" name="collision-induced dissociation" />

Example from the test mzML in this repo:

<cvParam cvRef="MS" accession="MS:1000422" name="beam-type collision-induced dissociation" value=""/>

I will fix the parser and do some tests, then drop a new release.
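
To illustrate the tolerant behaviour described above (a missing value attribute treated the same as an empty one), here is a minimal Python sketch. It is not Sage's actual parser, which is written in Rust, and `parse_cv_param` is a hypothetical name. It only assumes that accession is required and value is optional.

```python
import xml.etree.ElementTree as ET


def parse_cv_param(xml_fragment):
    """Parse a single <cvParam .../> element and return (accession, value).

    A missing `value` attribute is treated like an empty one ("").
    """
    elem = ET.fromstring(xml_fragment)
    # Drop a "{namespace}" prefix if the fragment carries one.
    local_name = elem.tag.rsplit("}", 1)[-1]
    if local_name != "cvParam":
        raise ValueError(f"expected cvParam, got {local_name}")
    accession = elem.get("accession")
    if accession is None:
        raise ValueError("cvParam without accession")
    # The fix in spirit: default to "" instead of rejecting the element.
    value = elem.get("value", "")
    return accession, value
```

With this, both cvParam snippets quoted above parse to an accession paired with an empty string, so the element without a value attribute is no longer rejected.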

lazear commented 1 year ago

OK, I just put out a new release, which should fix this issue!