lazear / sage

Proteomics search & quantification so fast that it feels like magic
https://sage-docs.vercel.app
MIT License

Wide open modification search with TMT level 2 #84

Closed ludgergoeminne closed 10 months ago

ludgergoeminne commented 10 months ago

Dear Michael

I also started using SAGE, and I find it an amazing tool. It is incredible what you managed to create.

I was wondering how I can run a wide-open modification search with SAGE. I am re-processing dataset PXD011967 from PRIDE with SAGE, and the search does generate output. However, I do not seem to find any modifications at all apart from mass shifts of 1 to 2 Da. I see this because, when I import the results into R, hist(data.PXD011967.sage$expmass-data.PXD011967.sage$calcmass, breaks = 100) and hist(data.PXD011967.sage$expmass*data.PXD011967.sage$precursor_ppm/1e6, breaks = 100) only have peaks around -2, -1, 0, 1, and 2.
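For readers who want to run the same check outside R, here is a minimal Python sketch of the first histogram (expmass minus calcmass, binned per Da). The column names expmass and calcmass come from the R snippet above. The function names, the 1 Da bin width, and the assumption that the results are a tab-separated file with a header row are illustrative and may need adjusting for your output.

```python
import csv
from collections import Counter


def mass_deltas(rows):
    """Yield expmass - calcmass (in Da) for each PSM row (a dict of column -> string)."""
    for row in rows:
        yield float(row["expmass"]) - float(row["calcmass"])


def bin_deltas(deltas, bin_width=1.0):
    """Count deltas per bin; each bin is labelled by its centre in Da."""
    counts = Counter()
    for delta in deltas:
        counts[round(delta / bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


def histogram_from_tsv(path, bin_width=1.0):
    """Assumption: `path` points to a tab-separated results file with a header row."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return bin_deltas(mass_deltas(reader), bin_width)
```

In a working open search one would expect bins well outside the -2 to 2 Da range to be populated (for example near common modification masses). If only the bins around zero are filled, the precursor window was effectively narrow.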

Using this whole big dataset is probably not feasible for you, but the problem can likely be reproduced with just a few files (ScltlMsclSet_12BRPhsFr5.mzML and ScltlMsclSet_12BRPhsFr4.mzML had some of the most identifications in my set-up). I generated these files from ScltlMsclSet_12BRPhsFr5.raw and ScltlMsclSet_12BRPhsFr4.raw with msconvert as described here.

My config.json file looks like this:

{
  "database": {
    "bucket_size": 32768,
    "enzyme": {
      "missed_cleavages": 3,
      "min_len": 5,
      "max_len": 50,
      "cleave_at": "KR",
      "restrict": null,
      "c_terminal": true
    },
    "fragment_min_mz": 100.0,
    "fragment_max_mz": 2000.0,
    "peptide_min_mass": 500.0,
    "peptide_max_mass": 5000.0,
    "ion_kinds": ["b", "y"],
    "min_ion_index": 2,
    "static_mods": {
      "C": 57.021464
    },
    "variable_mods": {},
    "max_variable_mods": 2,
    "decoy_tag": "rev_",
    "generate_decoys": true,
    "fasta": "/path_to_fasta/homo_sapiens_22-08-2023.fasta"
  },
  "quant": {
    "tmt": "Tmt6",
    "tmt_settings": {
      "level": 2,
      "sn": false
    }
  },
  "precursor_tol": {
    "da": [
      -500,
      100
    ]
  },
  "fragment_tol": {
    "ppm": [
     -10,
     10
    ]
  },
  "isotope_errors": [
    0,
    0
  ],
  "deisotope": false,
  "chimera": true,
  "wide_window": true,
  "predict_rt": false,
  "min_peaks": 15,
  "max_peaks": 150,
  "min_matched_peaks": 4,
  "max_fragment_charge": null,
  "report_psms": 1,
  "output_directory": "/path_to_output/SAGE_output",
  "mzml_paths": [
    "/path_to_mzml/ScltlMsclSet_12BRPhsFr4.mzML",
    "/path_to_mzml/ScltlMsclSet_12BRPhsFr5.mzML"
  ]
}

ludgergoeminne commented 10 months ago

Oh, wait, I think this might be because I set "wide_window": true. That is probably the reason. I'll try changing it and close the issue for now.

lazear commented 10 months ago

Yes - setting wide_window will override the precursor_tol parameter and use the isolation window from the file (e.g. for DIA/PRM/WWA acquisition schemes). Perhaps I can think of a better name for this parameter.
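The interaction described above can be caught before a search is launched. Below is a minimal Python sketch that inspects an already-parsed config and warns when a wide precursor_tol is combined with "wide_window": true. The keys precursor_tol and wide_window are taken from the config in this thread. The helper names and the 10 Da threshold for what counts as an "open" window are hypothetical choices for illustration and are not part of Sage.

```python
def precursor_window_width(tol):
    """Return (unit, width) for a tolerance such as {"da": [-500, 100]}."""
    unit, (low, high) = next(iter(tol.items()))
    return unit, high - low


def open_search_warnings(config, min_open_width_da=10.0):
    """Return warnings for settings that would defeat an open search.

    `min_open_width_da` is an arbitrary, hypothetical cut-off for deciding
    that the precursor tolerance was meant as an open-search window.
    """
    warnings = []
    unit, width = precursor_window_width(config["precursor_tol"])
    looks_open = unit == "da" and width >= min_open_width_da
    if looks_open and config.get("wide_window", False):
        warnings.append(
            "wide_window is true, so precursor_tol is overridden by the "
            "isolation window from the file; set wide_window to false "
            "for an open search"
        )
    return warnings
```

With the config posted in this issue (precursor_tol of -500 to 100 Da and "wide_window": true) the function returns one warning. With "wide_window": false it returns an empty list.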