Closed animesh closed 2 months ago
Just to update: the yield got a bit lower (lfq - Copy (2).txt) with the latest compiled image.
docker run --rm -it -v /mnt/z/HeLa:/data animesh1977/sage /data/sage.json -f /data/human_crap.fasta -o /data /data/26june24_hel200_100spd_OT_1ulirt_S2-F2_1_6368.mzML /data/26june24_hel200_100spd_OT_1ulirt_S2-G2_1_6369.mzML
[sudo] password for ash022:
[2024-07-30T12:15:53Z INFO sage] generated 187444336 fragments, 5604466 peptides in 41559ms
[2024-07-30T12:15:53Z INFO sage] processing files 0 .. 2
[2024-07-30T12:20:20Z INFO sage] - file IO: 267555 ms
[2024-07-30T12:20:55Z INFO sage] - search: 34571 ms (1305 spectra/s)
[2024-07-30T12:20:55Z INFO sage_core::ml::retention_alignment] aligning file #0: y = 1.0000x + 0.0000
[2024-07-30T12:20:55Z INFO sage_core::ml::retention_alignment] aligning file #1: y = 1.0000x + 0.0000
[2024-07-30T12:20:55Z INFO sage_core::ml::retention_alignment] aligned retention times across 2 files
[2024-07-30T12:20:55Z INFO sage_core::ml::retention_model] - fit retention time model, rsq = NaN
[2024-07-30T12:20:55Z INFO sage_core::ml::mobility_model] - fit mobility model, rsq = NaN, mse = NaN
[2024-07-30T12:21:31Z INFO sage_core::lfq] tracing MS1 features
[2024-07-30T12:21:34Z INFO sage_core::lfq] integrating MS1 features
[2024-07-30T12:21:34Z INFO sage] discovered 274 target MS1 peaks at 5% FDR
[2024-07-30T12:21:34Z INFO sage] discovered 3960 target peptide-spectrum matches at 1% FDR
[2024-07-30T12:21:34Z INFO sage] discovered 371 target peptides at 1% FDR
[2024-07-30T12:21:34Z INFO sage] discovered 312 target proteins at 1% FDR
{
  "version": "0.15.0-alpha",
  "database": {
    "bucket_size": 8192,
    "enzyme": {
      "missed_cleavages": 2,
      "min_len": 7,
      "max_len": 50,
      "cleave_at": "KR",
      "restrict": "P",
      "c_terminal": null,
      "semi_enzymatic": null
    },
    "peptide_min_mass": 500.0,
    "peptide_max_mass": 5000.0,
    "ion_kinds": [
      "b",
      "y"
    ],
    "min_ion_index": 2,
    "static_mods": {},
    "variable_mods": {},
    "max_variable_mods": 3,
    "decoy_tag": "rev_",
    "generate_decoys": true,
    "fasta": "/data/human_crap.fasta"
  },
  "quant": {
    "tmt": null,
    "tmt_settings": {
      "level": 3,
      "sn": false
    },
    "lfq": true,
    "lfq_settings": {
      "peak_scoring": "Hybrid",
      "integration": "Sum",
      "spectral_angle": 0.6,
      "ppm_tolerance": 5.0,
      "combine_charge_states": true
    }
  },
  "precursor_tol": {
    "ppm": [
      -20.0,
      20.0
    ]
  },
  "fragment_tol": {
    "ppm": [
      -20.0,
      20.0
    ]
  },
  "precursor_charge": [
    2,
    4
  ],
  "override_precursor_charge": false,
  "isotope_errors": [
    0,
    2
  ],
  "deisotope": true,
  "chimera": true,
  "wide_window": true,
  "min_peaks": 15,
  "max_peaks": 150,
  "max_fragment_charge": 1,
  "min_matched_peaks": 4,
  "report_psms": 5,
  "predict_rt": true,
  "mzml_paths": [
    "/data/26june24_hel200_100spd_OT_1ulirt_S2-F2_1_6368.mzML",
    "/data/26june24_hel200_100spd_OT_1ulirt_S2-G2_1_6369.mzML"
  ],
  "output_paths": [
    "/data/results.sage.tsv",
    "/data/lfq.tsv",
    "/data/results.json"
  ]
}
[2024-07-30T12:21:35Z INFO sage] finished in 383s
[2024-07-30T12:21:35Z INFO sage] cite: "Sage: An Open-Source Tool for Fast Proteomics Searching and Quantification at Scale" https://doi.org/10.1021/acs.jproteome.3c00486
Looks like it's working to me.
I'm afraid I don't have the bandwidth to troubleshoot your data. My general suggestions are as follows:
Examine the results file to determine optimal tolerances to use. In many cases where data "doesn't work", the instruments are off calibration and the user has supplied tolerances that are too narrow.
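For anyone who wants to follow the tolerance suggestion above, here is a rough sketch of one way to look at the mass errors in the results file. It is not part of sage; the function names are made up for illustration, and the column names (`precursor_ppm`, `spectrum_q`, `label`, with `label` equal to 1 for targets) are assumptions about the layout of results.sage.tsv that should be checked against the header of your own file. The idea is to keep only confident target PSMs and report the median error plus the range that covers most of them.

```python
import csv
import statistics


def read_ppm_errors(handle, column="precursor_ppm", q_column="spectrum_q",
                    label_column="label", max_q=0.01):
    """Collect ppm errors of confident target PSMs from a tab-separated file.

    Column names are assumptions; pass different ones if your header differs.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    errors = []
    for row in reader:
        if int(row[label_column]) != 1:      # skip decoys (assumed label -1)
            continue
        if float(row[q_column]) > max_q:     # skip low-confidence matches
            continue
        errors.append(float(row[column]))
    return errors


def summarize_ppm(errors, coverage=0.98):
    """Return the median error and the central range holding `coverage` of values."""
    if not errors:
        raise ValueError("no PSMs passed the filters")
    ordered = sorted(errors)
    tail = (1.0 - coverage) / 2.0
    last = len(ordered) - 1
    low = ordered[int(tail * last)]
    high = ordered[int(round((1.0 - tail) * last))]
    return {"median": statistics.median(ordered), "low": low, "high": high}
```

Usage would be something like `with open("results.sage.tsv", newline="") as fh: print(summarize_ppm(read_ppm_errors(fh)))`, and the same call with `column="fragment_ppm"` for the fragment side. If the median sits far from zero, or the range runs into the edge of the configured window (here -20 to 20 ppm), that hints at a calibration offset or a window that is too narrow.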
I am trying to process a couple of timsTOF Pro raw files from a MaxQuant analysis, using timsconvert-ed data followed by sage from ghcr.io/lazear/sage:v0.14.7,
but the results (lfq - Copy.txt) are nowhere close to what I expected. The parameter file I am using (sage.json) incorporates the suggestions from https://sage-docs.vercel.app/docs/configuration/tolerance#wide-window-mode. Is there something I am missing, apart from PTMs, that needs to be included for the analysis, specifically for DIA?