Closed radusuciu closed 1 year ago
I'm pretty sure I used ThermoRawFileParser for SPS-MS3 when I wrote that code... Lemme dig into a couple files and see if I see the same isolation window defect (that does appear to be a bug/invalid behavior from TRFP, imo).
I could certainly add in a check for this behavior and override it (to some reasonable value, say -1.2 to 1.2 m/z?). I can also resurrect the SPS-MS3 code. I commented it out a while back when I switched from using integer scan numbers to including the full spectrum title (e.g. "controllerType=0 scan=3169") - this is used to lookup the proper MS2 scan.
Note that the commented out code was just for calculating SPS precursor purity - Sage will currently quantify SPS-MS3 data, but will not report SPS purity. Shouldn't be any issue with these files, since the isolation window is only needed to calculate purity. I also use TRFP for SPS-MS3 data all the time.
Thank you for the reply and for confirming quantification of SPS-MS3 should be fine -- lack of SPS precursor purity isn't a big deal!
Lemme know what you find! Definitely want to make TMT workflows as good as possible.
FYI, I've had good success setting fragment_tol
much lower than what is frequently used in the literature (0.6 Da -> ~300 ppm)
Hey, so here's a question. Given the below config.json
, sage (version 0.13.1, hot off the press) finishes searching in 41 seconds and outputs reasonable looking files, but according to the output there are 0 PSMs/peptides/proteins at 1% FDR (see full output below).
Am I misconfiguring something?
{
"database": {
"bucket_size": 16384,
"fragment_min_mz": 150.0,
"fragment_max_mz": 1500.0,
"enzyme": {
"missed_cleavages": 1,
"cleave_at": "KR",
"restrict": "P"
},
"static_mods": {
"^": 304.207,
"K": 304.207,
"C": 57.0215
},
"variable_mods": {
"M": 15.9949
},
"decoy_tag": "rev_",
"generate_decoys": true,
"fasta": "UP000005640_9606.fasta"
},
"output_directory": "output_tmt18_literature",
"quant": {
"tmt": "Tmt18",
"tmt_settings": {
"level": 3,
"sn": false
},
"lfq": false
},
"deisotope": false,
"chimera": false,
"max_fragment_charge": 1,
"report_psms": 1,
"precursor_tol": {
"ppm": [
-50,
50
]
},
"fragment_tol": {
"ppm": [
-10,
10
]
},
"isotope_errors": [
-1,
3
],
"mzml_paths": [
"thermorawfileparser/1.4.2/ea05148_02042021_T18_F1.mzML"
]
}
Output:
[2023-05-19T00:59:47Z INFO sage] generated 60073946 fragments, 4277706 peptides in 8553ms
[2023-05-19T00:59:47Z INFO sage] processing files 0 .. 1
[2023-05-19T01:00:11Z INFO sage] - file IO: 15011 ms
[2023-05-19T01:00:11Z INFO sage] - search: 9035 ms (49964 spectra)
[2023-05-19T01:00:11Z INFO sage_core::ml::retention_alignment] aligning file #0: y = 1.0000x + 0.0000
[2023-05-19T01:00:11Z INFO sage_core::ml::retention_alignment] aligned retention times across 1 files
[2023-05-19T01:00:11Z INFO sage_core::ml::retention_model] - fit retention time model, rsq = NaN
[2023-05-19T01:00:19Z INFO sage] discovered 0 target peptide-spectrum matches at 1% FDR
[2023-05-19T01:00:19Z INFO sage] discovered 0 target peptides at 1% FDR
[2023-05-19T01:00:19Z INFO sage] discovered 0 target proteins at 1% FDR
File downloaded from this dataset: https://www.ebi.ac.uk/pride/archive/projects/PXD024275 Full URL: https://ftp.pride.ebi.ac.uk/pride/data/archive/2021/04/PXD024275/ea05148_02042021_T18_F1.raw Converted with ThermoRawFileParser 1.4.2
Thanks!
We used a 50-ppm precursor ion tolerance and 0.9 Da product ion tolerance for MS2 scans collected in the ion trap and 0.02 Da product ion tolerance for MS2 scans collected in the Orbitrap.
10ppm fragment tolerance is probably just a tad too narrow here 😉
Perhaps start with 0.9 Da and then move it down? I like to make a histogram of MS2 fragment tolerance vs decoys and see if it can be generally tighter (sns.displot(data=df, x='fragment_ppm', hue='label')
). I usually find that (on our instruments at least) even 0.6 Da is wider than necessary for searching MS3 datasets, and can produce sub-optimal results.
Using 10ppm, I also get 0 PSMs for that file.
With 0.9 Da:
[2023-05-19T01:43:24Z INFO sage] discovered 5980 target peptide-spectrum matches at 1% FDR
[2023-05-19T01:43:24Z INFO sage] discovered 5235 target peptides at 1% FDR
[2023-05-19T01:43:24Z INFO sage] discovered 3123 target proteins at 1% FDR
With 300 ppm:
[2023-05-19T01:45:30Z INFO sage] discovered 6920 target peptide-spectrum matches at 1% FDR
[2023-05-19T01:45:30Z INFO sage] discovered 6040 target peptides at 1% FDR
[2023-05-19T01:45:30Z INFO sage] discovered 3467 target proteins at 1% FDR
You might get difference results, given that we are using slightly different fasta files.
Makes sense, thank you for pointing that out!
Hi, I recently started working on putting together a TMT pipeline and am evaluating Sage as well as Comet/OpenMS/custom approach. My question has to do with processing SPS-MS3 data. I can see you've got this in the works but wanted to open a discussion after encountering an error when attempting to process a file using IsobaricAnalyzer.
Would sage also have issues with quantifying data from these mzMLs which lack isolation window offsets for SPS masses? Here's an example:
Thank you for your continued work on sage!