Closed by tiagosobreira 1 year ago
Hi Tiago,
Can you provide some more information?
How are you determining the results are different? How different are they?
Hi Michael,
Sure!
Version: sage-v0.10.0-x86_64-unknown-linux-gnu
It is one TMTpro sample collected on an Eclipse with FAIMS in three fractions (a total of three files).
Here is the config file: config.txt
I compared the spectrum_fdr from both results files.
Here is one example: The spectrum 36109 has FDR 0.36757833 on the first run and 0.0017423321 on the second run.
30% of the spectra have an FDR difference > 0.1 between both runs.
Is this difference expected?
Thank you again, Tiago
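For anyone repeating this comparison, here is a minimal sketch of pairing PSMs between two results tables before comparing `spectrum_fdr`. It assumes tab-separated results files in which `filename` and `scannr` together identify a spectrum; those two column names, the choice to keep only the lowest FDR per spectrum, and the helper names `load_fdr` and `compare_runs` are assumptions made for illustration, not something confirmed in this thread.

```python
import csv
import io


def load_fdr(handle, key_cols=("filename", "scannr"), fdr_col="spectrum_fdr"):
    """Map each spectrum key to its lowest FDR in one tab-separated results table.

    The key columns are an assumption; adjust them to whatever uniquely
    identifies a spectrum in your results file.
    """
    best = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        key = tuple(row[c] for c in key_cols)
        fdr = float(row[fdr_col])
        # Keep one value per spectrum so that both runs are paired one-to-one.
        if key not in best or fdr < best[key]:
            best[key] = fdr
    return best


def compare_runs(run_a, run_b, threshold=0.1):
    """Return per-spectrum |FDR difference| and the fraction above threshold."""
    shared = run_a.keys() & run_b.keys()  # only spectra present in both runs
    diffs = {k: abs(run_a[k] - run_b[k]) for k in shared}
    n_large = sum(d > threshold for d in diffs.values())
    frac = n_large / len(shared) if shared else 0.0
    return diffs, frac


# Tiny in-memory example; the file name "frac1.mzML" is made up, the FDR
# values for spectrum 36109 are the ones quoted above.
run1 = io.StringIO("filename\tscannr\tspectrum_fdr\nfrac1.mzML\t36109\t0.36757833\n")
run2 = io.StringIO("filename\tscannr\tspectrum_fdr\nfrac1.mzML\t36109\t0.0017423321\n")
diffs, frac = compare_runs(load_fdr(run1), load_fdr(run2))
print(diffs, frac)
```

Joining on an explicit key rather than on row order matters because the two files are not guaranteed to list PSMs in the same order.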
There is definitely something wonky going on here! There are occasionally very minor differences in FDR/discriminant scores just due to floating point rounding/numerical instability (e.g. maybe a handful of PSMs are re-ranked on very large searches), but this is totally out of whack.
I just confirmed again that running sage-v0.10.0-x86_64-unknown-linux-gnu on the PXD003881 dataset (20 files) had identical results across 900k PSMs. I also previously tested for reproducibility on a 250-file TMT16 dataset before releasing v0.10.
Your parameters look good overall, but there is one thing sticking out:
"precursor_tol": {
"ppm": [-100,500]
}
Tolerances in Sage are specified in the reverse order from most other engines: they are applied to the observed/experimental mass, not the theoretical one. For example, for an open search you would specify (-500, 100), so that an experimental mass of 2500 Da carrying an unknown 500 Da mod is matched by subtracting 500 Da from the observed mass.
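To make the direction of the window concrete, here is a small sketch. It assumes the window is simply the observed mass plus each bound (scaled by the observed mass for ppm); that is a reading of the explanation above, not Sage's actual code, and `precursor_window` is a hypothetical helper.

```python
def precursor_window(observed_mass, lo, hi, unit="da"):
    """Return (low, high) bounds on theoretical mass for one observed mass.

    Assumption: bounds are added to the observed mass, as described above.
    """
    if unit == "da":
        return observed_mass + lo, observed_mass + hi
    if unit == "ppm":
        return (observed_mass * (1 + lo * 1e-6),
                observed_mass * (1 + hi * 1e-6))
    raise ValueError(f"unknown unit: {unit!r}")


def in_window(theoretical_mass, window):
    """True if the theoretical mass falls inside the (low, high) window."""
    low, high = window
    return low <= theoretical_mass <= high


observed = 2500.0      # experimental precursor mass, Da
unmodified = 2000.0    # theoretical peptide mass without the 500 Da mod

# Bounds applied to the observed mass: the unmodified peptide is a candidate.
print(in_window(unmodified, precursor_window(observed, -500.0, 100.0)))

# Bounds written the way most other engines expect: the peptide is missed.
print(in_window(unmodified, precursor_window(observed, -100.0, 500.0)))
```

Under this assumption the first window spans 2000 to 2600 Da and the second spans 2400 to 3000 Da, which is why swapping the two bounds changes which peptides are considered at all.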
Assuming you can't share the files (if you can, I will debug), can we try a couple things?
If this isn't it, let's try:
"predict_rt": false
(this might be screwing with the rescoring)
Finally, please run sage like so:
$ SAGE_LOG=trace ./sage <parameters.json
This will output more information - please paste it below! This should help diagnose what's going on.
Hi Michael,
Thank you very much for your help.
This is kind of embarrassing, but I made a silly mistake pairing the spectra. It is not as precise as our data, but it seems reasonable.
Reducing the number of modifications improves the correlation even more.
The "predict_rt" setting seems to be fine; it is not contributing to the variation.
The correction on how to use the "precursor_tol" made a huge difference in my analysis.
Thank you, Tiago
That's still more variation than I would expect, especially for a 3-file search.
Feel free to ask more questions if need be!
Hi Michael,
Thank you very much for this great tool!
I ran Sage twice using the same parameters on the same dataset and it gave me different results each time. Would you be able to explain why this occurs?
Thank you, Tiago