lazear / sage

Proteomics search & quantification so fast that it feels like magic
https://sage-docs.vercel.app
MIT License

FDR: Peptide_q-values above 1% and MS2Rescore compatible #105

Closed: Guitou1 closed this issue 10 months ago

Guitou1 commented 10 months ago

Hi,

I'm currently working with single-cell data and wanted to look at the FDR distribution of the PSMs I obtained, but Sage's output mostly contains q-values of around 1%. Is there any way Sage could report PSMs at higher q-values? That shouldn't affect its discriminant score, should it?

As a side question: does the longest y-ion series reported by Sage correspond to the consecutive y-ions annotated in a spectrum, or simply to the total number of annotated y-ions in the spectrum? Is there a way Sage could report these annotations, and how are they calculated? (I couldn't find it in the code on GitHub...)

I'm afraid I lack the Rust skills to make changes confidently.

On a side note, I have tried running MS2Rescore (a rescoring tool that claims to support sage.tsv outputs) from the command-line interface, without success. Has anyone run it successfully? If so, could you share the method you used?

Thank you in advance!

lazear commented 10 months ago

The output file should contain all PSMs; there is no filtering applied for FDR or q-values. For PSMs, q-values are calculated directly by counting decoys, (1 + decoys)/targets, after sorting by discriminant score. Peptide and protein q-values are calculated by summing PEPs derived from a non-parametric estimate of the discriminant score distribution (a kernel density estimate); this is done to enable the picked-FDR approach. So filtering does not affect discriminant scores, since discriminant scores are calculated before FDR estimation.
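To make the two routes in this answer concrete, here is a minimal Python sketch under stated assumptions; it is not Sage's Rust code. The `Psm` type and all function names are hypothetical. The running-minimum step that makes q-values monotone is the usual convention and is assumed here, the Gaussian KDE with a rule-of-thumb bandwidth stands in for whatever kernel and bandwidth Sage actually uses, and the PEP formula (expected decoy density over target density) is one common choice. The picked-FDR target/decoy competition step is not shown.

```python
import math
from dataclasses import dataclass

# Hypothetical sketch, not Sage's implementation.


@dataclass
class Psm:
    score: float    # discriminant score; higher is better
    is_decoy: bool


def _monotone_from_bottom(values):
    """Running minimum from the worst rank upward (assumed convention),
    so q-values never decrease as the score gets worse."""
    out = [0.0] * len(values)
    running = float("inf")
    for rank in range(len(values) - 1, -1, -1):
        running = min(running, values[rank])
        out[rank] = running
    return out


def psm_q_values(psms):
    """PSM-level q-values by decoy counting: (1 + decoys) / targets at each rank."""
    order = sorted(range(len(psms)), key=lambda i: psms[i].score, reverse=True)
    targets = decoys = 0
    fdr = []
    for i in order:
        if psms[i].is_decoy:
            decoys += 1
        else:
            targets += 1
        fdr.append((1 + decoys) / max(targets, 1))
    q = [0.0] * len(psms)
    for rank, value in enumerate(_monotone_from_bottom(fdr)):
        q[order[rank]] = value
    return q


def gaussian_kde(samples):
    """Gaussian KDE with the normal-reference bandwidth 1.06 * sd * n^(-1/5) (assumed)."""
    n = len(samples)
    mean = sum(samples) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in samples) / max(n - 1, 1))
    h = 1.06 * sd * n ** (-0.2) or 1.0  # fall back to 1.0 if all samples are equal
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))
    return lambda x: norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)


def kde_peps(target_scores, decoy_scores):
    """PEP of each target as expected decoy density / target density, clipped to 1."""
    f_target = gaussian_kde(target_scores)
    f_decoy = gaussian_kde(decoy_scores)
    ratio = len(decoy_scores) / len(target_scores)
    peps = []
    for s in target_scores:
        ft = f_target(s)
        peps.append(1.0 if ft == 0 else min(1.0, ratio * f_decoy(s) / ft))
    return peps


def pep_q_values(scores, peps):
    """q-value at each rank = mean PEP of everything scoring at least as well."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total = 0.0
    mean_pep = []
    for k, i in enumerate(order, start=1):
        total += peps[i]
        mean_pep.append(total / k)
    q = [0.0] * len(scores)
    for rank, value in enumerate(_monotone_from_bottom(mean_pep)):
        q[order[rank]] = value
    return q
```

For example, scores 10 (target), 9 (target), 8 (decoy), 7 (target), 6 (decoy) give PSM q-values of 0.5, 0.5, 2/3, 2/3 and 1.0. The discriminant score is only read by these functions, never modified, which is why q-value filtering cannot change it.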

For the longest y-ion series, the reported value is the longest chain of consecutive y-ions annotated in a spectrum (not the number of y-ions). For instance, if you had y1 y2 _ y4 y5 y6 y7, then longest_y = 4. Are you looking for an output to be added for the total number of annotated y-ions?
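A minimal sketch of how y-ion annotation and the longest consecutive run could be computed, to show the difference between `longest_y` and a plain count. This is an illustration, not Sage's implementation: the function names, the 20 ppm tolerance, singly charged ions only, and the reduced residue-mass table are all assumptions.

```python
# Hypothetical sketch, not Sage's code: singly charged y-ions, 20 ppm tolerance assumed.
# Monoisotopic residue masses (Da) for a subset of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "K": 128.09496, "E": 129.04259,
    "F": 147.06841, "R": 156.10111,
}
WATER = 18.010565
PROTON = 1.007276


def y_ion_mz(peptide):
    """Singly charged y-ion m/z keyed by ion number (y1 = C-terminal residue)."""
    ions = {}
    mass = WATER + PROTON
    for n, residue in enumerate(reversed(peptide), start=1):
        mass += RESIDUE_MASS[residue]
        if n < len(peptide):  # y1 .. y(len - 1)
            ions[n] = mass
    return ions


def annotate(peptide, peaks_mz, tol_ppm=20.0):
    """Ion numbers of y-ions matching at least one observed peak within tol_ppm."""
    return [
        n for n, mz in y_ion_mz(peptide).items()
        if any(abs(p - mz) / mz * 1e6 <= tol_ppm for p in peaks_mz)
    ]


def longest_run(ion_numbers):
    """Length of the longest run of consecutive ion numbers."""
    present = set(ion_numbers)
    best = 0
    for n in present:
        if n - 1 not in present:  # n starts a run
            length = 1
            while n + length in present:
                length += 1
            best = max(best, length)
    return best
```

With a hypothetical PEPTIDEK spectrum whose y3 peak is missing, `annotate` returns ions 1, 2, 4, 5, 6 and 7: `longest_run` gives 4 while the total count is 6, matching the y1 y2 _ y4 y5 y6 y7 example.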

I have run MS2Rescore successfully - let me see if I can figure out what I did and report back.

Guitou1 commented 10 months ago

Thank you for the fast response!

Once again, many thanks for the quick response!

lazear commented 10 months ago

I am currently reviewing a PR for building spectral libraries/writing all annotated peaks to a file, so hopefully that will help with the first point 😄.

I think you need to install the beta version of MS2Rescore from source; I believe the version on pip doesn't support Sage results yet.

Guitou1 commented 10 months ago

Great! Good luck with that, and thanks. I will try to do just that :)

Guitou1 commented 10 months ago

For anyone wondering: