lazear / sage

Proteomics search & quantification so fast that it feels like magic
https://sage-docs.vercel.app
MIT License

TMT11 quantification on MS2 level? #82

Closed ningzhibin closed 5 months ago

ningzhibin commented 11 months ago

Hi, thank you for such a magic search engine. It is fast without any sacrifice in identification, and it does quantification as well. My test results on label-free data are amazing. However, my recent test on TMT11 at the MS2 level (HCD), with data from a Thermo Exploris 480, does not seem to work properly. The identifications look good, but all channels are 0 in the quantification. Have you ever tested data like this?

lazear commented 11 months ago

Can you share the contents of "results.json" (the output file containing configuration options)? MS2-TMT should work (assuming quant.tmt.tmt_settings.level = 2).

I haven't exhaustively tested MS2 TMT though (lots and lots of testing for MS3), and haven't validated the interaction between MS2-quant and MS2-deisotoping (I will add to my to-do list).

If you validate that the correct MS level value is set and it's still not working, it would be useful if you could share an mzML with me for testing (and any other relevant files: fasta database, existing output files)

ningzhibin commented 11 months ago

Inspired by your suggestion, I think I figured out the problem. My bad, I forgot to change "fragment_min_mz". It was 150 before (copied from the end of your blog post). It works now that I have changed it to 100 (with quant.tmt.tmt_settings.level = 2). I guess setting it to 150 does not matter for MS3 TMT because identification and quantification are separate.
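A quick way to see why a minimum of 150 could matter here: the TMT11 reporter ions all sit between m/z 126 and 132. The sketch below only compares the reporter m/z values with a fragment window. It assumes that the window is applied to the MS2 spectrum before reporter ions are read, which is what the comment above suggests but is not confirmed in this thread. The function name and the `max_mz` default are hypothetical.

```python
# Monoisotopic m/z of the TMT11 reporter ions (channels 126 through 131C).
TMT11_REPORTER_MZ = {
    "126": 126.127726,
    "127N": 127.124761,
    "127C": 127.131081,
    "128N": 128.128116,
    "128C": 128.134436,
    "129N": 129.131471,
    "129C": 129.137790,
    "130N": 130.134825,
    "130C": 130.141145,
    "131N": 131.138180,
    "131C": 131.144499,
}


def reporters_outside_window(min_mz, max_mz=2000.0):
    """Return the reporter channels whose m/z falls outside [min_mz, max_mz].

    max_mz is a hypothetical default used only for this illustration.
    """
    return [
        name
        for name, mz in TMT11_REPORTER_MZ.items()
        if not (min_mz <= mz <= max_mz)
    ]


print(len(reporters_outside_window(150.0)))  # 11: every reporter is below 150
print(len(reporters_outside_window(100.0)))  # 0: every reporter is inside the window
```

With a minimum of 150, none of the eleven reporter ions falls inside the window, which is consistent with all channels being reported as 0.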

I have another suggestion though, it would be good that the tmt.tsv is peptide-based (instead of scan-based), like the LFQ.tsv, which is a data matrix of quantification.

Thanks again!

lazear commented 11 months ago

> I have another suggestion though, it would be good that the tmt.tsv is peptide-based (instead of scan-based), like the LFQ.tsv, which is a data matrix of quantification.

Is adding a peptide/protein column sufficient? Or are you asking for aggregating the data in some manner?

ningzhibin commented 11 months ago

My understanding is that LFQ.tsv is the result of aggregation (sum/median from spectrum/scan level to peptide level)? If it is not too much work, it would be good to have a TMT result table of peptides in a similar format.

Specifically, the table would have these columns: peptide, proteins, q_value, score, spectral_angle, File1_tmt_1, File1_tmt_2, ..., File1_tmt_11, File2_tmt_1, File2_tmt_2, ..., File2_tmt_11, ...

Cheers

lazear commented 5 months ago

Hi, sorry I never got back to you earlier!

I can confirm that MS2-level searching and quantification work well, but you need to disable MS2 deisotoping (deisotope: false) in the settings. If there's a need, we could probably modify the deisotoping code to skip the TMT reporter ion region.
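For reference, the settings discussed in this thread can be collected in one place. The following is a minimal sketch that patches a config dictionary and prints it as JSON. The thread names only `deisotope`, `fragment_min_mz`, and `quant.tmt.tmt_settings.level`; the nesting used here (`fragment_min_mz` under `database`, `tmt_settings` next to `tmt` under `quant`) and the `"Tmt11"` value are assumptions that should be checked against the Sage documentation. The helper name is hypothetical, and a real config needs further fields (database, input files) that are omitted.

```python
import copy
import json


def with_ms2_tmt_settings(base_config):
    """Return a copy of a config dict with the MS2-TMT settings from this thread.

    The key layout is an assumption; verify it against the Sage docs.
    """
    config = copy.deepcopy(base_config)
    # Lower the fragment m/z minimum to 100, the value reported to work above.
    config.setdefault("database", {})["fragment_min_mz"] = 100.0
    # Disable MS2 deisotoping, as recommended for MS2-level TMT quantification.
    config["deisotope"] = False
    quant = config.setdefault("quant", {})
    quant["tmt"] = "Tmt11"  # assumed spelling of the TMT11 option
    quant.setdefault("tmt_settings", {})["level"] = 2  # quantify at the MS2 level
    return config


print(json.dumps(with_ms2_tmt_settings({}), indent=2))
```

Passing an existing config dictionary instead of `{}` leaves its other fields untouched and returns a modified copy.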

In terms of changing the file format, I would prefer to keep it in "long format" - it tends to make it easier for programmatic use, and most of the Sage outputs assume that you are using some kind of python or R script to handle merging the tables anyway. The LFQ table appears as such because it has a different in-memory representation, and comparing across files is directly necessary.
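As an illustration of the kind of script meant here, the sketch below joins the scan-level quantification rows to the identification rows and sums reporter intensities into one wide row per peptide, with one column per file and channel as requested earlier in the thread. The column names (`filename`, `scannr`, `peptide`, `tmt_1`, ...) are assumptions about the output tables and may need adjusting. Summation is only one possible aggregation, and no q-value filtering is applied.

```python
import csv
import io
from collections import defaultdict


def read_tsv(text):
    """Parse tab-separated text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def peptide_matrix(psm_rows, tmt_rows, n_channels=11):
    """Sum scan-level reporter intensities into one wide row per peptide."""
    # Map each identified scan to its peptide so quant rows can be joined to it.
    scan_to_peptide = {
        (r["filename"], r["scannr"]): r["peptide"] for r in psm_rows
    }
    wide = defaultdict(lambda: defaultdict(float))
    for row in tmt_rows:
        peptide = scan_to_peptide.get((row["filename"], row["scannr"]))
        if peptide is None:
            continue  # quantified scan with no identification in the PSM table
        for i in range(1, n_channels + 1):
            column = f"{row['filename']}_tmt_{i}"
            wide[peptide][column] += float(row[f"tmt_{i}"])
    return {peptide: dict(cols) for peptide, cols in wide.items()}


# Tiny made-up example with two channels and two scans of the same peptide.
psms = read_tsv(
    "filename\tscannr\tpeptide\n"
    "a.mzML\t1\tPEPTIDEK\n"
    "a.mzML\t2\tPEPTIDEK\n"
)
quant = read_tsv(
    "filename\tscannr\ttmt_1\ttmt_2\n"
    "a.mzML\t1\t1.0\t2.0\n"
    "a.mzML\t2\t3.0\t4.0\n"
)
print(peptide_matrix(psms, quant, n_channels=2))
# {'PEPTIDEK': {'a.mzML_tmt_1': 4.0, 'a.mzML_tmt_2': 6.0}}
```

For real output files, the rows would be read from disk with `csv.DictReader(open(path), delimiter="\t")` instead of from inline strings.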