lazear / sage

Proteomics search & quantification so fast that it feels like magic
https://sage-docs.vercel.app
MIT License

Problem analysing tims-tof data #72

Closed (tomthun closed 1 year ago)

tomthun commented 1 year ago

Hello,

I wanted to test Sage and compare it to other software tools, but I cannot get it to produce results. Error.txt is the detailed output I get when running Sage. As input I use timsTOF (.d) data converted to .mzML files, which I thought might be the issue. Hence I tried different methods of generating these:

  1. I used the guide as described at the bottom of this page.
  2. I first converted the .d folders to .mgf with the alphatims package and then used msconvert-GUI again, this time with no changes to the default parameters (in contrast to the guide). This yielded a much smaller mzML file (500 MB vs 5 GB for the "same" mzML files in 1.), so there are definitely some differences.

This is the fasta I use: NIST 8671_2021-11-22.txt. Could it be that I just have too few proteins in the fasta? Otherwise I have not changed any of the default parameters in the results.json, except the fasta and output paths.

I hope you can help me get more meaningful results and thank you in advance!!

Tom

lazear commented 1 year ago

Hi Tom,

Thanks for trying Sage - especially with tims data; that's uncharted territory! The error messages suggest that no confident PSMs were found (despite spectra being successfully read), hence the failures to perform RT alignment, modeling, and discriminant analysis. This is likely due to too few proteins in the fasta file - Sage utilizes target-decoy competition for confidence estimates, and those assumptions start to break down with small fasta databases. I would try with a full-sized proteome fasta file and see if things work.

Alternatively, there should still be PSMs present in the output file, just without any confidence estimates.
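
To illustrate why target-decoy estimates get unstable with very few PSMs, here is a minimal Python sketch of the general technique. It is not Sage's actual implementation (Sage also rescores PSMs with discriminant analysis first); the function names and the toy scores are hypothetical and only meant to show how coarse the FDR steps become when the ranked list is short.

```python
def q_values(psms):
    """Assign q-values by simple target-decoy counting.

    psms: list of (score, is_decoy) tuples; higher score = better match.
    Returns a list of (score, is_decoy, q_value) sorted by descending score.
    Simplified illustration only, not Sage's implementation.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Estimated FDR at each rank: decoys seen so far / targets seen so far.
    targets = decoys = 0
    fdrs = []
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this rank or at any lower-scoring rank.
    q = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q.append(running_min)
    q.reverse()

    return [(s, d, qv) for (s, d), qv in zip(ranked, q)]


def count_confident(psms, threshold=0.01):
    """Count target PSMs whose q-value is at or below the threshold."""
    return sum(1 for _, is_decoy, qv in q_values(psms)
               if not is_decoy and qv <= threshold)


# Toy example with made-up scores: three targets and one decoy.
toy = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
for score, is_decoy, qv in q_values(toy):
    print(score, "decoy" if is_decoy else "target", round(qv, 3))
print("confident targets at 1% FDR:", count_confident(toy))
```

With only a handful of PSMs, a single decoy moves the estimated FDR in large jumps (here from 0 straight to 1/3), so few or no targets can pass a 1% threshold. A full-sized proteome gives many more targets and decoys, which makes the estimate much finer-grained.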

tomthun commented 1 year ago

Hi,

I get PSMs in the output file, but only very few of them and with no confidence estimates. I am using unspecific cleavage of the proteins. Is there any way to include that in the results.json so that we might get more peptide mappings? Does Sage work at all with unspecific digestion of proteins?

Edit: I have found the answer under #37 .

lazear commented 1 year ago

So is this resolved?