wfondrie / mokapot

Fast and flexible semi-supervised learning for peptide detection in Python
https://mokapot.readthedocs.io
Apache License 2.0

Decoy Prefix results in many unmapped peptides #102

Open jonathan-krieger-bruker opened 1 year ago

jonathan-krieger-bruker commented 1 year ago

Hi Will, thanks in advance for your time.

I am trying to use Mokapot on Sage results from timsTOF DDA data, nothing special. The issue I encounter is the following:

Mokapot on the PIN file without any protein inference: I get the expected results.
Mokapot on the PIN file with a FASTA file containing no decoys: works fine.
Mokapot on the PIN file with a FASTA file containing decoys: works fine.

BUT, when I specify the prefix that marks the decoy entries,

$ mokapot results.sage.pin -w 30 --proteins /mnt/d/FASTA/Human_Sprot_20220318_decoys.fasta --decoy_prefix Reverse_

I always encounter an error similar to the following:

Traceback (most recent call last):
  File "/home/jrkrieger/miniconda3/envs/Mokapot/bin/mokapot", line 8, in <module>
    sys.exit(main())
  File "/home/jrkrieger/miniconda3/envs/Mokapot/lib/python3.8/site-packages/mokapot/mokapot.py", line 136, in main
    psms, models = brew(
  File "/home/jrkrieger/miniconda3/envs/Mokapot/lib/python3.8/site-packages/mokapot/brew.py", line 183, in brew
    res = [
  File "/home/jrkrieger/miniconda3/envs/Mokapot/lib/python3.8/site-packages/mokapot/brew.py", line 184, in <listcomp>
    p.assign_confidence(s, eval_fdr=test_fdr, desc=d)
  File "/home/jrkrieger/miniconda3/envs/Mokapot/lib/python3.8/site-packages/mokapot/dataset.py", line 586, in assign_confidence
    return LinearConfidence(self, scores, eval_fdr=eval_fdr, desc=desc)
  File "/home/jrkrieger/miniconda3/envs/Mokapot/lib/python3.8/site-packages/mokapot/confidence.py", line 367, in __init__
    self._assign_confidence(desc=desc)
  File "/home/jrkrieger/miniconda3/envs/Mokapot/lib/python3.8/site-packages/mokapot/confidence.py", line 418, in _assign_confidence
    proteins = picked_protein(
  File "/home/jrkrieger/miniconda3/envs/Mokapot/lib/python3.8/site-packages/mokapot/picked_protein.py", line 99, in picked_protein
    raise ValueError(
ValueError: Fewer than 90% of all peptides could be matched to proteins. Please verify that your digest settings are correct.
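The ValueError comes from a sanity check in mokapot's picked-protein step: when fewer than 90% of the peptides can be found among the peptides produced by digesting the FASTA in silico, it stops. The sketch below reproduces that kind of check outside mokapot so the mapping rate can be inspected directly. It is not mokapot's code. The simplified trypsin rule (cleave after K/R unless followed by P), the digest defaults, and the assumed PIN peptide format (`K.PEPT[+79.9663]IDE.R`) are all assumptions, and the helper names are hypothetical.

```python
import re


def tryptic_digest(sequence, missed_cleavages=2, min_length=6, max_length=50):
    """Simplified trypsin digest: cleave after K or R unless the next residue is P."""
    # Cleavage sites are the positions just after each qualifying K/R.
    sites = [0] + [m.end() for m in re.finditer(r"[KR](?!P)", sequence)]
    if sites[-1] != len(sequence):
        sites.append(len(sequence))  # C-terminal peptide that does not end in K/R
    peptides = set()
    for i in range(len(sites) - 1):
        # Join up to `missed_cleavages` consecutive fragments.
        for j in range(i + 1, min(i + missed_cleavages + 2, len(sites))):
            peptide = sequence[sites[i]:sites[j]]
            if min_length <= len(peptide) <= max_length:
                peptides.add(peptide)
    return peptides


def clean_peptide(pin_peptide):
    """Strip flanking residues ("K.PEPTIDE.R") and bracketed mass shifts (assumed format)."""
    core = pin_peptide
    if len(core) > 4 and core[1] == "." and core[-2] == ".":
        core = core[2:-2]
    return re.sub(r"\[[^\]]*\]", "", core)


def matched_fraction(pin_peptides, proteins, **digest_kwargs):
    """Fraction of unique PIN peptides found in the digest of `proteins` (id -> sequence)."""
    digested = set()
    for sequence in proteins.values():
        digested |= tryptic_digest(sequence, **digest_kwargs)
    unique = {clean_peptide(p) for p in pin_peptides}
    if not unique:
        return 0.0
    return sum(p in digested for p in unique) / len(unique)


def check_mapping(pin_peptides, proteins, min_fraction=0.9, **digest_kwargs):
    """Raise a ValueError, like the one in the traceback, when too few peptides map."""
    fraction = matched_fraction(pin_peptides, proteins, **digest_kwargs)
    if fraction < min_fraction:
        raise ValueError(f"Only {fraction:.1%} of peptides matched to proteins.")
    return fraction
```

Running `matched_fraction` separately on the target PSMs and on the decoy PSMs from the PIN file could show whether it is specifically the decoy peptides that fail to map to the `Reverse_` entries.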

Changing the decoy prefix in the command to something nonsensical:

$ mokapot results.sage.pin -w 30 --proteins /mnt/d/FASTA/Human_Sprot_20220318_decoys.fasta --decoy_prefix blahblah

gives the same results as omitting the --decoy_prefix flag entirely (which makes sense).
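This behavior is consistent with the prefix being compared against the start of each protein identifier. A prefix that appears in no header marks no protein as a decoy, so the run should look like one where the flag is absent, assuming the default prefix also matches none of the `Reverse_` headers. The sketch below counts how many FASTA entries a given prefix would flag. Treating the first whitespace-delimited token of each header as the protein ID is an assumption about how mokapot parses headers, and the function names are hypothetical.

```python
def protein_ids(fasta_lines):
    """Yield the first whitespace-delimited token of each FASTA header line."""
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split(maxsplit=1)
            if parts:  # skip empty headers such as a bare ">"
                yield parts[0]


def count_targets_and_decoys(fasta_lines, decoy_prefix):
    """Return (n_target, n_decoy), treating IDs that start with the prefix as decoys."""
    ids = list(protein_ids(fasta_lines))
    n_decoy = sum(pid.startswith(decoy_prefix) for pid in ids)
    return len(ids) - n_decoy, n_decoy


if __name__ == "__main__":
    example = [">sp|P1|A_HUMAN desc\n", "MKTAYIQ\n", ">Reverse_sp|P1|A_HUMAN\n", "QIYATKM\n"]
    for prefix in ("Reverse_", "blahblah"):
        print(prefix, count_targets_and_decoys(example, prefix))
    # Reverse_ (1, 1)
    # blahblah (2, 0)
```

Passing an open handle to /mnt/d/FASTA/Human_Sprot_20220318_decoys.fasta instead of `example` would confirm that `Reverse_` flags roughly half of the entries and `blahblah` flags none.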

Any suggestions as to what might be going on here would be much appreciated.

Thanks,
Jon

wfondrie commented 9 months ago

Hi @jonathan-krieger-bruker 👋

Sorry for the slow response. Can you elaborate on how the decoy sequences in your FASTA file were generated?

Also, a small example from the file would be helpful.
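One quick way to pull out such an example, and to check how the decoys relate to their targets, is sketched below. It assumes each decoy ID is the decoy prefix followed by its target's ID, and it tests only one hypothesis: full-sequence reversal. Decoys made any other way (for example shuffled, or reversed between cleavage sites) land in the `other` bucket. The function names are hypothetical and not part of mokapot.

```python
def parse_fasta(lines):
    """Map each protein ID (first header token) to its concatenated sequence."""
    records, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)  # close the previous record
            parts = line[1:].split(maxsplit=1)
            name, chunks = (parts[0] if parts else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def describe_decoys(records, decoy_prefix):
    """Count decoys that exactly reverse their target, differ otherwise, or lack a target."""
    summary = {"reversed": 0, "other": 0, "unpaired": 0}
    for pid, seq in records.items():
        if not pid.startswith(decoy_prefix):
            continue
        target = records.get(pid[len(decoy_prefix):])
        if target is None:
            summary["unpaired"] += 1
        elif seq == target[::-1]:
            summary["reversed"] += 1
        else:
            summary["other"] += 1
    return summary
```

Parsing the FASTA with `parse_fasta`, printing the first few items of the returned dict, and calling `describe_decoys(records, "Reverse_")` would give both the small example and a rough answer to how the decoys were built.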