sr320 / LabDocs

Roberts Lab Documents
http://sr320.github.io/LabDocs/

Protein Prophet Clarifications #480

Closed sr320 closed 7 years ago

sr320 commented 7 years ago

A few questions re PP.

1) What is a decoy database?
2) In the command below I get "WARNING: No decoys with label DECOY_ were found in this dataset. reverting to fully unsupervised method." Should I be getting this warning?
2b) What does -dDECOY_ do?
3) What does -OAp mean?
4) I get 80,000 lines of WARNING. Is this normal?

Thanks!

steven@emu:~/bioinfo/020917$ /usr/tpp_install/tpp/bin/xinteract \
> -dDECOY_ \
> -N20161205_Sample_1 \
> 20161205_Sample_1.pep.xml \
> -p0.9 \
> -OAp

/usr/tpp_install/tpp/bin/xinteract (TPP v5.0.0 Typhoon, Build 201612091438-exported (Linux-x86_64))
 naming output file interact-20161205_Sample_1.pep.xml

running: "/usr/tpp_install/tpp/bin/InteractParser 'interact-20161205_Sample_1.pep.xml' '20161205_Sample_1.pep.xml' -L'7'"
 file 1: 20161205_Sample_1.pep.xml
SUCCESS: CORRECTED data file /home/steven/bioinfo/020917/20161205_Sample_1.mzXML in msms_run_summary tag ...
SUCCESS: CORRECTED data file /home/steven/bioinfo/020917/20161205_Sample_1.mzXML in msms_run_summary tag ...
 processed altogether 89352 results
INFO: Results written to file: /home/steven/bioinfo/020917/interact-20161205_Sample_1.pep.xml
command completed in 26 sec 

running: "/usr/tpp_install/tpp/bin/DatabaseParser 'interact-20161205_Sample_1.pep.xml'"
command completed in 0 sec 

running: "/usr/tpp_install/tpp/bin/RefreshParser 'interact-20161205_Sample_1.pep.xml' 'database_CgCont.fa'"
  - Building Commentz-Walter keyword tree...
  - Searching the tree...
  - Linking duplicate entries...
  - Printing results...

command completed in 12 sec 

running: "/usr/tpp_install/tpp/bin/PeptideProphetParser 'interact-20161205_Sample_1.pep.xml' DECOY=DECOY_ MINPROB=0.9 ACCMASS"
using Accurate Mass Bins
Using Decoy Label "DECOY_".
 (Comet)
adding ACCMASS mixture distribution
init with Comet trypsin 
MS Instrument info: Manufacturer: Thermo, Model: Orbitrap Fusion Lumos, Ionization: UNKNOWN, Analyzer: UNKNOWN, Detector: UNKNOWN

INFO: Processing standard MixtureModel ... 
 PeptideProphet  (TPP v5.0.0 Typhoon, Build 201612091438-exported (Linux-x86_64)) AKeller@ISB
 read in 0 1+, 38769 2+, 36550 3+, 9547 4+, 1969 5+, 0 6+, and 0 7+ spectra.
Initialising statistical models ...
Found 0 Decoys, and 86835 Non-Decoys
WARNING: No decoys with label DECOY_ were found in this dataset. reverting to fully unsupervised method.
Iterations: .........10.........20.........30.
model complete after 32 iterations
command completed in 67 sec 

running: "/usr/tpp_install/tpp/bin/ProphetModels.pl -i interact-20161205_Sample_1.pep.xml -d "DECOY_""
Analyzing interact-20161205_Sample_1.pep.xml ...
Reading Accurate Mass Model model +1 ...
Reading Accurate Mass Model model +2 ...
Reading Accurate Mass Model model +3 ...
Reading Accurate Mass Model model +4 ...
Reading Accurate Mass Model model +5 ...
Reading Accurate Mass Model model +6 ...
Reading Accurate Mass Model model +7 ...
Parsing search results "/home/steven/bioinfo/020917/20161205_Sample_1 (Comet)"...
  => Found 38893 hits. (0 decoys, 0 excluded)
  => Total so far: 38893 hits. (0 decoys, 0 excluded)
command completed in 5 sec 

running: "/usr/tpp_install/tpp/cgi-bin/PepXMLViewer.cgi -I /home/steven/bioinfo/020917/interact-20161205_Sample_1.pep.xml"
command completed in 4 sec 

running: "/usr/tpp_install/tpp/bin/ProteinProphet 'interact-20161205_Sample_1.pep.xml' 'interact-20161205_Sample_1.prot.xml'"
ProteinProphet (C++) by Insilicos LLC and LabKey Software, after the original Perl by A. Keller (TPP v5.0.0 Typhoon, Build 201612091438-exported (Linux-x86_64))
 (no FPKM) (using degen pep info)
Reading in /home/steven/bioinfo/020917/interact-20161205_Sample_1.pep.xml...
...read in 0 1+, 20139 2+, 15228 3+, 3203 4+, 323 5+, 0 6+, 0 7+ spectra with min prob 0.05

Initializing 29790 peptide weights: 0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Calculating protein lengths and molecular weights from database database_CgCont.fa
........WARNING: Trying to compute mass of non-residue: <
WARNING: Trying to compute mass of non-residue: <
WARNING: Trying to compute mass of non-residue: !
WARNING: Trying to compute mass of non-residue: !
WARNING: Trying to compute mass of non-residue:  
WARNING: Trying to compute mass of non-residue:  
WARNING: Trying to compute mass of non-residue: h
WARNING: Trying to compute mass of non-residue: h
WARNING: Trying to compute mass of non-residue: t
WARNING: Trying to compute mass of non-residue: t
WARNING: Trying to compute mass of non-residue: m
WARNING: Trying to compute mass of non-residue: m

There are about 80,000 lines with this WARNING

Computing degenerate peptides for 9765 proteins: 0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Computing probabilities for 11059 proteins.  Loop 1: 0%...20%...40%...60%...80%...100%  Loop 2: 0%...20%...40%...60%...80%...100%
Computing probabilities for 11059 proteins.  Loop 1: 0%...20%...40%...60%...80%...100%  Loop 2: 0%...20%...40%...60%...80%...100%
Computing probabilities for 11059 proteins.  Loop 1: 0%...20%...40%...60%...80%...100%  Loop 2: 0%...20%...40%...60%...80%...100%
Computing probabilities for 11059 proteins.  Loop 1: 0%...20%...40%...60%...80%...100%  Loop 2: 0%...20%...40%...60%...80%...100%
Computing probabilities for 11059 proteins.  Loop 1: 0%...20%...40%...60%...80%...100%  Loop 2: 0%...20%...40%...60%...80%...100%
Computing probabilities for 11059 proteins.  Loop 1: 0%...20%...40%...60%...80%...100%  Loop 2: 0%...20%...40%...60%...80%...100%
Computing 7193 protein groups: 0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Calculating sensitivity...and error tables...
Computing MU for 11059 proteins: 0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
INFO: mu=5.98616e-06, db_size=40476286

Finished. Results written to: /home/steven/bioinfo/020917/interact-20161205_Sample_1.prot.xml
command completed in 78 sec 

running: "/usr/tpp_install/tpp/bin/ProtProphModels.pl -i interact-20161205_Sample_1.prot.xml"
Analyzing interact-20161205_Sample_1.prot.xml ...
command completed in 1 sec 

running: "/usr/tpp_install/tpp/bin/tpp_models.pl '/home/steven/bioinfo/020917/interact-20161205_Sample_1.pep.xml'"
File: /home/steven/bioinfo/020917/interact-20161205_Sample_1.pep.xml
 - in ms run: /home/steven/bioinfo/020917/20161205_Sample_1...
-------------------------------------------------------------------------------
TPP DASHBOARD -- started at Fri Feb 10 07:02:16 2017
-------------------------------------------------------------------------------
File /home/steven/bioinfo/020917/interact-20161205_Sample_1.pep.xml is pepxml
Found fval (+1) model...
Found ntt (+1) model...
Found nmc (+1) model...
Found AccurateMassModel ('+1') model...
Found IsoMassDiff (+1) model...
Found fval (+2) model...
Found ntt (+2) model...
Found nmc (+2) model...
Found AccurateMassModel ('+2') model...
Found IsoMassDiff (+2) model...
Found fval (+3) model...
Found ntt (+3) model...
Found nmc (+3) model...
Found AccurateMassModel ('+3') model...
Found IsoMassDiff (+3) model...
Found fval (+4) model...
Found ntt (+4) model...
Found nmc (+4) model...
Found AccurateMassModel ('+4') model...
Found IsoMassDiff (+4) model...
Found fval (+5) model...
Found ntt (+5) model...
Found nmc (+5) model...
Found AccurateMassModel ('+5') model...
Found IsoMassDiff (+5) model...
Found fval (+6) model...
Found ntt (+6) model...
Found nmc (+6) model...
Found AccurateMassModel ('+6') model...
Found IsoMassDiff (+6) model...
Found fval (+7) model...
Found ntt (+7) model...
Found nmc (+7) model...
Found AccurateMassModel ('+7') model...
Found IsoMassDiff (+7) model...
--> Trying to write file /home/steven/bioinfo/020917/interact-20161205_Sample_1.pep-MODELS.html
-------------------------------------------------------------------------------
Finished at Fri Feb 10 07:02:22 2017 with 0 errors.
-------------------------------------------------------------------------------

command completed in 6 sec 

running: "/usr/tpp_install/tpp/bin/tpp_models.pl '/home/steven/bioinfo/020917/interact-20161205_Sample_1.prot.xml'"
File: /home/steven/bioinfo/020917/interact-20161205_Sample_1.prot.xml
-------------------------------------------------------------------------------
TPP DASHBOARD -- started at Fri Feb 10 07:02:22 2017
-------------------------------------------------------------------------------
File /home/steven/bioinfo/020917/interact-20161205_Sample_1.prot.xml is protxml
Found end of header
--> Trying to write file /home/steven/bioinfo/020917/interact-20161205_Sample_1.prot-MODELS.html
-------------------------------------------------------------------------------
Finished at Fri Feb 10 07:02:22 2017 with 0 errors.
-------------------------------------------------------------------------------

command completed in 0 sec 
/usr/tpp_install/tpp/bin/InteractParser 'interact-20161205_Sample_1.pep.xml' '20161205_Sample_1.pep.xml' -L'7' 26 sec
/usr/tpp_install/tpp/bin/DatabaseParser 'interact-20161205_Sample_1.pep.xml'
/usr/tpp_install/tpp/bin/RefreshParser 'interact-20161205_Sample_1.pep.xml' 'database_CgCont.fa' 12 sec
/usr/tpp_install/tpp/bin/PeptideProphetParser 'interact-20161205_Sample_1.pep.xml' DECOY=DECOY_ MINPROB=0.9 ACCMASS 67 sec
/usr/tpp_install/tpp/bin/ProphetModels.pl -i interact-20161205_Sample_1.pep.xml -d "DECOY_" 5 sec
/usr/tpp_install/tpp/cgi-bin/PepXMLViewer.cgi -I /home/steven/bioinfo/020917/interact-20161205_Sample_1.pep.xml 4 sec
/usr/tpp_install/tpp/bin/ProteinProphet 'interact-20161205_Sample_1.pep.xml' 'interact-20161205_Sample_1.prot.xml' 78 sec
/usr/tpp_install/tpp/bin/ProtProphModels.pl -i interact-20161205_Sample_1.prot.xml 1 sec
/usr/tpp_install/tpp/bin/tpp_models.pl '/home/steven/bioinfo/020917/interact-20161205_Sample_1.pep.xml' 6 sec
/usr/tpp_install/tpp/bin/tpp_models.pl '/home/steven/bioinfo/020917/interact-20161205_Sample_1.prot.xml' 0 sec
job completed in 199 sec 

kubu4 commented 7 years ago

The manual for xinteract can be viewed by running xinteract without any arguments:

/usr/tpp_install/tpp/bin/xinteract

PeptideProphet options [following the '-O']: -OAp - A [use accurate mass binning in PeptideProphet], p [run ProteinProphet afterwards]
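To make the flag-by-flag reading concrete, here is a minimal sketch that annotates only the xinteract arguments used in this thread. The helper name `describe_xinteract_args` is hypothetical, and the meanings are taken from the usage text quoted above and from the log output (DECOY=DECOY_, MINPROB=0.9, ACCMASS, "naming output file interact-..."); it is not a full parser for xinteract.

```python
# Hypothetical helper: annotates only the flags that appear in this thread.
# Meanings come from the quoted usage text and the log, not from a full manual.
PEPTIDEPROPHET_LETTERS = {
    "A": "use accurate mass binning in PeptideProphet",
    "p": "run ProteinProphet afterwards",
}


def describe_xinteract_args(args):
    """Return one human-readable note per flag (or per -O letter)."""
    notes = []
    for arg in args:
        if arg.startswith("-d"):
            notes.append(f"decoy label passed to PeptideProphet: {arg[2:]}")
        elif arg.startswith("-N"):
            notes.append(f"output file: interact-{arg[2:]}.pep.xml")
        elif arg.startswith("-p"):
            notes.append(f"minimum probability: {arg[2:]}")
        elif arg.startswith("-O"):
            for letter in arg[2:]:
                meaning = PEPTIDEPROPHET_LETTERS.get(
                    letter, "not covered in this thread"
                )
                notes.append(f"-O option {letter}: {meaning}")
        else:
            notes.append(f"input file: {arg}")
    return notes


command = [
    "-dDECOY_",
    "-N20161205_Sample_1",
    "20161205_Sample_1.pep.xml",
    "-p0.9",
    "-OAp",
]
for note in describe_xinteract_args(command):
    print(note)
```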

sr320 commented 7 years ago

As Emma might not believe, I am not that incompetent :) I am looking for a layperson's explanation from Emma.


kubu4 commented 7 years ago

Sorry, hadn't noticed she was assigned to this.

emmats commented 7 years ago

What is a decoy database?

A decoy database is used to calculate the error rate of your search matches. In your Comet parameter file, you can select a decoy search (see below). This will search your peptide spectra against the reversed sequences of your database to see if peptides match to a "non-sequence". Any match to one of these reversed sequences is a false positive by construction, which is what lets you estimate the error rate.

decoy_search = 0 # 0=no (default), 1=concatenated search, 2=separate search
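As a rough illustration of what a concatenated target-decoy database looks like, the sketch below reverses each sequence and prepends a label to its name. This is a simplified picture, not Comet's exact decoy-generation procedure; the function names and the two example proteins are hypothetical.

```python
# Simplified illustration of a concatenated target-decoy database.
# Assumption: decoys are plain reversed sequences with a name prefix;
# Comet's internal procedure may differ in detail.
def make_decoy(name, sequence, prefix="DECOY_"):
    """Return a (name, sequence) pair for the reversed decoy entry."""
    return prefix + name, sequence[::-1]


def concatenated_database(records, prefix="DECOY_"):
    """Return the target entries followed by one decoy per target."""
    decoys = [make_decoy(name, seq, prefix) for name, seq in records]
    return list(records) + decoys


# Hypothetical two-protein database.
targets = [("protA", "MKTAYIAK"), ("protB", "GASPVTCR")]
for name, seq in concatenated_database(targets):
    print(f">{name}\n{seq}")
```

A spectrum that matches a DECOY_ entry cannot be a real identification, so the fraction of decoy matches gives an estimate of how many target matches are also wrong.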

In the command below I get WARNING: No decoys with label DECOY_ were found in this dataset. reverting to fully unsupervised method. Should I be getting this warning?

It sounds like the searches were run without a decoy database. It is usually better to use a decoy search. However, I have run searches without a decoy followed by TPP and I don't remember seeing that warning. But that doesn't mean it didn't happen :)

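2b) What does -dDECOY_ do?

See above. It tells PeptideProphet which label marks decoy entries (your log shows it passed through as DECOY=DECOY_).

What does -OAp mean?

The p runs ProteinProphet after PeptideProphet, and the A turns on accurate mass binning in PeptideProphet.

I get 80,000 lines of WARNING. Is this normal?

I don't think so.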
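One rough way to check whether a search actually produced decoy hits is to count the `search_hit` elements in the pep.xml whose `protein` attribute starts with the decoy label. The sketch below does that with the standard library; the function name and the tiny sample document are hypothetical, and PeptideProphet's own counting (for example, which hit ranks it considers) may differ.

```python
import io
import xml.etree.ElementTree as ET


def count_decoy_hits(source, decoy_label="DECOY_"):
    """Count search_hit elements whose protein name starts with decoy_label.

    source is a file path or a binary file-like object holding pepXML.
    Returns (decoys, non_decoys).
    """
    decoys = non_decoys = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Drop any XML namespace, e.g. "{http://...}search_hit".
        if elem.tag.rsplit("}", 1)[-1] == "search_hit":
            if elem.get("protein", "").startswith(decoy_label):
                decoys += 1
            else:
                non_decoys += 1
            elem.clear()  # keep memory use low on large files
    return decoys, non_decoys


# Hypothetical, heavily trimmed stand-in for a real pep.xml file.
SAMPLE = b"""<msms_pipeline_analysis>
  <search_hit hit_rank="1" peptide="AAK" protein="protA"/>
  <search_hit hit_rank="1" peptide="KAA" protein="DECOY_protA"/>
</msms_pipeline_analysis>"""

print(count_decoy_hits(io.BytesIO(SAMPLE)))
```

If the decoy count is zero on the real file, the search output itself contains no decoy-labelled proteins, and the warning is coming from the search step rather than from the xinteract command.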

seanb80 commented 7 years ago

So I ran the same file, 20161205_Sample_1.raw, pulled from Steven's directory on Emu, through the ReAdW, Comet, and xinteract steps, with the only difference being the reference database used (Steven's = database_CgCont.fa, mine = UniProt gigas proteome).

Steven's reference results from xinteract:

[image: stevens]

UniProt reference results from xinteract:

[image: mine]

database_CgCont.fa looks like:

[image: database cgcont]

While the UniProt database looks like:

[image: uniprot]

There's a notebook for this here, but it doesn't have any output, as the 80k+ errors make it too large.

sr320 commented 7 years ago

Thanks - The 80k WARNINGS are related to the FASTA header.

see https://github.com/sr320/nb-2017/blob/master/C_gigas/00-Protein-database.ipynb

tldc: pic
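A quick way to spot this kind of problem before running the pipeline is to scan the FASTA file for lines that are not headers (do not start with ">") yet contain characters outside the residue alphabet. The sketch below is one way to do that. The accepted alphabet (20 standard residues plus X and *) is an assumption, and lowercase letters are flagged because the log above reports h, t and m as non-residues. The function name and sample lines are hypothetical.

```python
# Assumption: the 20 standard residues plus X and * are acceptable.
# Lowercase letters are treated as invalid, mirroring the warnings above.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY") | {"X", "*"}


def find_bad_sequence_lines(lines):
    """Return (line_number, sorted_bad_characters) for suspect lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if not text or text.startswith(">"):
            continue  # blank lines and headers are not sequence
        bad = sorted(set(text) - VALID_RESIDUES)
        if bad:
            problems.append((number, bad))
    return problems


# Hypothetical file content with one stray non-sequence line.
sample = [">protA some description\n", "MKTAYIAK\n", "<!DOCTYPE html>\n"]
for number, bad in find_bad_sequence_lines(sample):
    print(f"line {number}: unexpected characters {bad}")
```

For a real database, pass an open file handle instead of the sample list, since iterating over a file yields its lines.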

once cleaned - new PP out:

steven@emu:~/bioinfo/021017$ /usr/tpp_install/tpp/bin/xinteract \
> -dDECOY_ \
> -N20161205_Sample_1 \
> 20161205_Sample_1.pep.xml \
> -p0.9 \
> -OAp

/usr/tpp_install/tpp/bin/xinteract (TPP v5.0.0 Typhoon, Build 201612091438-exported (Linux-x86_64))
 naming output file interact-20161205_Sample_1.pep.xml

running: "/usr/tpp_install/tpp/bin/InteractParser 'interact-20161205_Sample_1.pep.xml' '20161205_Sample_1.pep.xml' -L'7'"
 file 1: 20161205_Sample_1.pep.xml
SUCCESS: CORRECTED data file /home/steven/bioinfo/021017/20161205_Sample_1.mzXML in msms_run_summary tag ...
SUCCESS: CORRECTED data file /home/steven/bioinfo/021017/20161205_Sample_1.mzXML in msms_run_summary tag ...
 processed altogether 89352 results
INFO: Results written to file: /home/steven/bioinfo/021017/interact-20161205_Sample_1.pep.xml
command completed in 25 sec 

running: "/usr/tpp_install/tpp/bin/DatabaseParser 'interact-20161205_Sample_1.pep.xml'"
command completed in 0 sec 

running: "/usr/tpp_install/tpp/bin/RefreshParser 'interact-20161205_Sample_1.pep.xml' 'Cg_Giga_cont_AA.fa'"
  - Building Commentz-Walter keyword tree...
  - Searching the tree...
  - Linking duplicate entries...
  - Printing results...

command completed in 11 sec 

running: "/usr/tpp_install/tpp/bin/PeptideProphetParser 'interact-20161205_Sample_1.pep.xml' DECOY=DECOY_ MINPROB=0.9 ACCMASS"
using Accurate Mass Bins
Using Decoy Label "DECOY_".
 (Comet)
adding ACCMASS mixture distribution
init with Comet trypsin 
MS Instrument info: Manufacturer: Thermo, Model: Orbitrap Fusion Lumos, Ionization: UNKNOWN, Analyzer: UNKNOWN, Detector: UNKNOWN

INFO: Processing standard MixtureModel ... 
 PeptideProphet  (TPP v5.0.0 Typhoon, Build 201612091438-exported (Linux-x86_64)) AKeller@ISB
 read in 0 1+, 38769 2+, 36550 3+, 9547 4+, 1969 5+, 0 6+, and 0 7+ spectra.
Initialising statistical models ...
Found 0 Decoys, and 86835 Non-Decoys
WARNING: No decoys with label DECOY_ were found in this dataset. reverting to fully unsupervised method.
Iterations: .........10.........20.........30..
model complete after 33 iterations
command completed in 69 sec 

running: "/usr/tpp_install/tpp/bin/ProphetModels.pl -i interact-20161205_Sample_1.pep.xml -d "DECOY_""
Analyzing interact-20161205_Sample_1.pep.xml ...
Reading Accurate Mass Model model +1 ...
Reading Accurate Mass Model model +2 ...
Reading Accurate Mass Model model +3 ...
Reading Accurate Mass Model model +4 ...
Reading Accurate Mass Model model +5 ...
Reading Accurate Mass Model model +6 ...
Reading Accurate Mass Model model +7 ...
Parsing search results "/home/steven/bioinfo/021017/20161205_Sample_1 (Comet)"...
  => Found 38893 hits. (0 decoys, 0 excluded)
  => Total so far: 38893 hits. (0 decoys, 0 excluded)
command completed in 4 sec 

running: "/usr/tpp_install/tpp/cgi-bin/PepXMLViewer.cgi -I /home/steven/bioinfo/021017/interact-20161205_Sample_1.pep.xml"
command completed in 4 sec 

running: "/usr/tpp_install/tpp/bin/ProteinProphet 'interact-20161205_Sample_1.pep.xml' 'interact-20161205_Sample_1.prot.xml'"
ProteinProphet (C++) by Insilicos LLC and LabKey Software, after the original Perl by A. Keller (TPP v5.0.0 Typhoon, Build 201612091438-exported (Linux-x86_64))
 (no FPKM) (using degen pep info)
Reading in /home/steven/bioinfo/021017/interact-20161205_Sample_1.pep.xml...
...read in 0 1+, 20144 2+, 15223 3+, 3203 4+, 323 5+, 0 6+, 0 7+ spectra with min prob 0.05

Initializing 29789 peptide weights: 0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Calculating protein lengths and molecular weights from database Cg_Giga_cont_AA.fa
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........1000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........2000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........3000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........4000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........5000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........6000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........7000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........8000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........9000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........10000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........11000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........12000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........13000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........14000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........15000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........16000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........17000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........18000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........19000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........20000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........21000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........22000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........23000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........24000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........25000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........26000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........27000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........28000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........29000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........30000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........31000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........32000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........33000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........34000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........35000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........36000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........37000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........38000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........39000
.........:.........:.........:.........:.........:.........:.........:.........:.........:.........40000
.........:.........:.........:.........:.........:.........:.........:.....  Total: 40751
Computing degenerate peptides for 9769 proteins: 0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Computing probabilities for 11063 proteins.  Loop 1: 0%...20%...40%...60%...80%...100%  Loop 2: 0%...20%...40%...60%...80%...100%
Computing probabilities for 11063 proteins.  Loop 1: 0%...20%...40%...60%...80%...100%  Loop 2: 0%...20%...40%...60%...80%...100%
Computing probabilities for 11063 proteins.  Loop 1: 0%...20%...40%...60%...80%...100%  Loop 2: 0%...20%...40%...60%...80%...100%
Computing probabilities for 11063 proteins.  Loop 1: 0%...20%...40%...60%...80%...100%  Loop 2: 0%...20%...40%...60%...80%...100%
Computing probabilities for 11063 proteins.  Loop 1: 0%...20%...40%...60%...80%...100%  Loop 2: 0%...20%...40%...60%...80%...100%
Computing probabilities for 11063 proteins.  Loop 1: 0%...20%...40%...60%...80%...100%  Loop 2: 0%...20%...40%...60%...80%...100%
Computing 7196 protein groups: 0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Calculating sensitivity...and error tables...
Computing MU for 11063 proteins: 0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
INFO: mu=5.99794e-06, db_size=40397625

Finished. Results written to: /home/steven/bioinfo/021017/interact-20161205_Sample_1.prot.xml
command completed in 79 sec 

running: "/usr/tpp_install/tpp/bin/ProtProphModels.pl -i interact-20161205_Sample_1.prot.xml"
Analyzing interact-20161205_Sample_1.prot.xml ...
command completed in 1 sec 

running: "/usr/tpp_install/tpp/bin/tpp_models.pl '/home/steven/bioinfo/021017/interact-20161205_Sample_1.pep.xml'"
File: /home/steven/bioinfo/021017/interact-20161205_Sample_1.pep.xml
 - in ms run: /home/steven/bioinfo/021017/20161205_Sample_1...
-------------------------------------------------------------------------------
TPP DASHBOARD -- started at Fri Feb 10 11:41:59 2017
-------------------------------------------------------------------------------
File /home/steven/bioinfo/021017/interact-20161205_Sample_1.pep.xml is pepxml
Found fval (+1) model...
Found ntt (+1) model...
Found nmc (+1) model...
Found AccurateMassModel ('+1') model...
Found IsoMassDiff (+1) model...
Found fval (+2) model...
Found ntt (+2) model...
Found nmc (+2) model...
Found AccurateMassModel ('+2') model...
Found IsoMassDiff (+2) model...
Found fval (+3) model...
Found ntt (+3) model...
Found nmc (+3) model...
Found AccurateMassModel ('+3') model...
Found IsoMassDiff (+3) model...
Found fval (+4) model...
Found ntt (+4) model...
Found nmc (+4) model...
Found AccurateMassModel ('+4') model...
Found IsoMassDiff (+4) model...
Found fval (+5) model...
Found ntt (+5) model...
Found nmc (+5) model...
Found AccurateMassModel ('+5') model...
Found IsoMassDiff (+5) model...
Found fval (+6) model...
Found ntt (+6) model...
Found nmc (+6) model...
Found AccurateMassModel ('+6') model...
Found IsoMassDiff (+6) model...
Found fval (+7) model...
Found ntt (+7) model...
Found nmc (+7) model...
Found AccurateMassModel ('+7') model...
Found IsoMassDiff (+7) model...
--> Trying to write file /home/steven/bioinfo/021017/interact-20161205_Sample_1.pep-MODELS.html
-------------------------------------------------------------------------------
Finished at Fri Feb 10 11:42:04 2017 with 0 errors.
-------------------------------------------------------------------------------

command completed in 5 sec 

running: "/usr/tpp_install/tpp/bin/tpp_models.pl '/home/steven/bioinfo/021017/interact-20161205_Sample_1.prot.xml'"
File: /home/steven/bioinfo/021017/interact-20161205_Sample_1.prot.xml
-------------------------------------------------------------------------------
TPP DASHBOARD -- started at Fri Feb 10 11:42:04 2017
-------------------------------------------------------------------------------
File /home/steven/bioinfo/021017/interact-20161205_Sample_1.prot.xml is protxml
Found end of header
--> Trying to write file /home/steven/bioinfo/021017/interact-20161205_Sample_1.prot-MODELS.html
-------------------------------------------------------------------------------
Finished at Fri Feb 10 11:42:04 2017 with 0 errors.
-------------------------------------------------------------------------------

command completed in 0 sec 
/usr/tpp_install/tpp/bin/InteractParser 'interact-20161205_Sample_1.pep.xml' '20161205_Sample_1.pep.xml' -L'7' 25 sec
/usr/tpp_install/tpp/bin/DatabaseParser 'interact-20161205_Sample_1.pep.xml'
/usr/tpp_install/tpp/bin/RefreshParser 'interact-20161205_Sample_1.pep.xml' 'Cg_Giga_cont_AA.fa' 11 sec
/usr/tpp_install/tpp/bin/PeptideProphetParser 'interact-20161205_Sample_1.pep.xml' DECOY=DECOY_ MINPROB=0.9 ACCMASS 69 sec
/usr/tpp_install/tpp/bin/ProphetModels.pl -i interact-20161205_Sample_1.pep.xml -d "DECOY_" 4 sec
/usr/tpp_install/tpp/cgi-bin/PepXMLViewer.cgi -I /home/steven/bioinfo/021017/interact-20161205_Sample_1.pep.xml 4 sec
/usr/tpp_install/tpp/bin/ProteinProphet 'interact-20161205_Sample_1.pep.xml' 'interact-20161205_Sample_1.prot.xml' 79 sec
/usr/tpp_install/tpp/bin/ProtProphModels.pl -i interact-20161205_Sample_1.prot.xml 1 sec
/usr/tpp_install/tpp/bin/tpp_models.pl '/home/steven/bioinfo/021017/interact-20161205_Sample_1.pep.xml' 5 sec
/usr/tpp_install/tpp/bin/tpp_models.pl '/home/steven/bioinfo/021017/interact-20161205_Sample_1.prot.xml' 0 sec
job completed in 198 sec 

But should I use a decoy or not?

emmats commented 7 years ago

you should decoy

sr320 commented 7 years ago

do you recommend 1=concatenated search, 2=separate search ?

emmats commented 7 years ago

1=concatenated search

sr320 commented 7 years ago

Done... Changed parameters file accordingly.

but still get

Found 0 Decoys, and 86835 Non-Decoys
WARNING: No decoys with label DECOY_ were found in this dataset. reverting to fully unsupervised method.

as a reminder this is the code I am using

steven@emu:~/bioinfo/020917$ /usr/tpp_install/tpp/bin/xinteract \
> -dDECOY_ \
> -N20161205_Sample_1 \
> 20161205_Sample_1.pep.xml \
> -p0.9 \
> -OAp

sr320 commented 7 years ago

Hold on ... let me try something else....

emmats commented 7 years ago

Here is what I run:

xinteract -p0.9 -OAp -dDECOY_ -N2016_Dec_16_Kaho_40_54_QE_26 2016_Dec_16_Kaho_40_54_QE_26.pep.xml

It looks a lot like yours.
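Since the two xinteract commands match, the difference may lie upstream in the Comet search settings. Here is a minimal sketch, under stated assumptions, that reads a Comet parameter file and checks that a decoy search is switched on and that the decoy prefix agrees with the label given to xinteract with -d. The function names are hypothetical, and the fallback prefix of DECOY_ when `decoy_prefix` is absent is an assumption.

```python
def read_comet_params(text):
    """Parse 'key = value  # comment' lines into a dict of strings."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def decoy_settings_match(params, xinteract_label):
    """True if a decoy search is enabled and the prefix matches the label."""
    searched = params.get("decoy_search", "0") != "0"
    # Assumption: DECOY_ is used when decoy_prefix is not set explicitly.
    prefix = params.get("decoy_prefix", "DECOY_")
    return searched and prefix == xinteract_label


example = """
decoy_search = 1      # 0=no (default), 1=concatenated search, 2=separate search
decoy_prefix = DECOY_ # prefix added to decoy protein names
"""
print(decoy_settings_match(read_comet_params(example), "DECOY_"))
```

Even when this check passes, the pep.xml only contains decoys if the search was rerun after the parameter file was changed.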

sr320 commented 7 years ago

Still getting the error. How about your Comet parameter file? Can I see that (or all the files in the directory, so I can run it locally)?

emmats commented 7 years ago

It is all on the GS server and you would have to ssh in with an account set up through them. Although Yaamini and Laura know how to access the files on my account (don't tell!). Here is an example of a parameter file:

# comet_version 2016.01 rev. 2
# Comet MS/MS search engine parameters file.
# Everything following the '#' symbol is treated as a comment.

database_name = /net/gs/vol4/shared/nunnlab/search/emmats/transdecoder/copepod/pleuromamma_all.nr.fasta.f50.nuc.transdecoder.pep
decoy_search = 1                       # 0=no (default), 1=concatenated search, 2=separate search

num_threads = 0                        # 0=poll CPU to set num threads; else specify num threads directly (max 64)

#
# masses
#
peptide_mass_tolerance = 3.00
peptide_mass_units = 0                 # 0=amu, 1=mmu, 2=ppm
mass_type_parent = 1                   # 0=average masses, 1=monoisotopic masses
mass_type_fragment = 1                 # 0=average masses, 1=monoisotopic masses
precursor_tolerance_type = 0           # 0=MH+ (default), 1=precursor m/z; only valid for amu/mmu tolerances
isotope_error = 0                      # 0=off, 1=on -1/0/1/2/3 (standard C13 error), 2= -8/-4/0/4/8 (for +4/+8 labeling)

#
# search enzyme
#
search_enzyme_number = 1               # choose from list at end of this params file
num_enzyme_termini = 2                 # 1 (semi-digested), 2 (fully digested, default), 8 C-term unspecific , 9 N-term unspecific
allowed_missed_cleavage = 2            # maximum value is 5; for enzyme search

#
# Up to 9 variable modifications are supported
# format:  <mass> <residues> <0=variable/else binary> <max_mods_per_peptide> <term_distance> <n/c-term> <required>
#     e.g. 79.966331 STY 0 3 -1 0 0
#
variable_mod01 = 15.9949 M 0 3 -1 0 0
variable_mod02 = 0.0 X 0 3 -1 0 0
variable_mod03 = 0.0 X 0 3 -1 0 0
variable_mod04 = 0.0 X 0 3 -1 0 0
variable_mod05 = 0.0 X 0 3 -1 0 0
variable_mod06 = 0.0 X 0 3 -1 0 0
variable_mod07 = 0.0 X 0 3 -1 0 0
variable_mod08 = 0.0 X 0 3 -1 0 0
variable_mod09 = 0.0 X 0 3 -1 0 0
max_variable_mods_in_peptide = 5
require_variable_mod = 0

#
# fragment ions
#
# ion trap ms/ms:  1.0005 tolerance, 0.4 offset (mono masses), theoretical_fragment_ions = 1
# high res ms/ms:    0.02 tolerance, 0.0 offset (mono masses), theoretical_fragment_ions = 0
#
fragment_bin_tol = 1.0005              # binning to use on fragment ions
fragment_bin_offset = 0.4              # offset position to start the binning (0.0 to 1.0)
theoretical_fragment_ions = 1          # 0=use flanking peaks, 1=M peak only
use_A_ions = 0
use_B_ions = 1
use_C_ions = 0
use_X_ions = 0
use_Y_ions = 1
use_Z_ions = 0
use_NL_ions = 0                        # 0=no, 1=yes to consider NH3/H2O neutral loss peaks

#
# output
#
output_sqtstream = 0                   # 0=no, 1=yes  write sqt to standard output
output_sqtfile = 0                     # 0=no, 1=yes  write sqt file
output_txtfile = 0                     # 0=no, 1=yes  write tab-delimited txt file
output_pepxmlfile = 1                  # 0=no, 1=yes  write pep.xml file
output_percolatorfile = 0              # 0=no, 1=yes  write Percolator tab-delimited input file
output_outfiles = 0                    # 0=no, 1=yes  write .out files
print_expect_score = 1                 # 0=no, 1=yes to replace Sp with expect in out & sqt
num_output_lines = 5                   # num peptide results to show
show_fragment_ions = 0                 # 0=no, 1=yes for out files only

sample_enzyme_number = 1               # Sample enzyme which is possibly different than the one applied to the search.
                                       # Used to calculate NTT & NMC in pepXML output (default=1 for trypsin).

#
# mzXML parameters
#
scan_range = 0 0                       # start and scan scan range to search; 0 as 1st entry ignores parameter
precursor_charge = 0 0                 # precursor charge range to analyze; does not override any existing charge; 0 as 1st entry ignores parameter
override_charge = 0                    # 0=no, 1=override precursor charge states, 2=ignore precursor charges outside precursor_charge range, 3=see online
ms_level = 2                           # MS level to analyze, valid are levels 2 (default) or 3
activation_method = ALL                # activation method; used if activation method set; allowed ALL, CID, ECD, ETD, PQD, HCD, IRMPD

#
# misc parameters
#
digest_mass_range = 600.0 5000.0       # MH+ peptide mass range to analyze
num_results = 100                      # number of search hits to store internally
skip_researching = 1                   # for '.out' file output only, 0=search everything again (default), 1=don't search if .out exists
max_fragment_charge = 3                # set maximum fragment charge state to analyze (allowed max 5)
max_precursor_charge = 6               # set maximum precursor charge state to analyze (allowed max 9)
nucleotide_reading_frame = 0           # 0=proteinDB, 1-6, 7=forward three, 8=reverse three, 9=all six
clip_nterm_methionine = 0              # 0=leave sequences as-is; 1=also consider sequence w/o N-term methionine
spectrum_batch_size = 0                # max. # of spectra to search at a time; 0 to search the entire scan range in one loop
decoy_prefix = DECOY_                  # decoy entries are denoted by this string which is pre-pended to each protein accession
output_suffix =                        # add a suffix to output base names i.e. suffix "-C" generates base-C.pep.xml from base.mzXML input
mass_offsets =                         # one or more mass offsets to search (values subtracted from deconvoluted precursor mass)

#
# spectral processing
#
minimum_peaks = 10                     # required minimum number of peaks in spectrum to search (default 10)
minimum_intensity = 0                  # minimum intensity value to read in
remove_precursor_peak = 0              # 0=no, 1=yes, 2=all charge reduced precursor peaks (for ETD)
remove_precursor_tolerance = 1.5       # +- Da tolerance for precursor removal
clear_mz_range = 0.0 0.0               # for iTRAQ/TMT type data; will clear out all peaks in the specified m/z range

#
# additional modifications
#

add_Cterm_peptide = 0.0
add_Nterm_peptide = 0.0
add_Cterm_protein = 0.0
add_Nterm_protein = 0.0

add_G_glycine = 0.0000                 # added to G - avg.  57.0513, mono.  57.02146
add_A_alanine = 0.0000                 # added to A - avg.  71.0779, mono.  71.03711
add_S_serine = 0.0000                  # added to S - avg.  87.0773, mono.  87.03203
add_P_proline = 0.0000                 # added to P - avg.  97.1152, mono.  97.05276
add_V_valine = 0.0000                  # added to V - avg.  99.1311, mono.  99.06841
add_T_threonine = 0.0000               # added to T - avg. 101.1038, mono. 101.04768
add_C_cysteine = 57.021464             # added to C - avg. 103.1429, mono. 103.00918
add_L_leucine = 0.0000                 # added to L - avg. 113.1576, mono. 113.08406
add_I_isoleucine = 0.0000              # added to I - avg. 113.1576, mono. 113.08406
add_N_asparagine = 0.0000              # added to N - avg. 114.1026, mono. 114.04293
add_D_aspartic_acid = 0.0000           # added to D - avg. 115.0874, mono. 115.02694
add_Q_glutamine = 0.0000               # added to Q - avg. 128.1292, mono. 128.05858
add_K_lysine = 0.0000                  # added to K - avg. 128.1723, mono. 128.09496
add_E_glutamic_acid = 0.0000           # added to E - avg. 129.1140, mono. 129.04259
add_M_methionine = 0.0000              # added to M - avg. 131.1961, mono. 131.04048
add_O_ornithine = 0.0000               # added to O - avg. 132.1610, mono. 132.08988
add_H_histidine = 0.0000               # added to H - avg. 137.1393, mono. 137.05891
add_F_phenylalanine = 0.0000           # added to F - avg. 147.1739, mono. 147.06841
add_U_selenocysteine = 0.0000          # added to U - avg. 150.0379, mono. 150.95363
add_R_arginine = 0.0000                # added to R - avg. 156.1857, mono. 156.10111
add_Y_tyrosine = 0.0000                # added to Y - avg. 163.1760, mono. 163.06333
add_W_tryptophan = 0.0000              # added to W - avg. 186.2132, mono. 186.07931
add_B_user_amino_acid = 0.0000         # added to B - avg.   0.0000, mono.   0.00000
add_J_user_amino_acid = 0.0000         # added to J - avg.   0.0000, mono.   0.00000
add_X_user_amino_acid = 0.0000         # added to X - avg.   0.0000, mono.   0.00000
add_Z_user_amino_acid = 0.0000         # added to Z - avg.   0.0000, mono.   0.00000

#
# COMET_ENZYME_INFO _must_ be at the end of this parameters file
#
[COMET_ENZYME_INFO]
0.  No_enzyme              0      -           -
1.  Trypsin                1      KR          P
2.  Trypsin/P              1      KR          -
3.  Lys_C                  1      K           P
4.  Lys_N                  0      K           -
5.  Arg_C                  1      R           P
6.  Asp_N                  0      D           -
7.  CNBr                   1      M           -
8.  Glu_C                  1      DE          P
9.  PepsinA                1      FL          P
10. Chymotrypsin           1      FWYL        P
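
The params file above is a list of `name = value  # comment` lines, ending with the `[COMET_ENZYME_INFO]` table. A minimal sketch for pulling the settings into a dictionary, for example to confirm what `decoy_prefix` is set to, might look like the following. The function name `parse_comet_params` and the shortened `sample` text are illustrative assumptions, not part of Comet or TPP.

```python
def parse_comet_params(text):
    """Return {name: value} for the 'name = value  # comment' lines of a params file."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop the trailing comment
        if line.startswith("[COMET_ENZYME_INFO]"):
            break                                # enzyme table follows; not name = value
        if "=" not in line:
            continue                             # blank lines and pure comment lines
        name, value = line.split("=", 1)
        params[name.strip()] = value.strip()
    return params


# Shortened excerpt of the file above (comments abbreviated for the example).
sample = """\
decoy_prefix = DECOY_                  # decoy entries are denoted by this string
output_suffix =                        # add a suffix to output base names
digest_mass_range = 600.0 5000.0       # MH+ peptide mass range to analyze
[COMET_ENZYME_INFO]
1.  Trypsin                1      KR          P
"""

params = parse_comet_params(sample)
print(params["decoy_prefix"])
```

Values are kept as strings, so a two-number setting such as `digest_mass_range` still needs a `split()` before use.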

sr320 commented 7 years ago

As it stands I get

INFO: Processing standard MixtureModel ... 
 PeptideProphet  (TPP v5.0.0 Typhoon, Build 201612091438-exported (Linux-x86_64)) AKeller@ISB
 read in 0 1+, 38769 2+, 36550 3+, 9547 4+, 1969 5+, 0 6+, and 0 7+ spectra.
Initialising statistical models ...
Found 0 Decoys, and 86835 Non-Decoys
WARNING: No decoys with label DECOY_ were found in this dataset. reverting to fully unsupervised method.

which kind of makes sense, as I am telling it decoy proteins are tagged with DECOY_, and I have not done this....

-d<tag> [use decoy hits to pin down the negative distribution.
                          the decoy protein names must begin with <tag> (whitespace is not allowed)]
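
The help text says decoy protein names must begin with the tag, which is the test behind the "Found 0 Decoys" line. A rough sketch of that prefix check, assuming the protein names have already been pulled out of the pep.xml into a list, could look like this. The function `count_decoys` and the accession strings are made up for illustration and are not TPP code.

```python
def count_decoys(protein_names, tag="DECOY_"):
    """Split protein names into (decoys, non_decoys) by whether they start with tag."""
    names = list(protein_names)
    decoys = sum(1 for name in names if name.startswith(tag))
    return decoys, len(names) - decoys


# Hypothetical accessions: one decoy entry and two target entries.
with_decoys = ["DECOY_protein_A", "protein_A", "protein_B"]
# The same search without decoy entries in the results.
without_decoys = ["protein_A", "protein_B"]

print(count_decoys(with_decoys))
print(count_decoys(without_decoys))
```

If no name in the results starts with the tag, the decoy count is zero no matter what is passed to `-d`.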

emmats commented 7 years ago

Comet automatically tags your decoy proteins with 'DECOY_'

sr320 commented 7 years ago

I would argue it is not :)

If it were, why is xinteract finding 0 decoys?

emmats commented 7 years ago

Because your data are perfect.

sr320 commented 7 years ago

WAIT!! maybe I need to rerun Comet?

sr320 commented 7 years ago

You knew this the whole time, just using this as a teaching moment?

emmats commented 7 years ago

Well, we had already gone over that decoys are only detected if you run a decoy search. I thought you were paying attention since you claimed competence.
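
One way to check this before rerunning anything is to look at the decoy setting in the Comet params file. The sketch below assumes the file has a `decoy_search` line where 0 means no decoy search and a non-zero value turns it on; confirm the parameter name and values against the params file for your Comet version. The function `decoy_search_mode` is a hypothetical helper.

```python
import re


def decoy_search_mode(params_text):
    """Return the integer set on the 'decoy_search' line, or None if there is no such line."""
    match = re.search(r"^\s*decoy_search\s*=\s*(\d+)", params_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


# Assumed shape of the relevant line; the comment text here is illustrative.
off = "decoy_search = 0      # decoy search disabled\ndecoy_prefix = DECOY_\n"
on = "decoy_search = 1      # decoy search enabled\ndecoy_prefix = DECOY_\n"

for text in (off, on):
    mode = decoy_search_mode(text)
    print("decoy search enabled" if mode else "no decoy search: expect 0 decoys")
```

With the setting off, `decoy_prefix` is still present in the file but no decoy entries are searched, which is consistent with the warning from PeptideProphet.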

emmats commented 7 years ago

I've also been sitting here, running TPP correctly, for the last 20 minutes of this "conversation".

sr320 commented 7 years ago

Found 20718 Decoys, and 66210 Non-Decoys

winning?

emmats commented 7 years ago

definitely. And I think you can still make your ferry.