nf-core / proteomicslfq

Proteomics label-free quantification (LFQ) analysis pipeline
https://nf-co.re/proteomicslfq
MIT License

Comet search with Lys-C digestion fails #124

Open ncarrut opened 3 years ago

ncarrut commented 3 years ago

My pipeline fails when using Lys-C as a digestion enzyme, NT=Lys-C;AC=MS:1001309 or NT=Lys-C/P;AC=MS_1001310. It works with trypsin NT=Trypsin;AC=MS_1001251. My command is:

nextflow run nf-core/proteomicslfq \
     -r 43c77e5 \
     -profile singularity \
     --input data/pxd016772_lysC_sdrf.tsv \
     --add_decoys \
     --database data/UP000002311.fasta \
     -resume

An example failing .sdrf file is: pxd016772_lysC_II_sdrf.zip

log file: 20160607_QEP2_FBU_LP_189_tlc10_control_comet.log
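
For readers unfamiliar with the notation, the enzyme values quoted above are semicolon-separated KEY=value pairs, with NT carrying the enzyme name and AC the accession. A minimal Python sketch of splitting such a value follows; `parse_sdrf_value` is a hypothetical helper written for illustration and is not part of the pipeline.

```python
def parse_sdrf_value(value):
    """Split a 'KEY=value;KEY=value' string into a dict.

    Hypothetical helper for illustration only; not taken from the pipeline.
    """
    fields = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing semicolon
        key, sep, val = part.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=value, got {part!r}")
        fields[key.strip()] = val.strip()
    return fields


print(parse_sdrf_value("NT=Lys-C;AC=MS:1001309"))
```

The enzyme name that the search engines eventually see is the NT field, so that is the string to check when a run fails.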

jpfeuffer commented 3 years ago

Yes, unfortunately we have not yet implemented support for non-default Comet enzymes: http://comet-ms.sourceforge.net/parameters/parameters_202001/search_enzyme_number.php

To do that, one would need to create a custom comet.params file and allow it to be specified in nextflow. Another possibility would be to extend the OpenMS wrapper for Comet to handle this (at least for all OpenMS-supported enzymes).

A quick solution would be to give up on exact agreement with MSGF, which does not support "after" cutting rules and therefore requires us to switch from LysC to LysC/P when both search engines are activated.

Alternatively, on your side, using only one search engine should also solve the problem for now.
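
The enzyme switch described above can be pictured roughly as follows. This is a sketch under assumptions, not the pipeline's actual code: `harmonize_enzyme` and the engine labels "comet" and "msgf" are hypothetical names, and only the LysC to LysC/P substitution mentioned in this comment is included.

```python
# Only the substitution named in the thread; other enzymes are deliberately left out.
MSGF_SUBSTITUTES = {"Lys-C": "Lys-C/P"}


def harmonize_enzyme(enzyme, engines):
    """Return the enzyme name to use so that all active engines agree.

    Hypothetical illustration: the substitution is applied only when both
    engines are active; with a single engine the name passes through.
    """
    if "comet" in engines and "msgf" in engines:
        return MSGF_SUBSTITUTES.get(enzyme, enzyme)
    return enzyme


print(harmonize_enzyme("Lys-C", ["comet", "msgf"]))
print(harmonize_enzyme("Lys-C", ["comet"]))
```

With a single engine the requested enzyme is passed through unchanged, which is why restricting the run to one engine sidesteps the substitution.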

ncarrut commented 3 years ago

Thanks @jpfeuffer for your response. It turns out the problem isn't specific to LysC. I get basically the same error using Arg-C, Asp-N, Unspecific cleavage or CNBr (all with Comet). Chymotrypsin works in the sense that the Comet search succeeds, but a problem then comes up in the indexing step. To me it looks like only Trypsin behaves.

jpfeuffer commented 3 years ago

Yes, most of those failures are expected when Comet is used in combination with MSGF+, which does not support post-cutting rules: https://github.com/nf-core/proteomicslfq/blob/43c77e50c955d7e62899e7d31e0d6f6a87ac2316/main.nf#L525

I did not expect the error for unspecific cleavage, though. Can you send me the log for that case?

ncarrut commented 3 years ago

Using the command:

 nextflow run nf-core/proteomicslfq -r 43c77e5 \
     -profile singularity \
     --input data/pxd016772_temp_sdrf.tsv \
     --add_decoys \
     --database data/UP000002311.fasta \
     -resume

with sdrf specifying unspecific cleavage: pxd016772_temp_sdrf.zip

I get exit code 6 with the log file: 20160607_QEP2_FBU_LP_188_tlc10_telo_comet.log

I've tried using "NT=Unspecific cleavage" and "NT=unspecific cleavage" with the same result.

jpfeuffer commented 3 years ago

Ah yes, thank you! This confirms my suspicion that there was also a case-sensitivity "bug". It might already work with "unspecific cleavage", but my linked PR should fix this as well.
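
A common way to avoid this kind of case-sensitivity problem is to compare enzyme names case-insensitively before the lookup. The Python sketch below illustrates the idea only and is not the fix from the linked PR: `canonical_enzyme` is a hypothetical function, and `KNOWN_ENZYMES` is an assumed, illustrative subset of names rather than the real list.

```python
# Illustrative subset only (assumed); the real list of names is not reproduced here.
KNOWN_ENZYMES = ["Trypsin", "Lys-C", "Lys-C/P", "unspecific cleavage"]


def canonical_enzyme(name):
    """Map a user-supplied enzyme name to its canonical spelling, ignoring case."""
    lookup = {known.casefold(): known for known in KNOWN_ENZYMES}
    try:
        return lookup[name.strip().casefold()]
    except KeyError:
        raise ValueError(f"unknown enzyme: {name!r}") from None


print(canonical_enzyme("Unspecific cleavage"))
```

Under this scheme, "NT=Unspecific cleavage" and "NT=unspecific cleavage" would resolve to the same name instead of one of them being rejected.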