lehtiolab / proteogenomics-analysis-workflow

IPAW: a Nextflow workflow for proteogenomics

How can this pipeline be used to analyze non-cancer data? #13

Open Jokendo-collab opened 4 years ago

Jokendo-collab commented 4 years ago

I find this pipeline very nice, but how can it be used to analyze non-cancer data?

yafeng commented 4 years ago

Hi @javanOkendo, if you are not interested in detecting SNP variants or somatic mutations, you can remove the CanProVar and COSMIC entries from the VarDB database before you apply the workflow. Then you can follow the same steps for non-cancer data.
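For anyone who wants to script that removal, here is a minimal sketch in Python. It assumes the CanProVar and COSMIC entries can be recognized by those words appearing as plain substrings of the FASTA header line; the function names and the file names in the usage comment are hypothetical and not part of the pipeline.

```python
from typing import Iterable, Iterator, List, Tuple

# Labels named in the reply above. Matching them as plain substrings of the
# header line is an assumption about how VarDB marks these entries.
EXCLUDED_LABELS = ("CanProVar", "COSMIC")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (header, sequence_lines) records from FASTA-formatted lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, seq
            header, seq = line, []
        elif header is not None and line:
            seq.append(line)
    if header is not None:
        yield header, seq


def drop_labelled(lines: Iterable[str], labels=EXCLUDED_LABELS) -> List[str]:
    """Return FASTA lines without the records whose header contains a label."""
    kept = []
    for header, seq in read_fasta(lines):
        if any(label in header for label in labels):
            continue  # skip the whole record, header and sequence
        kept.append(header)
        kept.extend(seq)
    return kept


def filter_file(in_path: str, out_path: str) -> None:
    """Write a filtered copy of in_path to out_path (both paths hypothetical)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in drop_labelled(src):
            dst.write(line + "\n")


# Hypothetical usage:
# filter_file("VarDB.fasta", "VarDB.noncancer.fasta")
```

It is worth checking a few headers in your copy of VarDB by hand first, since a different labelling scheme would make the substring test miss entries.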

Jokendo-collab commented 4 years ago

@yafeng I am getting the following error:

    executor > local (30)
    [ea/c37f3b] process > makeTargetSeqLookup [100%] 1 of 1 ✔
    [8e/4f629c] process > makeTrypSeq [100%] 1 of 1, cached: 1 ✔
    [b6/342beb] process > createSpectraLookup [100%] 1 of 1, cached: 1 ✔
    [0a/a91539] process > makeProtSeq [100%] 1 of 1, cached: 1 ✔
    [a3/5e3717] process > concatFasta [100%] 1 of 1 ✔
    [9e/513c28] process > msgfPlus [100%] 2 of 2 ✔
    [59/5329f4] process > percolator [100%] 2 of 2 ✔
    [69/7b5133] process > getNovelPercolator [100%] 2 of 2 ✔
    [e0/43a2dd] process > getVariantPercolator [100%] 2 of 2 ✔
    [3c/290056] process > filterPercolator [100%] 4 of 4 ✔
    [db/b21fa7] process > svmToTSV [100%] 4 of 4 ✔
    [f8/a2085e] process > createPSMTables [100%] 4 of 4 ✔
    [2a/17247d] process > prePeptideTable [ 0%] 0 of 4
    [48/0aed18] process > prepSpectrumAI [ 0%] 0 of 2
    [31/1fa8ca] process > mergeSetPSMtable [ 0%] 0 of 2
    ERROR ~ Error executing process > 'prepSpectrumAI (1)'

    Caused by:
      Process prepSpectrumAI (1) terminated with an error exit status (255)

    Command executed:
      label_sub_pos.py --input_psm sampleA_variant_psmtable.txt --output specai_in.txt

    Command exit status:
      255

    Command output:
      (empty)

    Command error:
      FATAL: container creation failed: mount /etc/localtime->/etc/localtime error: while mounting /etc/localtime: could not mount /etc/localtime: input/output error

    Work dir:
      /scratch/oknjav001/transcriptomics/proteogenomics/yafengpipeline/workflow/work/dd/98189bad1266cd02493be6e46b1815

    Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

    -- Check '.nextflow.log' file for details


Jokendo-collab commented 4 years ago

    N E X T F L O W ~ version 19.04.1
    Launching `main.nf` [small_banach] - revision: 6ff5d47fa2
    2 mzML files in analysis
    Detected setnames: sampleA, sampleB
    [warm up] executor > local
    WARN: Input tuple does not match input set cardinality declared by process `splitSetNormalSearchPsms` -- offending value: sampleA
    executor > local (30)
    [ea/c37f3b] process > makeTargetSeqLookup [100%] 1 of 1 ✔
    [8e/4f629c] process > makeTrypSeq [100%] 1 of 1, cached: 1 ✔
    [b6/342beb] process > createSpectraLookup [100%] 1 of 1, cached: 1 ✔
    [0a/a91539] process > makeProtSeq [100%] 1 of 1, cached: 1 ✔
    [a3/5e3717] process > concatFasta [100%] 1 of 1 ✔
    [9e/513c28] process > msgfPlus [100%] 2 of 2 ✔
    [59/5329f4] process > percolator [100%] 2 of 2 ✔
    [69/7b5133] process > getNovelPercolator [100%] 2 of 2 ✔
    [e0/43a2dd] process > getVariantPercolator [100%] 2 of 2 ✔
    [3c/290056] process > filterPercolator [100%] 4 of 4 ✔
    [db/b21fa7] process > svmToTSV [100%] 4 of 4 ✔
    [f8/a2085e] process > createPSMTables [100%] 4 of 4 ✔
    [2a/17247d] process > prePeptideTable [100%] 4 of 4, failed: 4
    [48/0aed18] process > prepSpectrumAI [100%] 2 of 2, failed: 2
    [31/1fa8ca] process > mergeSetPSMtable [100%] 2 of 2, failed: 2
    WARN: Killing pending tasks (7)
    ERROR ~ Error executing process > 'prepSpectrumAI (1)'

    Caused by:
      Process prepSpectrumAI (1) terminated with an error exit status (255)

    Command executed:
      label_sub_pos.py --input_psm sampleA_variant_psmtable.txt --output specai_in.txt

    Command exit status:
      255

    Command output:
      (empty)

    Command error:
      FATAL: container creation failed: mount /etc/localtime->/etc/localtime error: while mounting /etc/localtime: could not mount /etc/localtime: input/output error

    Work dir:
      /scratch/oknjav001/transcriptomics/proteogenomics/yafengpipeline/workflow/work/dd/98189bad1266cd02493be6e46b1815

    Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

    -- Check '.nextflow.log' file for details

yafeng commented 4 years ago

I haven't seen this error before. Can you paste the command you used? And which database did you use?

Jokendo-collab commented 4 years ago

@yafeng I am analyzing data from Mycobacterium tuberculosis samples. I created a custom database with the customProDB software and used that custom database as my variantDB for the search. Below is the script I used:

    #!/bin/sh
    #SBATCH --account=cbio
    #SBATCH --partition=ada
    #SBATCH --nodes=2 --ntasks=40
    #SBATCH --time=170:00:00
    #SBATCH --job-name="protgActualAnalysis"
    #SBATCH --mail-user=oknjav001@myuct.ac.za
    #SBATCH --mail-type=END,FAIL

    module load software/nextflow-19.04
    module load software/R-3.6.0

    nextflow run main.nf -resume \
      --tdb /scratch/oknjav001/transcriptomics/proteogenomics/ms_rnaseqdata/rnasq/variantcallresults/16bcg/T016BCGmerge-var.fasta \
      --mzmldef raw files.txt \
      --activation hcd \
      --gtf /scratch/oknjav001/transcriptomics/proteogenomics/yafengpipeline/downloads/dbvar/VarDB.gtf \
      --mods /scratch/oknjav001/bal_mzML_raw_files/databaseComparisonProject/msgfplus/searchEngine/MSGFPlus_Mods1.txt \
      --knownproteins /scratch/oknjav001/transcriptomics/proteogenomics/yafengpipeline/downloads/homo/Homo_sapiens.GRCh38.pep.all.fa \
      --blastdb /scratch/oknjav001/transcriptomics/proteogenomics/yafengpipeline/downloads/dbvar/UniProteome+Ensembl87+refseq+GENCODE24.proteins.fasta \
      --snpfa /scratch/oknjav001/transcriptomics/proteogenomics/yafengpipeline/downloads/dbvar/MSCanProVar_ensemblV79.filtered.fasta \
      --genome /scratch/oknjav001/genome/hg19.fasta \
      --annovar_dir /scratch/oknjav001/transcriptomics/proteogenomics/yafengpipeline/annovar \
      --bigwigs /scratch/oknjav001/transcriptomics/proteogenomics/yafengpipeline/bigwigs \
      --bamfiles /scratch/oknjav001/transcriptomics/proteogenomics/ms_rnaseqdata/rnasq/LTB/trimmedres/bamfiles/tempanalysis/*.bam \
      --outdir /scratch/oknjav001/transcriptomics/proteogenomics/yafengpipeline/actualAnalysis \
      --profile slurm -c singularity.config

Jokendo-collab commented 4 years ago

I am not sure whether this problem is caused by the set name in the text file that contains the full paths of the raw files. This may be a spammy question, but what should the text file containing the full mzML paths look like?

    ColumnA                 ColumnB
    path/to/sampleA.mzML    sampleA
    path/to/sampleB.mzML    sampleB

I prepared my text file following the example above, and I do not know whether it is what causes the error. I also get the following warning:

    WARN: Input tuple does not match input set cardinality declared by process splitSetNormalSearchPsms -- offending value: sampleA

Could you help with this? This information is missing from the README. Adding an example of what that file should look like would help other researchers.

yafeng commented 4 years ago

Your text file for the input MS data looks fine to me. The first column is the full path to the MS data file; the second column is the set name, which groups MS raw files from the same sample (label-free) or from the same TMT/iTRAQ set.
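To make the two-column layout concrete, here is a small Python sketch that parses such a file and groups the paths by set name. The thread does not state the separator or whether a header row is allowed, so the tab separator and the absence of a header row are assumptions, and the function names are hypothetical.

```python
def parse_mzmldef(text):
    """Parse 'path<TAB>setname' lines into a list of (path, setname) pairs.

    Assumes tab-separated fields and no header row (neither is confirmed in
    this thread). Raises ValueError on any non-empty line without exactly
    two non-blank fields.
    """
    entries = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 2 or not all(field.strip() for field in fields):
            raise ValueError(
                f"line {number}: expected 2 tab-separated fields, got {fields!r}"
            )
        path, setname = (field.strip() for field in fields)
        entries.append((path, setname))
    return entries


def group_by_set(entries):
    """Map each set name to the list of mzML paths assigned to it."""
    sets = {}
    for path, setname in entries:
        sets.setdefault(setname, []).append(path)
    return sets


# Hypothetical example: two files in sampleA, one in sampleB.
example = (
    "/data/a1.mzML\tsampleA\n"
    "/data/a2.mzML\tsampleA\n"
    "/data/b1.mzML\tsampleB\n"
)
print(group_by_set(parse_mzmldef(example)))
```

Running a check like this before launching the workflow catches lines where a space was typed instead of a tab, which would otherwise surface only as a confusing error further downstream.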

I suspect the error is due to the customized database you are using. The pipeline is designed for VarDB, in which the sequences have specific FASTA headers, such as "PGOHUM", "lncRNA", "CanProVar", and "COSMIC", that label the different classes of novel/variant peptides. These labels are used later in the pipeline to calculate class-specific FDR and to split the results into different processes. If your database doesn't contain such headers, the pipeline will not be able to recognize the classes and will generate empty output, which probably causes the errors in later steps.
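A quick way to see whether a custom database carries these labels is to count them in the FASTA headers. The Python sketch below does that; it assumes the labels occur as plain substrings of the header line, which may not match exactly how the pipeline parses them, and the function name and file name are hypothetical.

```python
from collections import Counter

# Class labels quoted in the reply above.
CLASS_LABELS = ("PGOHUM", "lncRNA", "CanProVar", "COSMIC")


def count_header_classes(lines, labels=CLASS_LABELS):
    """Count FASTA headers per class label; headers with no label are
    counted under 'unlabelled'. Substring matching is an assumption."""
    counts = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        matched = [label for label in labels if label in line]
        if matched:
            counts.update(matched)
        else:
            counts["unlabelled"] += 1
    return counts


# Hypothetical usage:
# with open("custom_db.fasta") as handle:
#     print(count_header_classes(handle))
```

If every header lands in the unlabelled bucket, as would be expected for a database built from M. tuberculosis data without renaming the headers, that fits the explanation that the pipeline cannot assign the peptides to any class.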