Closed bwu62 closed 3 years ago
This is something I have done without MOSCA, using grinder (https://github.com/zyxue/biogrinder) and polyester data as well. The workflow for this analysis would involve preprocessing and then gene calling directly on the MT data. From there, the remaining annotation steps could happen as normal, and differential expression analysis would take as the read count the number of times each gene was called. It worked very well with simulated data from grinder, and the results were similar to those obtained using MG data. I didn't think there would be interest in such a workflow from a practical point of view, but it can easily be integrated. I will work on implementing this after fixing the problems that the tool currently has. One question: did you manage to simulate FastQ reads with polyester? I only managed to simulate FASTA files with it.
I used polyester to simulate fasta files, then I used art_illumina to simulate the fastq reads. I have been trying to run MOSCA on these simulated fastq reads, but the annotation step runs into the error I described.
Do you have the code from when you ran it manually, without MOSCA? I would really appreciate it if you could send it. Thank you
Best, Bi Cheng
Preprocess your data
If you have two or more data files, merge them into one file
cat file1 file2 > mt_reads.fastq
Convert the fastq file to fasta (FragGeneScan only accepts fasta input)
paste - - - - < mt_reads.fastq | cut -f 1,2 | sed 's/^@/>/' | tr "\t" "\n" > mt_reads.fasta
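For readers who prefer not to rely on the shell pipeline, the same conversion can be sketched in Python. This is only an illustration, assuming a well-formed FASTQ with exactly four lines per record; the function name fastq_to_fasta is hypothetical and not part of MOSCA or FragGeneScan.

```python
from itertools import islice


def fastq_to_fasta(fastq_lines):
    """Yield FASTA lines from an iterable of FASTQ lines.

    Assumes four lines per record (header, sequence, '+', qualities),
    which is what the paste/cut/sed/tr pipeline also assumes.
    """
    lines = iter(fastq_lines)
    while True:
        record = list(islice(lines, 4))
        if len(record) < 4:
            break  # end of input, or a truncated trailing record
        header = record[0].rstrip("\n")
        sequence = record[1].rstrip("\n")
        # Same effect as sed 's/^@/>/': swap the leading '@' for '>'.
        if header.startswith("@"):
            header = ">" + header[1:]
        yield header
        yield sequence


if __name__ == "__main__":
    example = ["@read1 sample\n", "ACGT\n", "+\n", "IIII\n"]
    print("\n".join(fastq_to_fasta(example)))
```

To convert a file, the generator can be fed an open file handle and its output written line by line to mt_reads.fasta.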
Perform gene calling on the MT fasta reads.
run_FragGeneScan.pl -genome=mt_reads.fasta -out=fgs -complete=0 -train=./error_model
The error model is specific to the sequencing you performed. In MOSCA the default is "illumina_10", but the FragGeneScan help message explains each file:
[train_file_name]: file name that contains model parameters; this file should be in the "train" directory
Note that four files containing model parameters already exist in the "train" directory
[complete] for complete genomic sequences or short sequence reads without sequencing error
[sanger_5] for Sanger sequencing reads with about 0.5% error rate
[sanger_10] for Sanger sequencing reads with about 1% error rate
[454_10] for 454 pyrosequencing reads with about 1% error rate
[454_30] for 454 pyrosequencing reads with about 3% error rate
[illumina_5] for Illumina sequencing reads with about 0.5% error rate
[illumina_10] for Illumina sequencing reads with about 1% error rate
Annotate the ORFs obtained with DIAMOND
diamond blastp --db database.dmnd --out aligned.blast --query fgs.faa --max-target-seqs 1
You end up with an annotation file where the first column is the name of the ORF and the second is the identifier of the protein it matched. If you count how many times each protein identifier occurs, you get an approximation of the relative abundance of proteins in that sample. After counting the occurrences of every protein in each sample, you end up with an expression matrix that can undergo differential expression analysis!
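The counting step can be sketched as follows. This is a minimal illustration, assuming the annotation file is tab-separated with the ORF name in the first column and the protein identifier in the second, as described above; the function names and sample labels are hypothetical.

```python
from collections import Counter


def count_proteins(blast_lines):
    """Count how many ORFs were assigned to each protein identifier."""
    counts = Counter()
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            counts[fields[1]] += 1  # second column: protein identifier
    return counts


def expression_matrix(sample_counts):
    """Build a protein-by-sample table (list of rows) from per-sample counts.

    sample_counts maps a sample label to the Counter returned by
    count_proteins; proteins absent from a sample get a count of 0.
    """
    samples = list(sample_counts)
    proteins = sorted(set().union(*sample_counts.values()))
    header = ["protein"] + samples
    rows = [[p] + [sample_counts[s].get(p, 0) for s in samples] for p in proteins]
    return [header] + rows


if __name__ == "__main__":
    s1 = count_proteins(["orf1\tP1\t99.0\n", "orf2\tP1\t90.0\n", "orf3\tP2\t80.0\n"])
    s2 = count_proteins(["orf9\tP2\t70.0\n"])
    for row in expression_matrix({"sample1": s1, "sample2": s2}):
        print("\t".join(str(value) for value in row))
```

The resulting table has one row per protein and one column per sample, which is the shape differential expression tools usually expect.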
Thank you! I will try this and see what I get.
Best,
Bi
Hi,
I followed your tutorial and used the command line below to complete the installation.
conda create -n mosca -c conda-forge -c bioconda -c anaconda mosca=1.2.1
However, I get the following error when I run it with my data. Could you help me resolve the problem?
Thanks.
(mosca2) zyshen@gpz:~/work/MOSCA$ mosca.py -c config.json
KeyError in line 15 of /home/zyshen/anaconda3new/envs/mosca2/share/MOSCA/scripts/Snakefile:
'Name'
File "/home/zyshen/anaconda3new/envs/mosca2/share/MOSCA/scripts/Snakefile", line 15, in
best, zhiyong
(mosca2) zyshen@gpz:~/work/MOSCA$ cat experiments.tsv
,Files,Sample,Data type,Condition,Name
0,"/home/zyshen/work/MOSCA/20201023_L_QMK/20201023_L_QMK_FKDL202610695-1a_1.fastq,/home/zyshen/work/MOSCA/20201023_L_QMK/20201023_L_QMK_FKDL202610695-1a_2.fastq",Sample,mrna,c1,cancer1
1,"/home/zyshen/work/MOSCA/20201023_L_QMK/20201023_L_QMK_FKDL202610696-1a_1.fastq,/home/zyshen/work/MOSCA/20201023_L_QMK/20201023_L_QMK_FKDL202610696-1a_2.fastq",Sample,mrna,c1,cancer2
Hi,
If I use an experiment.xlsx file instead, I run into another problem.
(mosca2) zyshen@gpz:~/work/MOSCA$ mosca.py -c config.json
AttributeError in line 16 of /home/zyshen/anaconda3new/envs/mosca2/share/MOSCA/scripts/Snakefile:
'float' object has no attribute 'split'
File "/home/zyshen/anaconda3new/envs/mosca2/share/MOSCA/scripts/Snakefile", line 16, in
Greetings! I see your first experiments file is in CSV format, but MOSCA only accepts TSV or Excel format. If you obtain your experiments file from MOSGUITO, it should come in the right format!
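If the experiments file already exists as comma-separated text, a small conversion like the one below may be enough. It is a sketch only: the function name is hypothetical, and dropping the unnamed leading index column is an assumption based on the file shown above, not a documented MOSCA requirement.

```python
import csv
import io


def csv_to_tsv(csv_text, drop_index=True):
    """Convert comma-separated text to tab-separated text.

    If drop_index is true and the first header cell is empty (as in a
    table saved with a row index), that first column is removed.
    This is an assumption about what the experiments file should hold.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if drop_index and rows and rows[0] and rows[0][0] == "":
        rows = [row[1:] for row in rows]
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    source = (
        ",Files,Sample,Data type,Condition,Name\n"
        '0,"a_1.fastq,a_2.fastq",Sample,mrna,c1,cancer1\n'
    )
    print(csv_to_tsv(source), end="")
```

The comma inside the Files field survives the conversion because the csv reader honours the quotes in the input, and the tab-delimited writer no longer needs them.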
Hi iquasere, Thanks for your quick reply. I followed your guide and ran into another problem.
(mosca2) zyshen@gpz:~/work/MOSCA$ head experiments.tsv
Files Sample Data type Condition Name
/home/zyshen/work/MOSCA/20201023_L_QMK/mg_R1.fastq,/home/zyshen/work/MOSCA/20201023_L_QMK/mg_R2.fastq sample dna MG mgname
/home/zyshen/work/MOSCA/20201023_L_QMK/mt1_R1.fastq,/home/zyshen/work/MOSCA/20201023_L_QMK/mt2_R2.fastq sample mrna MT mtname
(mosca2) zyshen@gpz:~/work/MOSCA$ mosca.py -c config.json
Building DAG of jobs...
InputFunctionException in line 56 of /home/zyshen/anaconda3new/envs/mosca2/share/MOSCA/scripts/Snakefile:
Error:
  IndexError: single positional indexer is out-of-bounds
Wildcards:
  name=mg
Traceback:
  File "/home/zyshen/anaconda3new/envs/mosca2/share/MOSCA/scripts/Snakefile", line 36, in preprocess_input
  File "/home/zyshen/anaconda3new/envs/mosca2/lib/python3.7/site-packages/pandas/core/indexing.py", line 894, in __getitem__
  File "/home/zyshen/anaconda3new/envs/mosca2/lib/python3.7/site-packages/pandas/core/indexing.py", line 1500, in _getitem_axis
  File "/home/zyshen/anaconda3new/envs/mosca2/lib/python3.7/site-packages/pandas/core/indexing.py", line 1443, in _validate_integer
any suggestion? Thanks. zhiyong
Hi iquasere, Is there any requirement for the fastq file format? I find that one of my paired datasets runs for some time and then stops when it hits another error.
(mosca2) zyshen@gpz:~/work/MOSCA$ mosca.py -c config.json
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 40
Rules claiming more threads will be scaled down.
Job counts:
    count  jobs
    1      all
    1      annotation
    1      assembly
    1      binning
    1      differential_expression
    1      join_information
    1      join_reads
    1      keggcharter
    1      preprocess
    1      quantification_analysis
    1      recognizer
    1      report
    1      upimapi
    13
Select jobs to execute...
[Thu Jan 14 16:11:46 2021] rule join_reads: output: output/Assembly/sample_forward.fastq, output/Assembly/sample_reverse.fastq jobid: 2 wildcards: sample=sample
[Thu Jan 14 16:11:46 2021] rule preprocess: input: /home/zyshen/work/MOSCA/20201023_L_QMK/mg_R1.fastq, /home/zyshen/work/MOSCA/20201023_L_QMK/mg_R2.fastq output: output/Preprocess/Trimmomatic/quality_trimmed_mtname_forward_paired.fq, output/Preprocess/Trimmomatic/quality_trimmed_mtname_reverse_paired.fq jobid: 9 wildcards: name=mtname threads: 15
Job counts:
    count  jobs
    1      join_reads
    1
Job counts:
    count  jobs
    1      preprocess
    1
fastqc --outdir output/Preprocess/FastQC --threads 15 --extract /home/zyshen/work/MOSCA/20201023_L_QMK/mg_R1.fastq /home/zyshen/work/MOSCA/20201023_L_QMK/mg_R2.fastq
Started analysis of mg_R1.fastq
Started analysis of mg_R2.fastq
MissingOutputException in line 70 of /home/zyshen/anaconda3new/envs/mosca2/share/MOSCA/scripts/Snakefile:
Job Missing files after 5 seconds:
output/Assembly/sample_forward.fastq
output/Assembly/sample_reverse.fastq
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Job id: 0 completed successfully, but some output files are missing. 0
  File "/home/zyshen/anaconda3new/envs/mosca2/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 581, in handle_job_success
  File "/home/zyshen/anaconda3new/envs/mosca2/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 259, in handle_job_success
Exiting because a job execution failed. Look above for error message
...................................................................
Approx 95% complete for quality_trimmed_mt1_reverse_paired.fq
Analysis complete for quality_trimmed_mt1_forward_paired.fq
Analysis complete for quality_trimmed_mt1_reverse_paired.fq
MissingOutputException in line 57 of /home/zyshen/anaconda3new/envs/mosca2/share/MOSCA/scripts/Snakefile:
Job Missing files after 5 seconds:
output/Preprocess/Trimmomatic/quality_trimmed_mtname_forward_paired.fq
output/Preprocess/Trimmomatic/quality_trimmed_mtname_reverse_paired.fq
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Job id: 0 completed successfully, but some output files are missing. 0
  File "/home/zyshen/anaconda3new/envs/mosca2/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 581, in handle_job_success
  File "/home/zyshen/anaconda3new/envs/mosca2/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 259, in handle_job_success
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/zyshen/work/MOSCA/.snakemake/log/2021-01-14T144311.510674.snakemake.log
thanks zhiyong
That is very odd, it stopped right after the first FastQC check. Can you send me your experiments TSV file?
Hi iquasere, This file works, but the pipeline stops after trimming.
Approx 85% complete for quality_trimmed_mt3_reverse_paired.fq
Approx 85% complete for quality_trimmed_mt3_forward_paired.fq
Approx 90% complete for quality_trimmed_mt3_reverse_paired.fq
Approx 90% complete for quality_trimmed_mt3_forward_paired.fq
Approx 95% complete for quality_trimmed_mt3_reverse_paired.fq
Approx 95% complete for quality_trimmed_mt3_forward_paired.fq
Analysis complete for quality_trimmed_mt3_reverse_paired.fq
Analysis complete for quality_trimmed_mt3_forward_paired.fq
MissingOutputException in line 57 of /home/zyshen/anaconda3new/envs/mosca2/share/MOSCA/scripts/Snakefile:
Job Missing files after 5 seconds:
output/Preprocess/Trimmomatic/quality_trimmed_mtname_forward_paired.fq
output/Preprocess/Trimmomatic/quality_trimmed_mtname_reverse_paired.fq
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Job id: 0 completed successfully, but some output files are missing. 0
  File "/home/zyshen/anaconda3new/envs/mosca2/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 581, in handle_job_success
  File "/home/zyshen/anaconda3new/envs/mosca2/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 259, in handle_job_success
Exiting because a job execution failed. Look above for error message
(mosca2) zyshen@gpz:~/work/MOSCA$ cat experiments.tsv
Files Sample Data type Condition Name
/home/zyshen/work/MOSCA/20201023_L_QMK/mt2_R1.fastq,/home/zyshen/work/MOSCA/20201023_L_QMK/mt2_R2.fastq sample mrna MT mtname
/home/zyshen/work/MOSCA/20201023_L_QMK/mt3_R1.fastq,/home/zyshen/work/MOSCA/20201023_L_QMK/mt3_R2.fastq sample mrna MT3 mtnamennn
Yes, I see the problem, and it is not on your side. MOSCA 1.2.1 still doesn't correctly handle a user-supplied Name; instead, it wants to determine the name automatically. So you should leave that field blank. By the end of this week I will release a new version that handles it correctly.
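As a stopgap, the Name column of an existing experiments TSV could be blanked with something like the sketch below. The function name is hypothetical, and this is not code from MOSCA; it only assumes the tab-separated layout with a "Name" header shown earlier in the thread.

```python
import csv
import io


def blank_name_column(tsv_text, column="Name"):
    """Return the TSV text with every value in the given column emptied.

    The header row is kept unchanged; if the column is absent, the text
    is returned as-is.
    """
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows or column not in rows[0]:
        return tsv_text
    index = rows[0].index(column)
    for row in rows[1:]:
        if len(row) > index:
            row[index] = ""  # leave the field blank so MOSCA 1.2.1 picks the name
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    text = "Files\tSample\tData type\tCondition\tName\na.fq,b.fq\tsample\tmrna\tMT\tmtname\n"
    print(blank_name_column(text), end="")
```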
Also, I see that you are attempting to use only mRNA with MOSCA. While this is definitely something I will experiment with, it might be some weeks before MOSCA is capable of that (it is not hard to implement, but there is little prior work on such a workflow). So you could submit these datasets anyway, just for the preprocessing, which will clean your data, and then follow the workflow I suggested above to obtain your read counts from the "quality_trimmed" datasets.
Hi iquasere, Did you mean to leave the last column, "Name", blank in experiment.tsv? I also find that the pipeline doesn't run all the paired-end datasets whose paths are specified in experiment.tsv. It seems to pick only two of them at random to start the pipeline. I mean that I only see two "rule preprocess:......" blocks:
[Fri Jan 15 10:10:03 2021] rule preprocess: input: /home/zyshen/work/MOSCA/20201023_L_QMK/mt3_R1.fastq, /home/zyshen/work/MOSCA/20201023_L_QMK/mt3_R2.fastq output: output/Preprocess/Trimmomatic/quality_trimmed_mtnamennn_forward_paired.fq, output/Preprocess/Trimmomatic/quality_trimmed_mtnamennn_reverse_paired.fq jobid: 11 wildcards: name=mtnamennn threads: 15
[Fri Jan 15 10:10:03 2021] rule preprocess: input: /home/zyshen/work/MOSCA/20201023_L_QMK/mg1_R1.fastq, /home/zyshen/work/MOSCA/20201023_L_QMK/mg1_R2.fastq output: output/Preprocess/Trimmomatic/quality_trimmed_mg1_forward_paired.fq, output/Preprocess/Trimmomatic/quality_trimmed_mg1_reverse_paired.fq jobid: 3 wildcards: name=mg1 threads: 15
The mt2 data is not handled. I hope your next version can handle this issue. Thanks
best, zhiyong
Hi, another test. It ran for several hours and then exited again.
mosca.py -c config.json &
[1] 158870
(mosca2) zyshen@gpz:~/work/MOSCA$ Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 40
Rules claiming more threads will be scaled down.
Job counts:
    count  jobs
    1      all
    1      annotation
    1      assembly
    1      binning
    1      differential_expression
    1      join_information
    1      join_reads
    1      keggcharter
    3      preprocess
    1      quantification_analysis
    1      recognizer
    1      report
    1      upimapi
    15
Select jobs to execute...
[Fri Jan 15 10:10:03 2021] rule preprocess: input: /home/zyshen/work/MOSCA/20201023_L_QMK/mt3_R1.fastq, /home/zyshen/work/MOSCA/20201023_L_QMK/mt3_R2.fastq output: output/Preprocess/Trimmomatic/quality_trimmed_mtnamennn_forward_paired.fq, output/Preprocess/Trimmomatic/quality_trimmed_mtnamennn_reverse_paired.fq jobid: 11 wildcards: name=mtnamennn threads: 15
[Fri Jan 15 10:10:03 2021] rule preprocess: input: /home/zyshen/work/MOSCA/20201023_L_QMK/mg1_R1.fastq, /home/zyshen/work/MOSCA/20201023_L_QMK/mg1_R2.fastq output: output/Preprocess/Trimmomatic/quality_trimmed_mg1_forward_paired.fq, output/Preprocess/Trimmomatic/quality_trimmed_mg1_reverse_paired.fq jobid: 3 wildcards: name=mg1 threads: 15
........
Analysis complete for quality_trimmed_mt3_forward_paired.fq
Analysis complete for quality_trimmed_mt3_reverse_paired.fq
MissingOutputException in line 57 of /home/zyshen/anaconda3new/envs/mosca2/share/MOSCA/scripts/Snakefile:
Job Missing files after 5 seconds:
output/Preprocess/Trimmomatic/quality_trimmed_mtnamennn_forward_paired.fq
output/Preprocess/Trimmomatic/quality_trimmed_mtnamennn_reverse_paired.fq
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Job id: 0 completed successfully, but some output files are missing. 0
  File "/home/zyshen/anaconda3new/envs/mosca2/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 581, in handle_job_success
  File "/home/zyshen/anaconda3new/envs/mosca2/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 259, in handle_job_success
Exiting because a job execution failed. Look above for error message
bash /home/zyshen/anaconda3new/envs/mosca2/share/MOSCA/scripts/unmerge-paired-reads.sh output/Preprocess/SortMeRNA/mt2_interleaved.fastq output/Preprocess/SortMeRNA/mt2_forward.fastq output/Preprocess/SortMeRNA/mt2_reverse.fastq
Processing output/Preprocess/SortMeRNA/mt2_forward.fastq ..
Processing output/Preprocess/SortMeRNA/mt2_reverse.fastq ..
Done.
fastqc --outdir output/Preprocess/FastQC --threads 15 --extract output/Preprocess/SortMeRNA/mt2_forward.fastq output/Preprocess/SortMeRNA/mt2_reverse.fastq
Started analysis of mt2_forward.fastq
Started analysis of mt2_reverse.fastq
..................................
Approx 100% complete for quality_trimmed_mt2_forward_paired.fq
Analysis complete for quality_trimmed_mt2_forward_paired.fq
Approx 100% complete for quality_trimmed_mt2_reverse_paired.fq
Analysis complete for quality_trimmed_mt2_reverse_paired.fq
[Fri Jan 15 12:52:23 2021] Finished job 10.
3 of 15 steps (20%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/zyshen/work/MOSCA/.snakemake/log/2021-01-15T101002.355269.snakemake.log
cat /home/zyshen/work/MOSCA/.snakemake/log/2021-01-15T101002.355269.snakemake.log
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 40
Rules claiming more threads will be scaled down.
Job counts:
    count  jobs
    1      all
    1      annotation
    1      assembly
    1      binning
    1      differential_expression
    1      join_information
    1      join_reads
    1      keggcharter
    3      preprocess
    1      quantification_analysis
    1      recognizer
    1      report
    1      upimapi
    15
Select jobs to execute...
[Fri Jan 15 10:10:03 2021] rule preprocess: input: /home/zyshen/work/MOSCA/20201023_L_QMK/mt3_R1.fastq, /home/zyshen/work/MOSCA/20201023_L_QMK/mt3_R2.fastq output: output/Preprocess/Trimmomatic/quality_trimmed_mtnamennn_forward_paired.fq, output/Preprocess/Trimmomatic/quality_trimmed_mtnamennn_reverse_paired.fq jobid: 11 wildcards: name=mtnamennn threads: 15
[Fri Jan 15 10:10:03 2021] rule preprocess: input: /home/zyshen/work/MOSCA/20201023_L_QMK/mg1_R1.fastq, /home/zyshen/work/MOSCA/20201023_L_QMK/mg1_R2.fastq output: output/Preprocess/Trimmomatic/quality_trimmed_mg1_forward_paired.fq, output/Preprocess/Trimmomatic/quality_trimmed_mg1_reverse_paired.fq jobid: 3 wildcards: name=mg1 threads: 15
[Fri Jan 15 10:31:59 2021] Finished job 3. 1 of 15 steps (7%) done Select jobs to execute...
[Fri Jan 15 10:31:59 2021] rule join_reads: input: output/Preprocess/Trimmomatic/quality_trimmed_mg1_forward_paired.fq, output/Preprocess/Trimmomatic/quality_trimmed_mg1_reverse_paired.fq output: output/Assembly/sample_forward.fastq, output/Assembly/sample_reverse.fastq jobid: 2 wildcards: sample=sample
[Fri Jan 15 10:31:59 2021] rule preprocess: input: /home/zyshen/work/MOSCA/20201023_L_QMK/mt2_R1.fastq, /home/zyshen/work/MOSCA/20201023_L_QMK/mt2_R2.fastq output: output/Preprocess/Trimmomatic/quality_trimmed_mt2_forward_paired.fq, output/Preprocess/Trimmomatic/quality_trimmed_mt2_reverse_paired.fq jobid: 10 wildcards: name=mt2 threads: 15
[Fri Jan 15 10:32:12 2021] Finished job 2.
2 of 15 steps (13%) done
Select jobs to execute...
Failed to solve scheduling problem with ILP solver. Falling back to greedy solver. Run Snakemake with --verbose to see the full solver output for debugging the problem.
[Fri Jan 15 12:52:23 2021] Finished job 10.
3 of 15 steps (20%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/zyshen/work/MOSCA/.snakemake/log/2021-01-15T101002.355269.snakemake.log
zhiyong
In that output it lists the 3 preprocessing jobs. However, it failed with that weird message, "Failed to solve scheduling problem with ILP solver. Falling back to greedy solver. Run Snakemake with --verbose to see the full solver output for debugging the problem.", which I never encountered when testing MOSCA. I am not going to test it on my side because the next version will come out soon, but if it persists please let me know. In the meantime, could you try running
snakemake -S /home/zyshen/anaconda3new/envs/mosca2/share/MOSCA/scripts/Snakefile -c config.json --printshellcmds --cores 40 --verbose --unlock
and posting the output here?
Hi iquasere, It seems I can't run it successfully because of the following error.
(mosca2) zyshen@gpz:~/work/MOSCA$ snakemake -S /home/zyshen/anaconda3new/envs/mosca2/share/MOSCA/scripts/Snakefile -c config.json --printshellcmds --cores 40 --verbose --unlock
Full Traceback (most recent call last):
File "/home/zyshen/anaconda3new/envs/mosca2/lib/python3.7/site-packages/snakemake/__init__.py", line 594, in snakemake
snakefile, overwrite_first_rule=True, print_compilation=print_compilation
File "/home/zyshen/anaconda3new/envs/mosca2/lib/python3.7/site-packages/snakemake/workflow.py", line 1104, in include
exec(compile(code, snakefile, "exec"), self.globals)
File "/home/zyshen/work/MOSCA/workflow/Snakefile", line 7, in
KeyError in line 7 of /home/zyshen/work/MOSCA/workflow/Snakefile:
'experiments'
File "/home/zyshen/work/MOSCA/workflow/Snakefile", line 7, in
Then I tested it with this command again. Let's see what happens in a few hours.
(mosca2) zyshen@gpz:~/work/MOSCA$ mosca.py -c config.json &
[1] 49845
(mosca2) zyshen@gpz:~/work/MOSCA$ Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 40
Rules claiming more threads will be scaled down.
Job counts:
    count  jobs
    1      all
    1      annotation
    1      assembly
    1      binning
    1      differential_expression
    1      join_information
    1      join_reads
    1      keggcharter
    3      preprocess
    1      quantification_analysis
    1      recognizer
    1      report
    1      upimapi
    15
Select jobs to execute...
[Sat Jan 16 12:17:59 2021] rule preprocess: input: /home/zyshen/work/MOSCA/20201023_L_QMK/mt3_R1.fastq, /home/zyshen/work/MOSCA/20201023_L_QMK/mt3_R2.fastq output: output/Preprocess/Trimmomatic/quality_trimmed_mtnamennn_forward_paired.fq, output/Preprocess/Trimmomatic/quality_trimmed_mtnamennn_reverse_paired.fq jobid: 11 wildcards: name=mtnamennn threads: 15
[Sat Jan 16 12:17:59 2021] rule preprocess: input: /home/zyshen/work/MOSCA/20201023_L_QMK/mg1_R1.fastq, /home/zyshen/work/MOSCA/20201023_L_QMK/mg1_R2.fastq output: output/Preprocess/Trimmomatic/quality_trimmed_mg1_forward_paired.fq, output/Preprocess/Trimmomatic/quality_trimmed_mg1_reverse_paired.fq jobid: 3 wildcards: name=mg1 threads: 15
Job counts:
    count  jobs
    1      preprocess
    1
Job counts:
    count  jobs
    1      preprocess
    1
fastqc --outdir output/Preprocess/FastQC --threads 15 --extract /home/zyshen/work/MOSCA/20201023_L_QMK/mg1_R1.fastq /home/zyshen/work/MOSCA/20201023_L_QMK/mg1_R2.fastq
fastqc --outdir output/Preprocess/FastQC --threads 15 --extract /home/zyshen/work/MOSCA/20201023_L_QMK/mt3_R1.fastq /home/zyshen/work/MOSCA/20201023_L_QMK/mt3_R2.fastq
Started analysis of mg1_R1.fastq
.......
best, zhiyong
Hi iquasere, This time I ran only one MT and one MG dataset, and it got through more steps than before. Please check the nohup.txt file in the attachment. Thanks. I hope your new version can come soon. nohup.zip
zhiyong
Dear iquasere, Any comments on or solution for the above error? What's the status of your new version of MOSCA? I'm hoping to use it in my study soon. Thanks!
regards, zhiyong
This next version will already allow that name customization. It will likely be released today. Only one more problem to debug ^^ On another note, following up on the "Possible to run MT without MG" question: after this 1.2.3 version, a 1.3 MOSCA will likely be able to do that, as I realized it is the same workflow as running MG without assembly. That is also something still in need of some adjustments, but it will be available soon.
Cool, Thanks :)
Unfortunately, a problem with Bioconda's CI is stopping MOSCA's new version from becoming available. If you want to circumvent this, you will need to build MOSCA from source. Let me know if you need it this early, and I will provide the modified commands to install MOSCA without Bioconda.
Hi iquasere, Yes, I would like to have it early. Please provide the modified commands so I can build MOSCA from the source code. Thanks :)
best, zhiyong
MOSCA 1.2.2 ended up being released, really an oversight on my part. Do note that this new version allows more parameters in preprocessing (because I tested it with some very weird ancient datasets that required that tweaking). You can obtain the new configuration through MOSGUITO, or use this one. Please let me know whether this new version fails or succeeds, as all tests were successful and I do not understand what may be failing there.
On another note, in this version I have disabled the last step, which uses KEGGCharter. This is because an update to KEGG Pathway has caused Biopython to lose some functionality and broke the KEGGCharter workflow. However, in the next version of KEGGCharter I am going to use different Biopython methods that won't be affected by this kind of update.
Hi iquasere, Many thanks for your work. It seems fine now, after I installed some missing packages that the error reports pointed to. Right now it works normally, and I will let you know when the whole pipeline is done. BTW, for the KEGGCharter part: how can I get the result of the last step? Should I run it on its own, based on the results? Thanks
[Wed Jan 27 18:25:57 2021] Finished job 12.
1 of 11 steps (9%) done
0:28:23.175 8G / 22G INFO General (main.cpp : 167) Clustering done. Total clusters: 294344279
0:28:23.366 5G / 22G INFO K-mer Counting (kmer_data.cpp : 371) Collecting K-mer information, this takes a while.
0:28:27.823 13G / 22G INFO K-mer Counting (kmer_data.cpp : 377) Processing /media/zyshen/work/MOSCA/MOSCA-1.2.2/output/Preprocess/Sample_forward.fastq
0:31:05.578 12G / 22G INFO K-mer Counting (kmer_data.cpp : 377) Processing /media/zyshen/work/MOSCA/MOSCA-1.2.2/output/Preprocess/Sample_reverse.fastq
0:33:43.017 12G / 22G INFO K-mer Counting (kmer_data.cpp : 384) Collection done, postprocessing.
0:33:45.031 12G / 22G INFO K-mer Counting (kmer_data.cpp : 398) There are 354532704 kmers in total. Among them 428420 (0.120841%) are singletons.
0:33:45.032 12G / 22G INFO General (main.cpp : 173) Subclustering Hamming graph ....
best, zhiyong
Oh man, so glad to hear that ahah. About KEGGCharter, I am working on it at the moment. This next version will be much faster, since it will retrieve the KGMLs, store them and work on them locally. I also wanted it to chart information in a multithreaded manner, which is proving challenging. I will try to release a new version with just this new local feature for now, and make it multithreaded in future versions.
After that new version is available, I will give you the command to run it directly on the results of MOSCA, as your version of MOSCA will still not request KEGGCharter to run.
Hi iquasere, Many thanks! Your great work saves me a lot of time with my upcoming large MG and MT datasets. The pipeline has stopped again, and this time I can't fix it by installing a missing package. I paste the last error report below. It seems that htseq-count doesn't have the -c and -n parameters. I'm not sure whether I am using a different version from yours. I checked the htseq-count command help and these options really are missing. Any suggestion? Thanks
Finished: 2021-01-28 09:55:59 Elapsed time: 0:24:15.914556 Total NOTICEs: 38; WARNINGs: 1; non-fatal ERRORs: 0
Thank you for using QUAST!
INDEX was located at output/Assembly/Sample/contigs_index
output/Assembly/Sample/quality_control/alignment.log was found!
GFF file was located at output/Assembly/Sample/contigs.gff
htseq-count -i gene_id -c output/Assembly/Sample/quality_control/alignment.readcounts -n 14 output/Assembly/Sample/quality_control/alignment.sam output/Assembly/Sample/contigs.gff --stranded=no
usage: htseq-count [options] alignment_file gff_file
htseq-count: error: unrecognized arguments: -c -n output/Assembly/Sample/quality_control/alignment.sam output/Assembly/Sample/contigs.gff
Traceback (most recent call last):
File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/assembly.py", line 97, in
RuleException:
CalledProcessError in line 108 of /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile:
Command 'set -euo pipefail; python /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/assembly.py -r output/Preprocess/Sample_forward.fastq,output/Preprocess/Sample_reverse.fastq -t 14 -o output/Assembly/Sample -a metaspades -m 40960' returned non-zero exit status 1.
  File "/media/zyshen/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2189, in run_wrapper
  File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile", line 108, in rule_assembly
  File "/media/zyshen/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 529, in _callback
  File "/media/zyshen/miniconda3/lib/python3.7/concurrent/futures/thread.py", line 57, in run
  File "/media/zyshen/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 515, in cached_or_run
  File "/media/zyshen/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2201, in run_wrapper
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /media/zyshen/work/MOSCA/MOSCA-1.2.2/.snakemake/log/2021-01-28T074852.508887.snakemake.log
regards, zhiyong
I am very sorry about that; htseq-count was updated and left that option out. Try running the version of htseq-count that I have: conda install -c conda-forge -c bioconda htseq=0.12.4
I believe MOSCA will run the whole assembly step again, but preprocessing should not be repeated. In the next version of MOSCA I will restrict the htseq-count version to 0.12.4 only.
Thank you for your quick reply. This time, the pipeline got to this step and stopped again.
========== Elapsed Time ========== 0 hours 13 minutes and 49 seconds.
checkm lineage_wf -x fasta -r --ali --nt -t 14 --pplacer_threads 14 output/Binning/Sample output/Binning/Sample --tab_table --file output/Binning/Sample/checkm.tsv
File "/media/zyshen/miniconda3/envs/snakemake/bin/checkm", line 107
print "%s removed" % (filename)
^
SyntaxError: invalid syntax
Traceback (most recent call last):
File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/binning.py", line 91, in
This is stranger, since the latest version of CheckM should be compatible, but it seems you have installed the Python 2 CheckM (it only became Python 3 compatible a few months ago). Can you share which CheckM version you have (by running just checkm)?
About downloading UniProt, MOSCA shouldn't be downloading it if uniprot.fasta is already present in the folder specified with diamond_database, nor if that option was set as you described. The latter was my mistake: when I tested it, it worked because I never moved uniprot.fasta out of its original place, so I overlooked it. Next versions of MOSCA will have this fixed. For now, if you want to avoid downloading all that again, you need to have a uniprot.fasta file in the directory of the database set with diamond_database :/
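To confirm beforehand whether the download would be skipped, a check along these lines may help. This is only a sketch: `uniprot_fasta_present` is a hypothetical helper, not part of MOSCA, and it assumes you pass it the directory in which MOSCA looks for uniprot.fasta.

```python
import tempfile
from pathlib import Path


def uniprot_fasta_present(database_dir):
    """Return True if a uniprot.fasta file exists directly inside database_dir.

    Hypothetical helper for illustration; MOSCA's own check may differ.
    """
    return (Path(database_dir) / "uniprot.fasta").is_file()


# Demonstration on a temporary directory standing in for the database folder.
with tempfile.TemporaryDirectory() as tmp:
    before = uniprot_fasta_present(tmp)  # nothing there yet
    (Path(tmp) / "uniprot.fasta").write_text(">placeholder\nMK\n")
    after = uniprot_fasta_present(tmp)   # file is now in place

print(before, after)
```

If the check returns False for your database directory, copying or linking an existing uniprot.fasta there should avoid the repeated download described above.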
MOSCA was tested with CheckM 1.1.2. So running conda install -c conda-forge -c bioconda checkm-genome should fix that!
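For context, the SyntaxError in the checkm traceback is what a Python 3 interpreter reports when it meets a Python 2 print statement. A minimal sketch of how to test a piece of source for this, using only the built-in compile function (`parses_as_python3` is a hypothetical helper name):

```python
def parses_as_python3(source):
    """Return True if source compiles under the running Python 3 interpreter."""
    try:
        # compile() only parses the code; nothing is executed.
        compile(source, "<script>", "exec")
    except SyntaxError:
        return False
    return True


# The offending line from the traceback, and its Python 3 equivalent.
py2_line = 'print "%s removed" % (filename)'
py3_line = 'print("%s removed" % (filename))'

print(parses_as_python3(py2_line), parses_as_python3(py3_line))
```

The first line fails to parse and the second succeeds, which is why a Python 2 era CheckM script cannot start inside a Python 3 environment.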
Hi iquasere, thanks, the CheckM problem was resolved with your command. The other problem I encountered is that the recognizer.py command is missing. For upimapi.py, I resolved it by installing the corresponding package. For recognizer.py, I don't know where to download it. Is it right to download it from here? https://github.com/fxia22/pocketshpinx/blob/master/nodes/recognizer.py However, I then run into another problem, which says "rospkg.common.ResourceNotFound: pocketsphinx". How can I resolve these? Thanks
python /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/quantification_analyser.py -e output/experiments.tsv -t 14 -o output -if tsv
recognizer.py -f output/Annotation/Sample/fgs.faa -t 13 -o output/Annotation/Sample -rd /media/zyshen/work/MOSCA/MOSCA-1.2.2 --remove-spaces
/bin/bash: recognizer.py: command not found
upimapi.py -i output/Annotation/Sample/aligned.blast -o output/Annotation/uniprotinfo --blast --full-id
/bin/bash: upimapi.py: command not found
[Fri Jan 29 15:04:08 2021]
[Fri Jan 29 15:04:08 2021]
Hi iquasere, it works now after running conda install -c bioconda recognizer. Thanks, zhiyong
I downloaded the file ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/cddid.tbl.gz to put it in the right directory. Is that right?
2021-01-29 09:28:33: Running annotation with RPS-BLAST and KOG database as reference.
rpsblast -query output/Annotation/Sample/fgs.faa -db /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_0 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_1 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_2 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_3 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_4 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_5 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_6 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_7 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_8 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_9 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_10 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_11 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_12 -out output/Annotation/Sample/KOG_aligned.blast -outfmt 6 -num_threads 13 -max_target_seqs 1
BLAST engine error: Cannot retrieve path to RPS database
Traceback (most recent call last):
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/bin/recognizer.py", line 462, in
RuleException:
CalledProcessError in line 175 of /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile:
Command 'set -euo pipefail; recognizer.py -f output/Annotation/Sample/fgs.faa -t 13 -o output/Annotation/Sample -rd /media/zyshen/work/MOSCA/MOSCA-1.2.2 --remove-spaces' returned non-zero exit status 1.
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2340, in run_wrapper
File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile", line 175, in rule_recognizer
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 568, in _callback
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/concurrent/futures/thread.py", line 56, in run
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2352, in run_wrapper
Exiting because a job execution failed. Look above for error message
Hi, there is another problem now, after I downloaded hmm_PGAP.tsv and cddid_all.tbl: wget https://ftp.ncbi.nlm.nih.gov/hmm/3.0/hmm_PGAP.tsv
BLAST engine error: Cannot retrieve path to RPS database
2021-01-29 09:33:34: Organizing annotation results
[1/8] Handling CDD identifications
Traceback (most recent call last):
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/bin/recognizer.py", line 462, in
RuleException:
CalledProcessError in line 175 of /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile:
Command 'set -euo pipefail; recognizer.py -f output/Annotation/Sample/fgs.faa -t 13 -o output/Annotation/Sample -rd /media/zyshen/work/MOSCA/MOSCA-1.2.2 --remove-spaces' returned non-zero exit status 1.
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2340, in run_wrapper
File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile", line 175, in rule_recognizer
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 568, in _callback
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/concurrent/futures/thread.py", line 56, in run
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2352, in run_wrapper
Exiting because a job execution failed. Look above for error message
zhiyong
Hi iquasere, it works now after running conda install -c bioconda recognizer. Thanks, zhiyong
Yes, this is the right command ^^
As for obtaining the databases, reCOGnizer downloads all of them automatically. MOSCA was designed to obtain everything by itself: whatever the tools don't obtain automatically, MOSCA will get. Can you post here the entire output of this command?
recognizer.py -f output/Annotation/Sample/fgs.faa -t 13 -o output/Annotation/Sample -rd /media/zyshen/work/MOSCA/MOSCA-1.2.2 --remove-spaces
On another note, if the problem is with reCOGnizer, I am also its developer, so solutions should come in fast ^^ Although it seems MOSCA is already giving you more problems than putting the commands by hand xD
When MOSCA's workflow finishes, you can get the remaining results with KEGGCharter by installing it with conda install -c conda-forge -c bioconda keggcharter=0.1.3
and running
kegg_charter.py -f output/MOSCA_Entry_Report.xlsx -gcol [comma-separated list of MG columns] -tcol [comma-separated list of MT columns] -keggc "Cross-reference (KEGG)" -o output/KEGGCharter_results -tc "Taxonomic lineage (GENUS)"
recognizer.py -f output/Annotation/Sample/fgs.faa -t 13 -o output/Annotation/Sample -rd /media/zyshen/work/MOSCA/MOSCA-1.2.2 --remove-spaces
/media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_0.aux not found!
Some part of CDD was not valid!
Generating databases for [13] threads.
makeprofiledb -in /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_0.pn -title /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_0 -out /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_0
makeprofiledb -in /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_1.pn -title /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_1 -out /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_1
makeprofiledb -in /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_2.pn -title /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_2 -out /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_2
makeprofiledb -in /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_3.pn -title /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_3 -out /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_3
makeprofiledb -in /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_4.pn -title /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_4 -out /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_4
makeprofiledb -in /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_5.pn -title /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_5 -out /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_5
makeprofiledb -in /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_6.pn -title /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_6 -out /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_6
makeprofiledb -in /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_7.pn -title /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_7 -out /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_7
INPUT ERROR: Input file contains no smp filnames
makeprofiledb -in /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_8.pn -title /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_8 -out /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_8
makeprofiledb -in /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_9.pn -title /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_9 -out /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_9
makeprofiledb -in /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_11.pn -title /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_11 -out /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_11
makeprofiledb -in /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_10.pn -title /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_10 -out /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_10
makeprofiledb -in /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_12.pn -title /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_12 -out /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_12
INPUT ERROR: Input file contains no smp filnames
INPUT ERROR: Input file contains no smp filnames
INPUT ERROR: Input file contains no smp filnames
INPUT ERROR: Input file contains no smp filnames
INPUT ERROR: Input file contains no smp filnames
INPUT ERROR: Input file contains no smp filnames
INPUT ERROR: Input file contains no smp filnames
INPUT ERROR: Input file contains no smp filnames
INPUT ERROR: Input file contains no smp filnames
INPUT ERROR: Input file contains no smp filnames
INPUT ERROR: Input file contains no smp filnames
INPUT ERROR: Input file contains no smp filnames
....
sed -i -e 's/ //g' output/Annotation/Sample/fgs.faa
2021-01-29 14:27:24: Running annotation with RPS-BLAST and CDD database as reference.
rpsblast -query output/Annotation/Sample/fgs.faa -db /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_0 /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_1 /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_2 /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_3 /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_4 /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_5 /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_6 /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_7 /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_8 /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_9 /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_10 /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_11 /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_12 -out output/Annotation/Sample/CDD_aligned.blast -outfmt 6 -num_threads 13 -max_target_seqs 1
BLAST engine error: Cannot retrieve path to RPS database
2021-01-29 14:27:24: Running annotation with RPS-BLAST and Pfam database as reference.
rpsblast -query output/Annotation/Sample/fgs.faa -db /media/zyshen/work/MOSCA/MOSCA-1.2.2/pfam_13_0 /media/zyshen/work/MOSCA/MOSCA-1.2.2/pfam_13_1 /media/zyshen/work/MOSCA/MOSCA-1.2.2/pfam_13_2 /media/zyshen/work/MOSCA/MOSCA-1.2.2/pfam_13_3 /media/zyshen/work/MOSCA/MOSCA-1.2.2/pfam_13_4 /media/zyshen/work/MOSCA/MOSCA-1.2.2/pfam_13_5 /media/zyshen/work/MOSCA/MOSCA-1.2.2/pfam_13_6 /media/zyshen/work/MOSCA/MOSCA-1.2.2/pfam_13_7 /media/zyshen/work/MOSCA/MOSCA-1.2.2/pfam_13_8 /media/zyshen/work/MOSCA/MOSCA-1.2.2/pfam_13_9 /media/zyshen/work/MOSCA/MOSCA-1.2.2/pfam_13_10 /media/zyshen/work/MOSCA/MOSCA-1.2.2/pfam_13_11 /media/zyshen/work/MOSCA/MOSCA-1.2.2/pfam_13_12 -out output/Annotation/Sample/Pfam_aligned.blast -outfmt 6 -num_threads 13 -max_target_seqs 1
BLAST engine error: Cannot retrieve path to RPS database
2021-01-29 14:27:24: Running annotation with RPS-BLAST and NCBIfam database as reference.
rpsblast -query output/Annotation/Sample/fgs.faa -db /media/zyshen/work/MOSCA/MOSCA-1.2.2/NF_13_0 /media/zyshen/work/MOSCA/MOSCA-1.2.2/NF_13_1 /media/zyshen/work/MOSCA/MOSCA-1.2.2/NF_13_2 /media/zyshen/work/MOSCA/MOSCA-1.2.2/NF_13_3 /media/zyshen/work/MOSCA/MOSCA-1.2.2/NF_13_4 /media/zyshen/work/MOSCA/MOSCA-1.2.2/NF_13_5 /media/zyshen/work/MOSCA/MOSCA-1.2.2/NF_13_6 /media/zyshen/work/MOSCA/MOSCA-1.2.2/NF_13_7 /media/zyshen/work/MOSCA/MOSCA-1.2.2/NF_13_8 /media/zyshen/work/MOSCA/MOSCA-1.2.2/NF_13_9 /media/zyshen/work/MOSCA/MOSCA-1.2.2/NF_13_10 /media/zyshen/work/MOSCA/MOSCA-1.2.2/NF_13_11 /media/zyshen/work/MOSCA/MOSCA-1.2.2/NF_13_12 -out output/Annotation/Sample/NCBIfam_aligned.blast -outfmt 6 -num_threads 13 -max_target_seqs 1
BLAST engine error: Cannot retrieve path to RPS database
2021-01-29 14:27:24: Running annotation with RPS-BLAST and Protein_Clusters database as reference.
rpsblast -query output/Annotation/Sample/fgs.faa -db /media/zyshen/work/MOSCA/MOSCA-1.2.2/PRK_13_0 /media/zyshen/work/MOSCA/MOSCA-1.2.2/PRK_13_1 /media/zyshen/work/MOSCA/MOSCA-1.2.2/PRK_13_2 /media/zyshen/work/MOSCA/MOSCA-1.2.2/PRK_13_3 /media/zyshen/work/MOSCA/MOSCA-1.2.2/PRK_13_4 /media/zyshen/work/MOSCA/MOSCA-1.2.2/PRK_13_5 /media/zyshen/work/MOSCA/MOSCA-1.2.2/PRK_13_6 /media/zyshen/work/MOSCA/MOSCA-1.2.2/PRK_13_7 /media/zyshen/work/MOSCA/MOSCA-1.2.2/PRK_13_8 /media/zyshen/work/MOSCA/MOSCA-1.2.2/PRK_13_9 /media/zyshen/work/MOSCA/MOSCA-1.2.2/PRK_13_10 /media/zyshen/work/MOSCA/MOSCA-1.2.2/PRK_13_11 /media/zyshen/work/MOSCA/MOSCA-1.2.2/PRK_13_12 -out output/Annotation/Sample/Protein_Clusters_aligned.blast -outfmt 6 -num_threads 13 -max_target_seqs 1
BLAST engine error: Cannot retrieve path to RPS database
2021-01-29 14:27:24: Running annotation with RPS-BLAST and Smart database as reference.
rpsblast -query output/Annotation/Sample/fgs.faa -db /media/zyshen/work/MOSCA/MOSCA-1.2.2/smart_13_0 /media/zyshen/work/MOSCA/MOSCA-1.2.2/smart_13_1 /media/zyshen/work/MOSCA/MOSCA-1.2.2/smart_13_2 /media/zyshen/work/MOSCA/MOSCA-1.2.2/smart_13_3 /media/zyshen/work/MOSCA/MOSCA-1.2.2/smart_13_4 /media/zyshen/work/MOSCA/MOSCA-1.2.2/smart_13_5 /media/zyshen/work/MOSCA/MOSCA-1.2.2/smart_13_6 /media/zyshen/work/MOSCA/MOSCA-1.2.2/smart_13_7 /media/zyshen/work/MOSCA/MOSCA-1.2.2/smart_13_8 /media/zyshen/work/MOSCA/MOSCA-1.2.2/smart_13_9 /media/zyshen/work/MOSCA/MOSCA-1.2.2/smart_13_10 /media/zyshen/work/MOSCA/MOSCA-1.2.2/smart_13_11 /media/zyshen/work/MOSCA/MOSCA-1.2.2/smart_13_12 -out output/Annotation/Sample/Smart_aligned.blast -outfmt 6 -num_threads 13 -max_target_seqs 1
BLAST engine error: Cannot retrieve path to RPS database
2021-01-29 14:27:25: Running annotation with RPS-BLAST and TIGRFAM database as reference.
rpsblast -query output/Annotation/Sample/fgs.faa -db /media/zyshen/work/MOSCA/MOSCA-1.2.2/TIGR_13_0 /media/zyshen/work/MOSCA/MOSCA-1.2.2/TIGR_13_1 /media/zyshen/work/MOSCA/MOSCA-1.2.2/TIGR_13_2 /media/zyshen/work/MOSCA/MOSCA-1.2.2/TIGR_13_3 /media/zyshen/work/MOSCA/MOSCA-1.2.2/TIGR_13_4 /media/zyshen/work/MOSCA/MOSCA-1.2.2/TIGR_13_5 /media/zyshen/work/MOSCA/MOSCA-1.2.2/TIGR_13_6 /media/zyshen/work/MOSCA/MOSCA-1.2.2/TIGR_13_7 /media/zyshen/work/MOSCA/MOSCA-1.2.2/TIGR_13_8 /media/zyshen/work/MOSCA/MOSCA-1.2.2/TIGR_13_9 /media/zyshen/work/MOSCA/MOSCA-1.2.2/TIGR_13_10 /media/zyshen/work/MOSCA/MOSCA-1.2.2/TIGR_13_11 /media/zyshen/work/MOSCA/MOSCA-1.2.2/TIGR_13_12 -out output/Annotation/Sample/TIGRFAM_aligned.blast -outfmt 6 -num_threads 13 -max_target_seqs 1
BLAST engine error: Cannot retrieve path to RPS database
2021-01-29 14:27:25: Running annotation with RPS-BLAST and COG database as reference.
rpsblast -query output/Annotation/Sample/fgs.faa -db /media/zyshen/work/MOSCA/MOSCA-1.2.2/COG_13_0 /media/zyshen/work/MOSCA/MOSCA-1.2.2/COG_13_1 /media/zyshen/work/MOSCA/MOSCA-1.2.2/COG_13_2 /media/zyshen/work/MOSCA/MOSCA-1.2.2/COG_13_3 /media/zyshen/work/MOSCA/MOSCA-1.2.2/COG_13_4 /media/zyshen/work/MOSCA/MOSCA-1.2.2/COG_13_5 /media/zyshen/work/MOSCA/MOSCA-1.2.2/COG_13_6 /media/zyshen/work/MOSCA/MOSCA-1.2.2/COG_13_7 /media/zyshen/work/MOSCA/MOSCA-1.2.2/COG_13_8 /media/zyshen/work/MOSCA/MOSCA-1.2.2/COG_13_9 /media/zyshen/work/MOSCA/MOSCA-1.2.2/COG_13_10 /media/zyshen/work/MOSCA/MOSCA-1.2.2/COG_13_11 /media/zyshen/work/MOSCA/MOSCA-1.2.2/COG_13_12 -out output/Annotation/Sample/COG_aligned.blast -outfmt 6 -num_threads 13 -max_target_seqs 1
BLAST engine error: Cannot retrieve path to RPS database
2021-01-29 14:27:25: Running annotation with RPS-BLAST and KOG database as reference.
rpsblast -query output/Annotation/Sample/fgs.faa -db /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_0 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_1 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_2 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_3 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_4 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_5 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_6 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_7 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_8 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_9 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_10 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_11 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_12 -out output/Annotation/Sample/KOG_aligned.blast -outfmt 6 -num_threads 13 -max_target_seqs 1
BLAST engine error: Cannot retrieve path to RPS database
2021-01-29 14:27:27: Organizing annotation results
[1/8] Handling CDD identifications
Traceback (most recent call last):
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/bin/recognizer.py", line 462, in
Sorry, please run
recognizer.py -f output/Annotation/Sample/fgs.faa -t 13 -o output/Annotation/Sample -rd /media/zyshen/work/MOSCA/MOSCA-1.2.2 --remove-spaces --download-resources
and then re-run MOSCA's command.
Notice the --download-resources at the end. reCOGnizer got an update, and by default it no longer downloads CDD and all other resources. I'll fix this by including the --download-resources parameter in reCOGnizer's command. This is only required once, so the next time you run MOSCA or reCOGnizer this option won't be needed, and that step will run just fine.
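To see whether the resources are in place before re-running, one could look for the list files that makeprofiledb complained about in the log above ("Input file contains no smp filnames"). The sketch below is an assumption-laden illustration: the `<prefix>_<threads>_<index>.pn` naming is inferred from the log only, and `empty_or_missing_pn` is a hypothetical helper, not reCOGnizer's own validation.

```python
import tempfile
from pathlib import Path


def empty_or_missing_pn(resources_dir, prefix, threads):
    """List shard names whose .pn file is absent or contains no entries.

    The shard naming (e.g. cd_13_0 for prefix "cd" and 13 threads) is an
    assumption taken from the log output, not from reCOGnizer's source.
    """
    bad = []
    for i in range(threads):
        shard = f"{prefix}_{threads}_{i}"
        pn = Path(resources_dir) / f"{shard}.pn"
        if not pn.is_file() or not pn.read_text().strip():
            bad.append(shard)
    return bad


# Demonstration with 3 shards: one valid, one empty, one missing.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "cd_3_0.pn").write_text("cd00001.smp\n")
    (Path(tmp) / "cd_3_1.pn").write_text("")
    problems = empty_or_missing_pn(tmp, "cd", 3)

print(problems)
```

For the run in this thread the call would be something like empty_or_missing_pn("/media/zyshen/work/MOSCA/MOSCA-1.2.2", "cd", 13); an empty result would suggest the list files were populated by the --download-resources run.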
Thanks iquasere, it seems to work now. I will let you know when it's done :)
Hi,
Thanks for the recent overhaul to the software. I was wondering if it is possible to run MOSCA on MT data only, without MG data? I have some MT data simulated with Polyester and art_illumina (which only simulate RNA-seq data) with no associated MG reads, and I was wondering if it's possible to do DE with it.
Thanks.