iquasere / MOSCA

Meta-Omics Software for Community Analysis
GNU General Public License v3.0

Possible to run MT without MG? #3

Closed bwu62 closed 3 years ago

bwu62 commented 5 years ago

Hi,

Thanks for the recent overhaul of the software. I was wondering if it is possible to run MOSCA on MT data only, without MG data. I have some MT data simulated with Polyester and art_illumina (which only simulate RNA-seq data) with no associated MG reads, and I was wondering if it's possible to do differential expression (DE) analysis with it.

Thanks.

iquasere commented 5 years ago

This is something I have done without MOSCA, using grinder (https://github.com/zyxue/biogrinder) and polyester data as well. The workflow for this analysis would involve preprocessing and then gene calling on the MT data directly. From there, the remaining annotation steps could happen as normal, and differential expression analysis would use, as the read count, the number of times each gene was called. It worked very well with simulated data from grinder, and the results were similar to those obtained using MG data. I didn't think there would be interest in such a workflow from a practical point of view, but it can easily be integrated. I will work on implementing this after fixing the problems the tool currently has. One question: did you manage to simulate FASTQ reads with polyester? I only managed to simulate FASTA files with it.

bwu62 commented 5 years ago

I used polyester to simulate fasta files, then I used art_illumina to simulate fastq reads from them. I have been trying to run MOSCA with these simulated fastq reads, but the annotation step runs into the error I described.

Do you have the code you used when you ran it manually, without MOSCA? I would really appreciate it if you could send it. Thank you.

Best, Bi Cheng



iquasere commented 5 years ago
  1. Preprocess your data

  2. If you have two or more data files, merge them into one file

    cat file1 file2 > mt_reads.fastq
  3. Convert the fastq file to fasta (FragGeneScan only accepts fasta input)

    paste - - - - < mt_reads.fastq | cut -f 1,2 | sed 's/^@/>/' | tr "\t" "\n" > mt_reads.fasta
  4. Perform gene calling on the MT fasta reads.

    run_FragGeneScan.pl -genome=mt_reads.fasta -out=fgs -complete=0 -train=./error_model

    The error model depends on the sequencing technology used, so replace ./error_model with one of the model names below; in MOSCA the default is "illumina_10". The FragGeneScan help message describes each model file:

    [train_file_name]:  file name that contains model parameters; this file should be in the "train" directory
                   Note that four files containing model parameters already exist in the "train" directory
                   [complete] for complete genomic sequences or short sequence reads without sequencing error
                   [sanger_5] for Sanger sequencing reads with about 0.5% error rate
                   [sanger_10] for Sanger sequencing reads with about 1% error rate
                   [454_10] for 454 pyrosequencing reads with about 1% error rate
                   [454_30] for 454 pyrosequencing reads with about 3% error rate
                   [illumina_5] for Illumina sequencing reads with about 0.5% error rate
                   [illumina_10] for Illumina sequencing reads with about 1% error rate
  5. Annotate the ORFs obtained with DIAMOND

    diamond blastp --db database.dmnd --out aligned.blast --query fgs.faa --max-target-seqs 1
  6. You end up with an annotation file where the first column is the name of the ORF and the second is the identifier of the protein it matched. Counting the occurrences of each protein gives an approximation of its relative presence in that sample. After counting the occurrences of every protein in each sample, you end up with an expression matrix that can undergo differential expression analysis!
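The counting in step 6 can be scripted. Below is a minimal Python sketch, not MOSCA's own code, assuming one DIAMOND output file per sample in DIAMOND's default tabular format (BLAST format 6: ORF ID in column 1, matched protein ID in column 2). The function names and file names are hypothetical. Each ORF is counted once even if DIAMOND reports more than one line for it, and proteins not seen in a sample get a count of 0.

```python
import csv
from collections import Counter


def count_hits(blast_path):
    """Count how many ORFs were assigned to each protein in one sample.

    Assumes DIAMOND tabular output: column 1 = ORF (query) ID,
    column 2 = protein (subject) ID. Only the first line seen for each ORF
    is used, so an ORF with several reported alignments is counted once.
    """
    counts = Counter()
    seen_orfs = set()
    with open(blast_path) as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            orf_id, protein_id = line.rstrip("\n").split("\t")[:2]
            if orf_id in seen_orfs:
                continue
            seen_orfs.add(orf_id)
            counts[protein_id] += 1
    return counts


def build_matrix(samples):
    """Build a protein-by-sample count matrix.

    samples: dict mapping sample name -> DIAMOND output path.
    Returns (protein_ids, sample_names, rows), where rows[i][j] is the count
    of protein_ids[i] in sample_names[j]; absent proteins count as 0.
    """
    per_sample = {name: count_hits(path) for name, path in samples.items()}
    sample_names = list(samples)
    protein_ids = sorted(set().union(*per_sample.values()))
    rows = [[per_sample[s][p] for s in sample_names] for p in protein_ids]
    return protein_ids, sample_names, rows


def write_matrix(out_path, protein_ids, sample_names, rows):
    """Write the matrix as TSV: one row per protein, one column per sample."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["protein"] + sample_names)
        for protein_id, row in zip(protein_ids, rows):
            writer.writerow([protein_id] + row)
```

For example, `proteins, samples, rows = build_matrix({"c1": "aligned_c1.blast", "c2": "aligned_c2.blast"})` followed by `write_matrix("expression_matrix.tsv", proteins, samples, rows)` would write a proteins-by-samples table that a differential expression tool can take as its count matrix.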

bwu62 commented 5 years ago

Thank you! I will try this and see what I get.

Best,

Bi



szypanther commented 3 years ago

Hi, I followed your tutorial and used the command below to install MOSCA:

    conda create -n mosca -c conda-forge -c bioconda -c anaconda mosca=1.2.1

However, I get the following error when running it on my data. Could you help me resolve the problem? Thanks.

    (mosca2) zyshen@gpz:~/work/MOSCA$ mosca.py -c config.json
    KeyError in line 15 of /home/zyshen/anaconda3new/envs/mosca2/share/MOSCA/scripts/Snakefile:
    'Name'
      File "/home/zyshen/anaconda3new/envs/mosca2/share/MOSCA/scripts/Snakefile", line 15, in
      File "/home/zyshen/anaconda3new/envs/mosca2/lib/python3.7/site-packages/pandas/core/series.py", line

best, zhiyong

szypanther commented 3 years ago

    (mosca2) zyshen@gpz:~/work/MOSCA$ cat experiments.tsv
    ,Files,Sample,Data type,Condition,Name
    0,"/home/zyshen/work/MOSCA/20201023_L_QMK/20201023_L_QMK_FKDL202610695-1a_1.fastq,/home/zyshen/work/MOSCA/20201023_L_QMK/20201023_L_QMK_FKDL202610695-1a_2.fastq",Sample,mrna,c1,cancer1
    1,"/home/zyshen/work/MOSCA/20201023_L_QMK/20201023_L_QMK_FKDL202610696-1a_1.fastq,/home/zyshen/work/MOSCA/20201023_L_QMK/20201023_L_QMK_FKDL202610696-1a_2.fastq",Sample,mrna,c1,cancer2

szypanther commented 3 years ago

Hi, if I use an experiments.xlsx file instead, I encounter another problem:

    (mosca2) zyshen@gpz:~/work/MOSCA$ mosca.py -c config.json
    AttributeError in line 16 of /home/zyshen/anaconda3new/envs/mosca2/share/MOSCA/scripts/Snakefile:
    'float' object has no attribute 'split'
      File "/home/zyshen/anaconda3new/envs/mosca2/share/MOSCA/scripts/Snakefile", line 16, in

iquasere commented 3 years ago

Greetings! I see your first experiments file is in CSV format; MOSCA only accepts TSV or Excel format. If you obtain your experiments file from MOSGUITO, it should come in the right format!
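The first file above looks like a pandas-written CSV: comma-separated, with an unnamed leading index column. A minimal sketch for converting such a file to TSV with Python's standard csv module follows; it assumes the five columns seen in this thread (Files, Sample, Data type, Condition, Name), and the function name is hypothetical. Quoted fields such as "R1.fastq,R2.fastq" stay in one cell, and since commas are not special in a tab-separated file, they are written back unquoted.

```python
import csv

# Columns as they appear in the experiments files posted in this thread.
EXPECTED_COLUMNS = ["Files", "Sample", "Data type", "Condition", "Name"]


def csv_to_experiments_tsv(csv_path, tsv_path):
    """Convert a comma-separated experiments file into a tab-separated one.

    Drops a leading unnamed index column (as written by pandas' to_csv)
    and returns the number of data rows written.
    """
    with open(csv_path, newline="") as handle:
        rows = list(csv.reader(handle))  # quoted "R1,R2" stays one field
    header = rows[0]
    body = [row for row in rows[1:] if row]  # ignore blank lines
    if header and header[0] == "":  # unnamed index column: drop it everywhere
        header = header[1:]
        body = [row[1:] for row in body]
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    with open(tsv_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(body)
    return len(body)
```

Usage would be along the lines of `csv_to_experiments_tsv("experiments.csv", "experiments.tsv")`, after which config.json can point at the new TSV.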

szypanther commented 3 years ago

Hi iquasere, thanks for your quick reply. I followed your guidance and encountered another problem.

    (mosca2) zyshen@gpz:~/work/MOSCA$ head experiments.tsv
    Files	Sample	Data type	Condition	Name
    /home/zyshen/work/MOSCA/20201023_L_QMK/mg_R1.fastq,/home/zyshen/work/MOSCA/20201023_L_QMK/mg_R2.fastq	sample	dna	MG	mgname
    /home/zyshen/work/MOSCA/20201023_L_QMK/mt1_R1.fastq,/home/zyshen/work/MOSCA/20201023_L_QMK/mt2_R2.fastq	sample	mrna	MT	mtname

    (mosca2) zyshen@gpz:~/work/MOSCA$ mosca.py -c config.json
    Building DAG of jobs...
    InputFunctionException in line 56 of /home/zyshen/anaconda3new/envs/mosca2/share/MOSCA/scripts/Snakefile:
    Error:
      IndexError: single positional indexer is out-of-bounds
    Wildcards:
      name=mg
    Traceback:
      File "/home/zyshen/anaconda3new/envs/mosca2/share/MOSCA/scripts/Snakefile", line 36, in preprocess_input
      File "/home/zyshen/anaconda3new/envs/mosca2/lib/python3.7/site-packages/pandas/core/indexing.py", line 894, in __getitem__
      File "/home/zyshen/anaconda3new/envs/mosca2/lib/python3.7/site-packages/pandas/core/indexing.py", line 1500, in _getitem_axis
      File "/home/zyshen/anaconda3new/envs/mosca2/lib/python3.7/site-packages/pandas/core/indexing.py", line 1443, in _validate_integer

Any suggestions? Thanks, zhiyong

szypanther commented 3 years ago

Hi iquasere, is there any requirement for the FASTQ file format? I found that one of my paired datasets runs for some time and then stops when it encounters another error.

    (mosca2) zyshen@gpz:~/work/MOSCA$ mosca.py -c config.json
    Building DAG of jobs...
    Using shell: /bin/bash
    Provided cores: 40
    Rules claiming more threads will be scaled down.
    Job counts:
        count   jobs
        1       all
        1       annotation
        1       assembly
        1       binning
        1       differential_expression
        1       join_information
        1       join_reads
        1       keggcharter
        1       preprocess
        1       quantification_analysis
        1       recognizer
        1       report
        1       upimapi
        13
    Select jobs to execute...

    [Thu Jan 14 16:11:46 2021]
    rule join_reads:
        output: output/Assembly/sample_forward.fastq, output/Assembly/sample_reverse.fastq
        jobid: 2
        wildcards: sample=sample

    [Thu Jan 14 16:11:46 2021]
    rule preprocess:
        input: /home/zyshen/work/MOSCA/20201023_L_QMK/mg_R1.fastq, /home/zyshen/work/MOSCA/20201023_L_QMK/mg_R2.fastq
        output: output/Preprocess/Trimmomatic/quality_trimmed_mtname_forward_paired.fq, output/Preprocess/Trimmomatic/quality_trimmed_mtname_reverse_paired.fq
        jobid: 9
        wildcards: name=mtname
        threads: 15

    Job counts:
        count   jobs
        1       join_reads
        1
    Job counts:
        count   jobs
        1       preprocess
        1
    fastqc --outdir output/Preprocess/FastQC --threads 15 --extract /home/zyshen/work/MOSCA/20201023_L_QMK/mg_R1.fastq /home/zyshen/work/MOSCA/20201023_L_QMK/mg_R2.fastq
    Started analysis of mg_R1.fastq
    Started analysis of mg_R2.fastq
    MissingOutputException in line 70 of /home/zyshen/anaconda3new/envs/mosca2/share/MOSCA/scripts/Snakefile:
    Job Missing files after 5 seconds:
    output/Assembly/sample_forward.fastq
    output/Assembly/sample_reverse.fastq
    This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
    Job id: 0 completed successfully, but some output files are missing. 0
      File "/home/zyshen/anaconda3new/envs/mosca2/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 581, in handle_job_success
      File "/home/zyshen/anaconda3new/envs/mosca2/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 259, in handle_job_success
    Exiting because a job execution failed. Look above for error message
    ...................................................................
    Approx 95% complete for quality_trimmed_mt1_reverse_paired.fq
    Analysis complete for quality_trimmed_mt1_forward_paired.fq
    Analysis complete for quality_trimmed_mt1_reverse_paired.fq
    MissingOutputException in line 57 of /home/zyshen/anaconda3new/envs/mosca2/share/MOSCA/scripts/Snakefile:
    Job Missing files after 5 seconds:
    output/Preprocess/Trimmomatic/quality_trimmed_mtname_forward_paired.fq
    output/Preprocess/Trimmomatic/quality_trimmed_mtname_reverse_paired.fq
    This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
    Job id: 0 completed successfully, but some output files are missing. 0
      File "/home/zyshen/anaconda3new/envs/mosca2/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 581, in handle_job_success
      File "/home/zyshen/anaconda3new/envs/mosca2/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 259, in handle_job_success
    Exiting because a job execution failed. Look above for error message
    Shutting down, this might take some time.
    Exiting because a job execution failed. Look above for error message
    Complete log: /home/zyshen/work/MOSCA/.snakemake/log/2021-01-14T144311.510674.snakemake.log

thanks zhiyong

iquasere commented 3 years ago

That is very odd, it stopped right after the first FastQC check. Can you send me your experiments TSV file?

szypanther commented 3 years ago

Hi iquasere, with this file it works, but the pipeline stops after the trimming step:

    Approx 85% complete for quality_trimmed_mt3_reverse_paired.fq
    Approx 85% complete for quality_trimmed_mt3_forward_paired.fq
    Approx 90% complete for quality_trimmed_mt3_reverse_paired.fq
    Approx 90% complete for quality_trimmed_mt3_forward_paired.fq
    Approx 95% complete for quality_trimmed_mt3_reverse_paired.fq
    Approx 95% complete for quality_trimmed_mt3_forward_paired.fq
    Analysis complete for quality_trimmed_mt3_reverse_paired.fq
    Analysis complete for quality_trimmed_mt3_forward_paired.fq
    MissingOutputException in line 57 of /home/zyshen/anaconda3new/envs/mosca2/share/MOSCA/scripts/Snakefile:
    Job Missing files after 5 seconds:
    output/Preprocess/Trimmomatic/quality_trimmed_mtname_forward_paired.fq
    output/Preprocess/Trimmomatic/quality_trimmed_mtname_reverse_paired.fq
    This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
    Job id: 0 completed successfully, but some output files are missing. 0
      File "/home/zyshen/anaconda3new/envs/mosca2/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 581, in handle_job_success
      File "/home/zyshen/anaconda3new/envs/mosca2/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 259, in handle_job_success
    Exiting because a job execution failed. Look above for error message

    (mosca2) zyshen@gpz:~/work/MOSCA$ cat experiments.tsv
    Files	Sample	Data type	Condition	Name
    /home/zyshen/work/MOSCA/20201023_L_QMK/mt2_R1.fastq,/home/zyshen/work/MOSCA/20201023_L_QMK/mt2_R2.fastq	sample	mrna	MT	mtname
    /home/zyshen/work/MOSCA/20201023_L_QMK/mt3_R1.fastq,/home/zyshen/work/MOSCA/20201023_L_QMK/mt3_R2.fastq	sample	mrna	MT3	mtnamennn

experiments.zip

iquasere commented 3 years ago

Yes, I see the problem, and it is not on your side. MOSCA 1.2.1 still doesn't handle the Name input correctly; instead, it wants to determine the name automatically. So you should leave that field blank. By the end of this week I will release a new version that handles it correctly.
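To blank the Name field of an existing experiments TSV without retyping it, a short script is enough. This is a minimal sketch using Python's csv module, assuming a tab-separated file with a header row, as in the files posted in this thread; the function name and paths are hypothetical.

```python
import csv


def blank_name_column(tsv_path, out_path):
    """Write a copy of an experiments TSV with every Name cell left empty.

    Other columns are copied unchanged; returns the rows written as dicts.
    """
    with open(tsv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        fieldnames = list(reader.fieldnames)
        rows = list(reader)
    if "Name" not in fieldnames:
        fieldnames.append("Name")  # add the column if it was missing
    for row in rows:
        row["Name"] = ""  # leave Name blank so MOSCA 1.2.1 picks its own
    with open(out_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t",
                                lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
    return rows
```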

Also, I see that you are attempting to use only mRNA with MOSCA. While this is definitely something I will experiment with, it might be some weeks before MOSCA is capable of it (it is not hard to implement, but there is little literature on such a workflow). So you could still submit these datasets, just for the preprocessing, which will clean your data, and then follow the workflow I suggested above to obtain your read counts from the "quality_trimmed" datasets.
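Steps 2 and 3 of that workflow (merging files and converting FASTQ to FASTA) can also be done in a single pass. A minimal Python sketch, assuming uncompressed FASTQ with exactly four lines per record, which the paste-based one-liner also assumes; the function name is hypothetical. The inputs would be the quality_trimmed forward/reverse files under output/Preprocess/Trimmomatic/ seen in the logs in this thread. Like cat, it keeps each read's original ID, so mates from the forward and reverse files may end up with identical FASTA IDs.

```python
def fastq_to_fasta(fastq_paths, fasta_path):
    """Concatenate FASTQ files into one FASTA file; return the record count.

    Assumes uncompressed FASTQ with exactly four lines per record
    (header, sequence, '+', qualities), as the paste-based one-liner does.
    """
    n_records = 0
    with open(fasta_path, "w") as out:
        for path in fastq_paths:  # same order as `cat file1 file2`
            with open(path) as handle:
                while True:
                    header = handle.readline()
                    if not header:
                        break  # end of this file
                    if not header.strip():
                        continue  # tolerate stray blank lines between records
                    if not header.startswith("@"):
                        raise ValueError(f"{path}: expected '@' header, got {header!r}")
                    seq = handle.readline().rstrip("\n")
                    handle.readline()  # '+' separator line, discarded
                    handle.readline()  # quality line, not used in FASTA
                    out.write(">" + header[1:].rstrip("\n") + "\n" + seq + "\n")
                    n_records += 1
    return n_records
```

For example, `fastq_to_fasta(["output/Preprocess/Trimmomatic/quality_trimmed_mt2_forward_paired.fq", "output/Preprocess/Trimmomatic/quality_trimmed_mt2_reverse_paired.fq"], "mt_reads.fasta")` would produce the mt_reads.fasta input for the FragGeneScan step.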

szypanther commented 3 years ago

Hi iquasere, did you mean leaving the last column, "Name", blank in experiments.tsv? I also found that the pipeline doesn't run all the paired-end datasets whose paths are listed in experiments.tsv; it seems to pick only two of them at random to start the pipeline. That is, I only see two "rule preprocess: ..." blocks:

    [Fri Jan 15 10:10:03 2021]
    rule preprocess:
        input: /home/zyshen/work/MOSCA/20201023_L_QMK/mt3_R1.fastq, /home/zyshen/work/MOSCA/20201023_L_QMK/mt3_R2.fastq
        output: output/Preprocess/Trimmomatic/quality_trimmed_mtnamennn_forward_paired.fq, output/Preprocess/Trimmomatic/quality_trimmed_mtnamennn_reverse_paired.fq
        jobid: 11
        wildcards: name=mtnamennn
        threads: 15

    [Fri Jan 15 10:10:03 2021]
    rule preprocess:
        input: /home/zyshen/work/MOSCA/20201023_L_QMK/mg1_R1.fastq, /home/zyshen/work/MOSCA/20201023_L_QMK/mg1_R2.fastq
        output: output/Preprocess/Trimmomatic/quality_trimmed_mg1_forward_paired.fq, output/Preprocess/Trimmomatic/quality_trimmed_mg1_reverse_paired.fq
        jobid: 3
        wildcards: name=mg1
        threads: 15

The mt2 data is not being processed. I hope your next version can handle this issue. Thanks.

best, zhiyong

szypanther commented 3 years ago

Hi, another test: it ran for several hours and then exited again.

    mosca.py -c config.json &
    [1] 158870
    (mosca2) zyshen@gpz:~/work/MOSCA$ Building DAG of jobs...
    Using shell: /bin/bash
    Provided cores: 40
    Rules claiming more threads will be scaled down.
    Job counts:
        count   jobs
        1       all
        1       annotation
        1       assembly
        1       binning
        1       differential_expression
        1       join_information
        1       join_reads
        1       keggcharter
        3       preprocess
        1       quantification_analysis
        1       recognizer
        1       report
        1       upimapi
        15
    Select jobs to execute...

    [Fri Jan 15 10:10:03 2021]
    rule preprocess:
        input: /home/zyshen/work/MOSCA/20201023_L_QMK/mt3_R1.fastq, /home/zyshen/work/MOSCA/20201023_L_QMK/mt3_R2.fastq
        output: output/Preprocess/Trimmomatic/quality_trimmed_mtnamennn_forward_paired.fq, output/Preprocess/Trimmomatic/quality_trimmed_mtnamennn_reverse_paired.fq
        jobid: 11
        wildcards: name=mtnamennn
        threads: 15

    [Fri Jan 15 10:10:03 2021]
    rule preprocess:
        input: /home/zyshen/work/MOSCA/20201023_L_QMK/mg1_R1.fastq, /home/zyshen/work/MOSCA/20201023_L_QMK/mg1_R2.fastq
        output: output/Preprocess/Trimmomatic/quality_trimmed_mg1_forward_paired.fq, output/Preprocess/Trimmomatic/quality_trimmed_mg1_reverse_paired.fq
        jobid: 3
        wildcards: name=mg1
        threads: 15
    ........
    Analysis complete for quality_trimmed_mt3_forward_paired.fq
    Analysis complete for quality_trimmed_mt3_reverse_paired.fq
    MissingOutputException in line 57 of /home/zyshen/anaconda3new/envs/mosca2/share/MOSCA/scripts/Snakefile:
    Job Missing files after 5 seconds:
    output/Preprocess/Trimmomatic/quality_trimmed_mtnamennn_forward_paired.fq
    output/Preprocess/Trimmomatic/quality_trimmed_mtnamennn_reverse_paired.fq
    This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
    Job id: 0 completed successfully, but some output files are missing. 0
      File "/home/zyshen/anaconda3new/envs/mosca2/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 581, in handle_job_success
      File "/home/zyshen/anaconda3new/envs/mosca2/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 259, in handle_job_success
    Exiting because a job execution failed. Look above for error message
    bash /home/zyshen/anaconda3new/envs/mosca2/share/MOSCA/scripts/unmerge-paired-reads.sh output/Preprocess/SortMeRNA/mt2_interleaved.fastq output/Preprocess/SortMeRNA/mt2_forward.fastq output/Preprocess/SortMeRNA/mt2_reverse.fastq
    Processing output/Preprocess/SortMeRNA/mt2_forward.fastq ..
    Processing output/Preprocess/SortMeRNA/mt2_reverse.fastq ..
    Done.
    fastqc --outdir output/Preprocess/FastQC --threads 15 --extract output/Preprocess/SortMeRNA/mt2_forward.fastq output/Preprocess/SortMeRNA/mt2_reverse.fastq
    Started analysis of mt2_forward.fastq
    Started analysis of mt2_reverse.fastq
    ..................................
    Approx 100% complete for quality_trimmed_mt2_forward_paired.fq
    Analysis complete for quality_trimmed_mt2_forward_paired.fq
    Approx 100% complete for quality_trimmed_mt2_reverse_paired.fq
    Analysis complete for quality_trimmed_mt2_reverse_paired.fq
    [Fri Jan 15 12:52:23 2021]
    Finished job 10.
    3 of 15 steps (20%) done
    Shutting down, this might take some time.
    Exiting because a job execution failed. Look above for error message
    Complete log: /home/zyshen/work/MOSCA/.snakemake/log/2021-01-15T101002.355269.snakemake.log

    cat /home/zyshen/work/MOSCA/.snakemake/log/2021-01-15T101002.355269.snakemake.log
    Building DAG of jobs...
    Using shell: /bin/bash
    Provided cores: 40
    Rules claiming more threads will be scaled down.
    Job counts:
        count   jobs
        1       all
        1       annotation
        1       assembly
        1       binning
        1       differential_expression
        1       join_information
        1       join_reads
        1       keggcharter
        3       preprocess
        1       quantification_analysis
        1       recognizer
        1       report
        1       upimapi
        15
    Select jobs to execute...

    [Fri Jan 15 10:10:03 2021]
    rule preprocess:
        input: /home/zyshen/work/MOSCA/20201023_L_QMK/mt3_R1.fastq, /home/zyshen/work/MOSCA/20201023_L_QMK/mt3_R2.fastq
        output: output/Preprocess/Trimmomatic/quality_trimmed_mtnamennn_forward_paired.fq, output/Preprocess/Trimmomatic/quality_trimmed_mtnamennn_reverse_paired.fq
        jobid: 11
        wildcards: name=mtnamennn
        threads: 15

    [Fri Jan 15 10:10:03 2021]
    rule preprocess:
        input: /home/zyshen/work/MOSCA/20201023_L_QMK/mg1_R1.fastq, /home/zyshen/work/MOSCA/20201023_L_QMK/mg1_R2.fastq
        output: output/Preprocess/Trimmomatic/quality_trimmed_mg1_forward_paired.fq, output/Preprocess/Trimmomatic/quality_trimmed_mg1_reverse_paired.fq
        jobid: 3
        wildcards: name=mg1
        threads: 15

    [Fri Jan 15 10:31:59 2021]
    Finished job 3.
    1 of 15 steps (7%) done
    Select jobs to execute...

    [Fri Jan 15 10:31:59 2021]
    rule join_reads:
        input: output/Preprocess/Trimmomatic/quality_trimmed_mg1_forward_paired.fq, output/Preprocess/Trimmomatic/quality_trimmed_mg1_reverse_paired.fq
        output: output/Assembly/sample_forward.fastq, output/Assembly/sample_reverse.fastq
        jobid: 2
        wildcards: sample=sample

    [Fri Jan 15 10:31:59 2021]
    rule preprocess:
        input: /home/zyshen/work/MOSCA/20201023_L_QMK/mt2_R1.fastq, /home/zyshen/work/MOSCA/20201023_L_QMK/mt2_R2.fastq
        output: output/Preprocess/Trimmomatic/quality_trimmed_mt2_forward_paired.fq, output/Preprocess/Trimmomatic/quality_trimmed_mt2_reverse_paired.fq
        jobid: 10
        wildcards: name=mt2
        threads: 15

    [Fri Jan 15 10:32:12 2021]
    Finished job 2.
    2 of 15 steps (13%) done
    Select jobs to execute...
    Failed to solve scheduling problem with ILP solver. Falling back to greedy solver.Run Snakemake with --verbose to see the full solver output for debugging the problem.
    [Fri Jan 15 12:52:23 2021]
    Finished job 10.
    3 of 15 steps (20%) done
    Shutting down, this might take some time.
    Exiting because a job execution failed. Look above for error message
    Complete log: /home/zyshen/work/MOSCA/.snakemake/log/2021-01-15T101002.355269.snakemake.log

zhiyong

iquasere commented 3 years ago

That output lists the 3 preprocessing jobs. However, it failed with that weird "Failed to solve scheduling problem with ILP solver. Falling back to greedy solver.Run Snakemake with --verbose to see the full solver output for debugging the problem." message; I never encountered that error when testing MOSCA. I'm not going to test it on my side, because the next version will come out soon, but if it persists please let me know. In the meantime, could you try running

snakemake -S /home/zyshen/anaconda3new/envs/mosca2/share/MOSCA/scripts/Snakefile -c config.json --printshellcmds --cores 40 --verbose --unlock

and posting the output here?

szypanther commented 3 years ago

Hi iquasere, it seems I can't run it successfully due to the following error.

    (mosca2) zyshen@gpz:~/work/MOSCA$ snakemake -S /home/zyshen/anaconda3new/envs/mosca2/share/MOSCA/scripts/Snakefile -c config.json --printshellcmds --cores 40 --verbose --unlock
    Full Traceback (most recent call last):
      File "/home/zyshen/anaconda3new/envs/mosca2/lib/python3.7/site-packages/snakemake/__init__.py", line 594, in snakemake
        snakefile, overwrite_first_rule=True, print_compilation=print_compilation
      File "/home/zyshen/anaconda3new/envs/mosca2/lib/python3.7/site-packages/snakemake/workflow.py", line 1104, in include
        exec(compile(code, snakefile, "exec"), self.globals)
      File "/home/zyshen/work/MOSCA/workflow/Snakefile", line 7, in
        if config["experiments"].endswith('.xlsx'):
    KeyError: 'experiments'

    KeyError in line 7 of /home/zyshen/work/MOSCA/workflow/Snakefile:
    'experiments'
      File "/home/zyshen/work/MOSCA/workflow/Snakefile", line 7, in
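The `KeyError: 'experiments'` above means the Snakefile that ran found no `experiments` key in its config; the traceback also shows that Snakemake used /home/zyshen/work/MOSCA/workflow/Snakefile rather than the file given after `-S`. On Snakemake's own command line, the Snakefile is given with `-s`/`--snakefile` and a config file with `--configfile`. To check a config and its experiments file before a long run, a hypothetical pre-flight sketch follows; it is not MOSCA's own validation. Only the `experiments` key and the `.xlsx` check come from the traceback; the required columns are taken from the files posted in this thread, and the empty-Files check is a guess at the cause of the earlier `'float' object has no attribute 'split'` error. Excel input additionally needs pandas with openpyxl.

```python
import csv
import json

# Columns as they appear in the experiments files posted in this thread.
REQUIRED_COLUMNS = ["Files", "Sample", "Data type", "Condition", "Name"]


def load_experiments(config_path):
    """Load and sanity-check the experiments table named in a config.json.

    Returns a list of row dicts. Raises KeyError if the config has no
    'experiments' entry, ValueError for missing columns or empty Files.
    """
    with open(config_path) as handle:
        config = json.load(handle)
    if "experiments" not in config:
        # The key read at line 7 of the Snakefile in the traceback above.
        raise KeyError(f"{config_path} has no 'experiments' entry")
    path = config["experiments"]
    if path.endswith(".xlsx"):
        import pandas as pd  # only needed for Excel input (plus openpyxl)
        df = pd.read_excel(path)
        columns = list(df.columns)
        rows = df.fillna("").astype(str).to_dict("records")
    else:
        with open(path, newline="") as handle:
            reader = csv.DictReader(handle, delimiter="\t")
            columns = list(reader.fieldnames or [])
            rows = list(reader)
    missing = [c for c in REQUIRED_COLUMNS if c not in columns]
    if missing:
        raise ValueError(f"{path} is missing columns: {missing}")
    # In pandas a blank cell becomes NaN (a float), and calling .split on it
    # raises AttributeError: 'float' object has no attribute 'split'.
    blank = [i for i, row in enumerate(rows) if not (row["Files"] or "").strip()]
    if blank:
        raise ValueError(f"{path}: rows {blank} have an empty Files entry")
    return rows
```

Running `load_experiments("config.json")` before `mosca.py -c config.json` would surface these problems in seconds rather than partway through the pipeline.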

So I tested it again with this command. Let's see what happens in several hours.

    (mosca2) zyshen@gpz:~/work/MOSCA$ mosca.py -c config.json &
    [1] 49845
    (mosca2) zyshen@gpz:~/work/MOSCA$ Building DAG of jobs...
    Using shell: /bin/bash
    Provided cores: 40
    Rules claiming more threads will be scaled down.
    Job counts:
        count   jobs
        1       all
        1       annotation
        1       assembly
        1       binning
        1       differential_expression
        1       join_information
        1       join_reads
        1       keggcharter
        3       preprocess
        1       quantification_analysis
        1       recognizer
        1       report
        1       upimapi
        15
    Select jobs to execute...

    [Sat Jan 16 12:17:59 2021]
    rule preprocess:
        input: /home/zyshen/work/MOSCA/20201023_L_QMK/mt3_R1.fastq, /home/zyshen/work/MOSCA/20201023_L_QMK/mt3_R2.fastq
        output: output/Preprocess/Trimmomatic/quality_trimmed_mtnamennn_forward_paired.fq, output/Preprocess/Trimmomatic/quality_trimmed_mtnamennn_reverse_paired.fq
        jobid: 11
        wildcards: name=mtnamennn
        threads: 15

    [Sat Jan 16 12:17:59 2021]
    rule preprocess:
        input: /home/zyshen/work/MOSCA/20201023_L_QMK/mg1_R1.fastq, /home/zyshen/work/MOSCA/20201023_L_QMK/mg1_R2.fastq
        output: output/Preprocess/Trimmomatic/quality_trimmed_mg1_forward_paired.fq, output/Preprocess/Trimmomatic/quality_trimmed_mg1_reverse_paired.fq
        jobid: 3
        wildcards: name=mg1
        threads: 15

    Job counts:
        count   jobs
        1       preprocess
        1
    Job counts:
        count   jobs
        1       preprocess
        1
    fastqc --outdir output/Preprocess/FastQC --threads 15 --extract /home/zyshen/work/MOSCA/20201023_L_QMK/mg1_R1.fastq /home/zyshen/work/MOSCA/20201023_L_QMK/mg1_R2.fastq
    fastqc --outdir output/Preprocess/FastQC --threads 15 --extract /home/zyshen/work/MOSCA/20201023_L_QMK/mt3_R1.fastq /home/zyshen/work/MOSCA/20201023_L_QMK/mt3_R2.fastq
    Started analysis of mg1_R1.fastq
    .......

best, zhiyong

szypanther commented 3 years ago

Hi iquasere, this time I ran only one MT and one MG dataset, and it got through more steps than before. Please check the nohup.txt file in the attachment. Thanks. I hope your new version comes out soon. nohup.zip

zhiyong

szypanther commented 3 years ago

Dear iquasere, any comments or a solution for the error above? What's the status of the new version of MOSCA? I'm hoping to use it in my study as soon as possible. Thanks!

regards, zhiyong

iquasere commented 3 years ago

This next version will already allow customizing the name. It will likely be released today; only one more problem to debug ^^ On another note, following up on the "Possible to run MT without MG" question: after this 1.2.3 version, MOSCA 1.3 is likely going to be able to do that, as I realized it is the same workflow as running MG without assembly, which is also something that still needs some adjustments but will be available soon.

szypanther commented 3 years ago

Cool, Thanks :)

iquasere commented 3 years ago

Unfortunately, a problem with Bioconda's CI is preventing MOSCA's new version from becoming available. If you want to circumvent this, you will need to compile MOSCA from source code. Let me know if you need it this early, and I will provide modified commands to install MOSCA without Bioconda.

szypanther commented 3 years ago

Hi iquasere, yes, I need it early. Please provide the modified commands so I can compile MOSCA from the source code. Thanks :)

best, zhiyong

iquasere commented 3 years ago

MOSCA 1.2.2 ended up being released after all; that was really an oversight on my part. Do note that this version allows more parameters in preprocessing (because I tested it with some very weird, old datasets that required that tweaking). You can obtain the new configuration through MOSGUITO, or use this one. Please let me know whether this new version fails or succeeds, as all my tests were successful and I do not understand what may be failing there.

On another note, in this version I have disabled the last step, which uses KEGGCharter. This is because an update to KEGG Pathway caused Biopython to lose some functionality, which broke the KEGGCharter workflow. However, in the next version of KEGGCharter I am going to use different Biopython methods that won't be affected by this kind of update.

szypanther commented 3 years ago

Hi iquasere, really, thanks for your work. It seems fine now, after I installed some missing packages that the errors reported. Right now it is running normally, and I will let you know when the whole pipeline is done. BTW, for the KEGGCharter part, how can I get the result of that last step? Should I run it on its own, based on the results? Thanks.

    [Wed Jan 27 18:25:57 2021] Finished job 12.
    1 of 11 steps (9%) done
    0:28:23.175 8G / 22G INFO General (main.cpp : 167) Clustering done. Total clusters: 294344279
    0:28:23.366 5G / 22G INFO K-mer Counting (kmer_data.cpp : 371) Collecting K-mer information, this takes a while.
    0:28:27.823 13G / 22G INFO K-mer Counting (kmer_data.cpp : 377) Processing /media/zyshen/work/MOSCA/MOSCA-1.2.2/output/Preprocess/Sample_forward.fastq
    0:31:05.578 12G / 22G INFO K-mer Counting (kmer_data.cpp : 377) Processing /media/zyshen/work/MOSCA/MOSCA-1.2.2/output/Preprocess/Sample_reverse.fastq
    0:33:43.017 12G / 22G INFO K-mer Counting (kmer_data.cpp : 384) Collection done, postprocessing.
    0:33:45.031 12G / 22G INFO K-mer Counting (kmer_data.cpp : 398) There are 354532704 kmers in total. Among them 428420 (0.120841%) are singletons.
    0:33:45.032 12G / 22G INFO General (main.cpp : 173) Subclustering Hamming graph ....

best, zhiyong

iquasere commented 3 years ago

Oh man, so glad to hear that ahah. About KEGGCharter, I am working on it at the moment. This next version will be much faster, since it will retrieve the KGMLs, store them and work on them locally, but I also wanted it to chart information in a multithreaded manner, which has its challenges. For now, I will try to release a new version with just the local feature, and make it multithreaded in future versions.

After that new version is available, I will give you the command to run it directly on MOSCA's results, as your version of MOSCA will still not request KEGGCharter to run.

szypanther commented 3 years ago

Hi iquasere, really, thanks! Your great work helps me save a lot of time on my upcoming large MG and MT datasets. Here the pipeline stopped again, and I can't fix it by installing the missing package. I paste the last error report below; it seems htseq-count doesn't have the -c and -n parameters. I'm not sure if I'm using a different version from yours. I checked the htseq-count help and these options are indeed missing. Any suggestions? Thanks.

Finished: 2021-01-28 09:55:59 Elapsed time: 0:24:15.914556 Total NOTICEs: 38; WARNINGs: 1; non-fatal ERRORs: 0

Thank you for using QUAST! INDEX was located at output/Assembly/Sample/contigs_index output/Assembly/Sample/quality_control/alignment.log was found! GFF file was located at output/Assembly/Sample/contigs.gff htseq-count -i gene_id -c output/Assembly/Sample/quality_control/alignment.readcounts -n 14 output/Assembly/Sample/quality_control/alignment.sam output/Assembly/Sample/contigs.gff --stranded=no usage: htseq-count [options] alignment_file gff_file htseq-count: error: unrecognized arguments: -c -n output/Assembly/Sample/quality_control/alignment.sam output/Assembly/Sample/contigs.gff Traceback (most recent call last): File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/assembly.py", line 97, in Assembler().run() File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/assembly.py", line 85, in run threads=args.threads) File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/mosca_tools.py", line 211, in perform_alignment ('' if blast is not None else ' --stranded=no'))) File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/mosca_tools.py", line 24, in run_command check=True) File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.5/subprocess.py", line 711, in run output=stdout, stderr=stderr) subprocess.CalledProcessError: Command '['htseq-count', '-i', 'gene_id', '-c', 'output/Assembly/Sample/quality_control/alignment.readcounts', '-n', '14', 'output/Assembly/Sample/quality_control/alignment.sam', 'output/Assembly/Sample/contigs.gff', '--stranded=no']' returned non-zero exit status 2 [Thu Jan 28 09:56:00 2021] Error in rule assembly: jobid: 0 output: output/Assembly/Sample/contigs.fasta

RuleException:
CalledProcessError in line 108 of /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile:
Command 'set -euo pipefail; python /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/assembly.py -r output/Preprocess/Sample_forward.fastq,output/Preprocess/Sample_reverse.fastq -t 14 -o output/Assembly/Sample -a metaspades -m 40960' returned non-zero exit status 1.
  File "/media/zyshen/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2189, in run_wrapper
  File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile", line 108, in __rule_assembly
  File "/media/zyshen/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 529, in _callback
  File "/media/zyshen/miniconda3/lib/python3.7/concurrent/futures/thread.py", line 57, in run
  File "/media/zyshen/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 515, in cached_or_run
  File "/media/zyshen/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2201, in run_wrapper
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /media/zyshen/work/MOSCA/MOSCA-1.2.2/.snakemake/log/2021-01-28T074852.508887.snakemake.log

regards, zhiyong

iquasere commented 3 years ago

I am very sorry about that; htseq-count was updated and left that option out. Try running the version of htseq-count I have:

conda install -c conda-forge -c bioconda htseq=0.12.4

I believe MOSCA will run the whole assembly step again, but preprocessing should not be repeated. In the next version of MOSCA I will pin htseq-count to version 0.12.4.
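
A minimal sketch (not MOSCA's actual code) of how the htseq-count call above could be assembled so it works whether or not the installed htseq-count accepts -c and -n. The helper names and the supports_counts_output switch are hypothetical; the options themselves are copied from the failing command. When -c is not used, htseq-count prints the counts table to standard output, so the sketch redirects stdout into the counts file instead.

```python
import subprocess
from typing import List


def build_htseq_command(sam: str, gff: str, counts_out: str, threads: int,
                        supports_counts_output: bool) -> List[str]:
    """Build an htseq-count argument list (hypothetical helper)."""
    cmd = ["htseq-count", "-i", "gene_id"]
    if supports_counts_output:
        # Only some htseq-count versions accept -c (counts file) and -n (processes),
        # as the "unrecognized arguments" error above shows.
        cmd += ["-c", counts_out, "-n", str(threads)]
    cmd += [sam, gff, "--stranded=no"]
    return cmd


def run_htseq(sam: str, gff: str, counts_out: str, threads: int,
              supports_counts_output: bool) -> None:
    """Run htseq-count, writing counts to counts_out in either mode."""
    cmd = build_htseq_command(sam, gff, counts_out, threads, supports_counts_output)
    if supports_counts_output:
        subprocess.run(cmd, check=True)
    else:
        # Without -c the counts table goes to stdout, so capture it into the file.
        with open(counts_out, "w") as handle:
            subprocess.run(cmd, check=True, stdout=handle)
```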

szypanther commented 3 years ago

Thank you for your quick reply. This time, the pipeline reached this step and stopped again.

========== Elapsed Time ========== 0 hours 13 minutes and 49 seconds.

checkm lineage_wf -x fasta -r --ali --nt -t 14 --pplacer_threads 14 output/Binning/Sample output/Binning/Sample --tab_table --file output/Binning/Sample/checkm.tsv
  File "/media/zyshen/miniconda3/envs/snakemake/bin/checkm", line 107
    print "%s removed" % (filename)
                     ^
SyntaxError: invalid syntax
Traceback (most recent call last):
  File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/binning.py", line 91, in <module>
    Binner().run()
  File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/binning.py", line 88, in run
    self.run_checkm(args.output, threads=args.threads)
  File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/binning.py", line 75, in run_checkm
    threads, bins_folder))
  File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/mosca_tools.py", line 24, in run_command
    check=True)
  File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['checkm', 'lineage_wf', '-x', 'fasta', '-r', '--ali', '--nt', '-t', '14', '--pplacer_threads', '14', 'output/Binning/Sample', 'output/Binning/Sample', '--tab_table', '--file', 'output/Binning/Sample/checkm.tsv']' returned non-zero exit status 1.
uniprot_trembl.fasta.gz   0%[ ] 57.78M 61.0KB/s eta 7d 11h
[Thu Jan 28 22:34:59 2021]
Error in rule binning:
    jobid: 0
    output: output/Binning/Sample/checkm.tsv

iquasere commented 3 years ago

This one is weirder, as CheckM should be compatible with the latest version, but it seems you have installed the Python 2 CheckM (it only became Python 3 compatible a few months ago). Can you share which CheckM version you have? (You can see it by running just checkm.)
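
The SyntaxError above points at a Python 2 print statement, which a Python 3 interpreter cannot parse. As a hedged sketch, one way to check whether a script (such as the checkm launcher found on PATH) is Python 3 compatible is simply to try compiling it. The helper names are hypothetical, and this only checks syntax, not runtime behaviour.

```python
import shutil
from typing import Optional


def parses_as_python3(source: str, filename: str = "<script>") -> bool:
    """Return True if `source` compiles under the running Python 3 interpreter."""
    try:
        compile(source, filename, "exec")  # compiles only, nothing is executed
    except (SyntaxError, ValueError):  # ValueError: null bytes on older Pythons
        return False
    return True


def checkm_script_is_python3() -> Optional[bool]:
    """Check the checkm launcher on PATH; None if it is not installed."""
    path = shutil.which("checkm")
    if path is None:
        return None
    with open(path, encoding="utf-8", errors="replace") as handle:
        return parses_as_python3(handle.read(), path)
```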

About the UniProt download: MOSCA shouldn't download it if uniprot.fasta is already present in the folder specified with diamond_database, nor if that option was set as you described. The latter was my mistake: when I tested it, it worked because I never moved uniprot.fasta from its original location, so I overlooked it. The next versions of MOSCA will have this fixed. For now, if you want to avoid downloading all of it again, you need to have a uniprot.fasta file in the directory of the database set with diamond_database :/
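
The lookup described above can be sketched as follows. This is a hypothetical helper, not MOSCA's implementation, and it assumes diamond_database may be either a folder or a path to a database file, in which case the file's parent folder is searched for uniprot.fasta.

```python
import os


def uniprot_fasta_present(diamond_database: str) -> bool:
    """Hypothetical check: is uniprot.fasta in the diamond_database location?

    Accepts a directory or a path to a database file; for a file, its
    parent directory is searched.
    """
    path = os.path.abspath(diamond_database)
    folder = path if os.path.isdir(path) else os.path.dirname(path)
    return os.path.isfile(os.path.join(folder, "uniprot.fasta"))
```

A download step would then only run when uniprot_fasta_present(...) returns False.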

iquasere commented 3 years ago

MOSCA was tested with CheckM 1.1.2. So running conda install -c conda-forge -c bioconda checkm-genome should fix that!

szypanther commented 3 years ago

Hi iquasere, thanks, the CheckM problem has been resolved with your command. The other problem I ran into is that the recognizer.py command is missing. For upimapi.py, I resolved it by installing the corresponding package, but for recognizer.py I didn't know where to download it from. Is it right to download it from here? https://github.com/fxia22/pocketshpinx/blob/master/nodes/recognizer.py However, I then ran into another problem: "rospkg.common.ResourceNotFound: pocketsphinx". How can I resolve these? Thanks

python /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/quantification_analyser.py -e output/experiments.tsv -t 14 -o output -if tsv
recognizer.py -f output/Annotation/Sample/fgs.faa -t 13 -o output/Annotation/Sample -rd /media/zyshen/work/MOSCA/MOSCA-1.2.2 --remove-spaces
/bin/bash: recognizer.py: command not found
upimapi.py -i output/Annotation/Sample/aligned.blast -o output/Annotation/uniprotinfo --blast --full-id
/bin/bash: upimapi.py: command not found
[Fri Jan 29 15:04:08 2021]
[Fri Jan 29 15:04:08 2021]

szypanther commented 3 years ago

Hi iquasere, it works now after running this command:

conda install -c bioconda recognizer

Thanks, zhiyong
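
Both failures in the previous log were plain "command not found" errors. As a hypothetical pre-flight check (not part of MOSCA), shutil.which can report which of the executables used in this thread are missing from PATH before a long run starts.

```python
import shutil
from typing import Iterable, List


def missing_commands(names: Iterable[str]) -> List[str]:
    """Return the commands from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Executables that appear in this thread's error messages.
    print(missing_commands(["htseq-count", "checkm", "recognizer.py", "upimapi.py"]))
```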

szypanther commented 3 years ago

I downloaded the file ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/cddid.tbl.gz and put it in the right directory. Is that right?

2021-01-29 09:28:33: Running annotation with RPS-BLAST and KOG database as reference.
rpsblast -query output/Annotation/Sample/fgs.faa -db /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_0 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_1 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_2 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_3 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_4 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_5 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_6 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_7 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_8 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_9 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_10 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_11 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_12 -out output/Annotation/Sample/KOG_aligned.blast -outfmt 6 -num_threads 13 -max_target_seqs 1
BLAST engine error: Cannot retrieve path to RPS database
Traceback (most recent call last):
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/bin/recognizer.py", line 462, in <module>
    main()
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/bin/recognizer.py", line 380, in main
    cddid = parse_cddid('{}/cddid_all.tbl'.format(args.resources_directory))
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/bin/recognizer.py", line 175, in parse_cddid
    cddid = pd.read_csv(cddid, sep='\t', header=None)[[0, 1, 3]]
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/pandas/io/parsers.py", line 686, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/pandas/io/parsers.py", line 452, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/pandas/io/parsers.py", line 946, in __init__
    self._make_engine(self.engine)
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/pandas/io/parsers.py", line 1178, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/pandas/io/parsers.py", line 2008, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 382, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 674, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] No such file or directory: '/media/zyshen/work/MOSCA/MOSCA-1.2.2/cddid_all.tbl'
[Fri Jan 29 17:28:34 2021]
Error in rule recognizer:
    jobid: 0
    output: output/Annotation/Sample/reCOGnizer_results.xlsx

RuleException:
CalledProcessError in line 175 of /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile:
Command 'set -euo pipefail; recognizer.py -f output/Annotation/Sample/fgs.faa -t 13 -o output/Annotation/Sample -rd /media/zyshen/work/MOSCA/MOSCA-1.2.2 --remove-spaces' returned non-zero exit status 1.
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2340, in run_wrapper
  File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile", line 175, in __rule_recognizer
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 568, in _callback
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/concurrent/futures/thread.py", line 56, in run
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2352, in run_wrapper
Exiting because a job execution failed. Look above for error message

szypanther commented 3 years ago

Hi, another problem appeared after I downloaded hmm_PGAP.tsv and cddid_all.tbl:

wget https://ftp.ncbi.nlm.nih.gov/hmm/3.0/hmm_PGAP.tsv

BLAST engine error: Cannot retrieve path to RPS database
2021-01-29 09:33:34: Organizing annotation results
[1/8] Handling CDD identifications
Traceback (most recent call last):
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/bin/recognizer.py", line 462, in <module>
    main()
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/bin/recognizer.py", line 394, in main
    report = parse_blast('{}/{}_aligned.blast'.format(args.output, db))
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/bin/recognizer.py", line 198, in parse_blast
    blast = pd.read_csv(file, sep='\t', header=None)
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/pandas/io/parsers.py", line 686, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/pandas/io/parsers.py", line 452, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/pandas/io/parsers.py", line 946, in __init__
    self._make_engine(self.engine)
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/pandas/io/parsers.py", line 1178, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/pandas/io/parsers.py", line 2008, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 540, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
[Fri Jan 29 17:33:35 2021]
Error in rule recognizer:
    jobid: 0
    output: output/Annotation/Sample/reCOGnizer_results.xlsx

RuleException:
CalledProcessError in line 175 of /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile:
Command 'set -euo pipefail; recognizer.py -f output/Annotation/Sample/fgs.faa -t 13 -o output/Annotation/Sample -rd /media/zyshen/work/MOSCA/MOSCA-1.2.2 --remove-spaces' returned non-zero exit status 1.
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2340, in run_wrapper
  File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile", line 175, in __rule_recognizer
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 568, in _callback
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/concurrent/futures/thread.py", line 56, in run
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2352, in run_wrapper
Exiting because a job execution failed. Look above for error message

zhiyong
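
The EmptyDataError comes from pandas refusing to parse an empty *_aligned.blast file, which rpsblast left empty after "Cannot retrieve path to RPS database". A hedged sketch of a reader for BLAST tabular output (-outfmt 6, whose 12 default columns are listed below) that points at the likely cause instead. The function name and the allow_empty switch are hypothetical, and since an empty file can also legitimately mean "no hits", the strict behaviour is optional.

```python
import csv
import os
from typing import Dict, List

# Default column order of BLAST tabular output (-outfmt 6).
OUTFMT6_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                   "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def read_outfmt6(path: str, allow_empty: bool = False) -> List[Dict[str, str]]:
    """Parse an -outfmt 6 file into dicts keyed by column name (hypothetical helper)."""
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    if os.path.getsize(path) == 0:
        if allow_empty:
            return []  # caller accepts "no hits" as a valid outcome
        raise RuntimeError(
            f"{path} is empty: check whether the rpsblast run that writes it failed "
            "(e.g. 'Cannot retrieve path to RPS database').")
    with open(path, newline="") as handle:
        # Values stay as strings; convert columns such as evalue as needed.
        return [dict(zip(OUTFMT6_COLUMNS, row))
                for row in csv.reader(handle, delimiter="\t") if row]
```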

iquasere commented 3 years ago

> hi iquasere, conda install -c bioconda recognizer it works now after this command. thanks zhiyong

Yes, this is the right command ^^

As for obtaining the databases, reCOGnizer downloads all of them automatically. MOSCA was designed to obtain everything by itself: whatever the tools don't obtain automatically, MOSCA will get. Can you post the entire output of this command here?

recognizer.py -f output/Annotation/Sample/fgs.faa -t 13 -o output/Annotation/Sample -rd /media/zyshen/work/MOSCA/MOSCA-1.2.2 --remove-spaces

iquasere commented 3 years ago

On another note, if the problem is with reCOGnizer, I am also its developer, so fixes should come quickly ^^ Although it seems MOSCA is already giving you more trouble than running the commands by hand would xD

iquasere commented 3 years ago

When MOSCA's workflow finishes, you can get the remaining results with KEGGCharter by installing it with conda install -c conda-forge -c bioconda keggcharter=0.1.3 and running

kegg_charter.py -f output/MOSCA_Entry_Report.xlsx -gcol [comma-separated list of MG columns] -tcol [comma-separated list of MT columns] -keggc "Cross-reference (KEGG)" -o output/KEGGCharter_results -tc "Taxonomic lineage (GENUS)"
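
As a sketch, the KEGGCharter call can also be built from Python, which avoids shell-quoting problems with the "Cross-reference (KEGG)" and "Taxonomic lineage (GENUS)" arguments, since they contain spaces and parentheses. The helper is hypothetical and the column names are placeholders; use the MG and MT column names from your own report.

```python
from typing import List, Sequence


def keggcharter_command(mg_columns: Sequence[str], mt_columns: Sequence[str],
                        report: str = "output/MOSCA_Entry_Report.xlsx",
                        output: str = "output/KEGGCharter_results") -> List[str]:
    """Build the kegg_charter.py call shown above (hypothetical helper)."""
    return ["kegg_charter.py", "-f", report,
            "-gcol", ",".join(mg_columns),   # comma-separated MG columns
            "-tcol", ",".join(mt_columns),   # comma-separated MT columns
            "-keggc", "Cross-reference (KEGG)",
            "-o", output,
            "-tc", "Taxonomic lineage (GENUS)"]
```

For example, keggcharter_command(["mg1", "mg2"], ["mt1"]) passes mg1,mg2 to -gcol and mt1 to -tcol, and the resulting list can be run with subprocess.run(cmd, check=True).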

szypanther commented 3 years ago

recognizer.py -f output/Annotation/Sample/fgs.faa -t 13 -o output/Annotation/Sample -rd /media/zyshen/work/MOSCA/MOSCA-1.2.2 --remove-spaces
/media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_0.aux not found! Some part of CDD was not valid!
Generating databases for [13] threads.
makeprofiledb -in /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_0.pn -title /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_0 -out /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_0
makeprofiledb -in /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_1.pn -title /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_1 -out /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_1
makeprofiledb -in /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_2.pn -title /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_2 -out /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_2
makeprofiledb -in /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_3.pn -title /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_3 -out /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_3
makeprofiledb -in /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_4.pn -title /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_4 -out /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_4
makeprofiledb -in /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_5.pn -title /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_5 -out /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_5
makeprofiledb -in /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_6.pn -title /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_6 -out /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_6
makeprofiledb -in /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_7.pn -title /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_7 -out /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_7
INPUT ERROR: Input file contains no smp filnames
makeprofiledb -in /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_8.pn -title /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_8 -out /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_8
makeprofiledb -in /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_9.pn -title /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_9 -out /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_9
makeprofiledb -in /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_11.pn -title /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_11 -out /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_11
makeprofiledb -in /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_10.pn -title /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_10 -out /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_10
makeprofiledb -in /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_12.pn -title /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_12 -out /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_12
INPUT ERROR: Input file contains no smp filnames
INPUT ERROR: Input file contains no smp filnames
INPUT ERROR: Input file contains no smp filnames
INPUT ERROR: Input file contains no smp filnames
INPUT ERROR: Input file contains no smp filnames
INPUT ERROR: Input file contains no smp filnames
INPUT ERROR: Input file contains no smp filnames
INPUT ERROR: Input file contains no smp filnames
INPUT ERROR: Input file contains no smp filnames
INPUT ERROR: Input file contains no smp filnames
INPUT ERROR: Input file contains no smp filnames
INPUT ERROR: Input file contains no smp filnames
....
sed -i -e 's/ //g' output/Annotation/Sample/fgs.faa
2021-01-29 14:27:24: Running annotation with RPS-BLAST and CDD database as reference.
rpsblast -query output/Annotation/Sample/fgs.faa -db /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_0 /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_1 /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_2 /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_3 /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_4 /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_5 /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_6 /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_7 /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_8 /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_9 /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_10 /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_11 /media/zyshen/work/MOSCA/MOSCA-1.2.2/cd_13_12 -out output/Annotation/Sample/CDD_aligned.blast -outfmt 6 -num_threads 13 -max_target_seqs 1
BLAST engine error: Cannot retrieve path to RPS database
2021-01-29 14:27:24: Running annotation with RPS-BLAST and Pfam database as reference.
rpsblast -query output/Annotation/Sample/fgs.faa -db /media/zyshen/work/MOSCA/MOSCA-1.2.2/pfam_13_0 /media/zyshen/work/MOSCA/MOSCA-1.2.2/pfam_13_1 /media/zyshen/work/MOSCA/MOSCA-1.2.2/pfam_13_2 /media/zyshen/work/MOSCA/MOSCA-1.2.2/pfam_13_3 /media/zyshen/work/MOSCA/MOSCA-1.2.2/pfam_13_4 /media/zyshen/work/MOSCA/MOSCA-1.2.2/pfam_13_5 /media/zyshen/work/MOSCA/MOSCA-1.2.2/pfam_13_6 /media/zyshen/work/MOSCA/MOSCA-1.2.2/pfam_13_7 /media/zyshen/work/MOSCA/MOSCA-1.2.2/pfam_13_8 /media/zyshen/work/MOSCA/MOSCA-1.2.2/pfam_13_9 /media/zyshen/work/MOSCA/MOSCA-1.2.2/pfam_13_10 /media/zyshen/work/MOSCA/MOSCA-1.2.2/pfam_13_11 /media/zyshen/work/MOSCA/MOSCA-1.2.2/pfam_13_12 -out output/Annotation/Sample/Pfam_aligned.blast -outfmt 6 -num_threads 13 -max_target_seqs 1
BLAST engine error: Cannot retrieve path to RPS database
2021-01-29 14:27:24: Running annotation with RPS-BLAST and NCBIfam database as reference.
rpsblast -query output/Annotation/Sample/fgs.faa -db /media/zyshen/work/MOSCA/MOSCA-1.2.2/NF_13_0 /media/zyshen/work/MOSCA/MOSCA-1.2.2/NF_13_1 /media/zyshen/work/MOSCA/MOSCA-1.2.2/NF_13_2 /media/zyshen/work/MOSCA/MOSCA-1.2.2/NF_13_3 /media/zyshen/work/MOSCA/MOSCA-1.2.2/NF_13_4 /media/zyshen/work/MOSCA/MOSCA-1.2.2/NF_13_5 /media/zyshen/work/MOSCA/MOSCA-1.2.2/NF_13_6 /media/zyshen/work/MOSCA/MOSCA-1.2.2/NF_13_7 /media/zyshen/work/MOSCA/MOSCA-1.2.2/NF_13_8 /media/zyshen/work/MOSCA/MOSCA-1.2.2/NF_13_9 /media/zyshen/work/MOSCA/MOSCA-1.2.2/NF_13_10 /media/zyshen/work/MOSCA/MOSCA-1.2.2/NF_13_11 /media/zyshen/work/MOSCA/MOSCA-1.2.2/NF_13_12 -out output/Annotation/Sample/NCBIfam_aligned.blast -outfmt 6 -num_threads 13 -max_target_seqs 1
BLAST engine error: Cannot retrieve path to RPS database
2021-01-29 14:27:24: Running annotation with RPS-BLAST and Protein_Clusters database as reference.
rpsblast -query output/Annotation/Sample/fgs.faa -db /media/zyshen/work/MOSCA/MOSCA-1.2.2/PRK_13_0 /media/zyshen/work/MOSCA/MOSCA-1.2.2/PRK_13_1 /media/zyshen/work/MOSCA/MOSCA-1.2.2/PRK_13_2 /media/zyshen/work/MOSCA/MOSCA-1.2.2/PRK_13_3 /media/zyshen/work/MOSCA/MOSCA-1.2.2/PRK_13_4 /media/zyshen/work/MOSCA/MOSCA-1.2.2/PRK_13_5 /media/zyshen/work/MOSCA/MOSCA-1.2.2/PRK_13_6 /media/zyshen/work/MOSCA/MOSCA-1.2.2/PRK_13_7 /media/zyshen/work/MOSCA/MOSCA-1.2.2/PRK_13_8 /media/zyshen/work/MOSCA/MOSCA-1.2.2/PRK_13_9 /media/zyshen/work/MOSCA/MOSCA-1.2.2/PRK_13_10 /media/zyshen/work/MOSCA/MOSCA-1.2.2/PRK_13_11 /media/zyshen/work/MOSCA/MOSCA-1.2.2/PRK_13_12 -out output/Annotation/Sample/Protein_Clusters_aligned.blast -outfmt 6 -num_threads 13 -max_target_seqs 1
BLAST engine error: Cannot retrieve path to RPS database
2021-01-29 14:27:24: Running annotation with RPS-BLAST and Smart database as reference.
rpsblast -query output/Annotation/Sample/fgs.faa -db /media/zyshen/work/MOSCA/MOSCA-1.2.2/smart_13_0 /media/zyshen/work/MOSCA/MOSCA-1.2.2/smart_13_1 /media/zyshen/work/MOSCA/MOSCA-1.2.2/smart_13_2 /media/zyshen/work/MOSCA/MOSCA-1.2.2/smart_13_3 /media/zyshen/work/MOSCA/MOSCA-1.2.2/smart_13_4 /media/zyshen/work/MOSCA/MOSCA-1.2.2/smart_13_5 /media/zyshen/work/MOSCA/MOSCA-1.2.2/smart_13_6 /media/zyshen/work/MOSCA/MOSCA-1.2.2/smart_13_7 /media/zyshen/work/MOSCA/MOSCA-1.2.2/smart_13_8 /media/zyshen/work/MOSCA/MOSCA-1.2.2/smart_13_9 /media/zyshen/work/MOSCA/MOSCA-1.2.2/smart_13_10 /media/zyshen/work/MOSCA/MOSCA-1.2.2/smart_13_11 /media/zyshen/work/MOSCA/MOSCA-1.2.2/smart_13_12 -out output/Annotation/Sample/Smart_aligned.blast -outfmt 6 -num_threads 13 -max_target_seqs 1
BLAST engine error: Cannot retrieve path to RPS database
2021-01-29 14:27:25: Running annotation with RPS-BLAST and TIGRFAM database as reference.
rpsblast -query output/Annotation/Sample/fgs.faa -db /media/zyshen/work/MOSCA/MOSCA-1.2.2/TIGR_13_0 /media/zyshen/work/MOSCA/MOSCA-1.2.2/TIGR_13_1 /media/zyshen/work/MOSCA/MOSCA-1.2.2/TIGR_13_2 /media/zyshen/work/MOSCA/MOSCA-1.2.2/TIGR_13_3 /media/zyshen/work/MOSCA/MOSCA-1.2.2/TIGR_13_4 /media/zyshen/work/MOSCA/MOSCA-1.2.2/TIGR_13_5 /media/zyshen/work/MOSCA/MOSCA-1.2.2/TIGR_13_6 /media/zyshen/work/MOSCA/MOSCA-1.2.2/TIGR_13_7 /media/zyshen/work/MOSCA/MOSCA-1.2.2/TIGR_13_8 /media/zyshen/work/MOSCA/MOSCA-1.2.2/TIGR_13_9 /media/zyshen/work/MOSCA/MOSCA-1.2.2/TIGR_13_10 /media/zyshen/work/MOSCA/MOSCA-1.2.2/TIGR_13_11 /media/zyshen/work/MOSCA/MOSCA-1.2.2/TIGR_13_12 -out output/Annotation/Sample/TIGRFAM_aligned.blast -outfmt 6 -num_threads 13 -max_target_seqs 1
BLAST engine error: Cannot retrieve path to RPS database
2021-01-29 14:27:25: Running annotation with RPS-BLAST and COG database as reference.
rpsblast -query output/Annotation/Sample/fgs.faa -db /media/zyshen/work/MOSCA/MOSCA-1.2.2/COG_13_0 /media/zyshen/work/MOSCA/MOSCA-1.2.2/COG_13_1 /media/zyshen/work/MOSCA/MOSCA-1.2.2/COG_13_2 /media/zyshen/work/MOSCA/MOSCA-1.2.2/COG_13_3 /media/zyshen/work/MOSCA/MOSCA-1.2.2/COG_13_4 /media/zyshen/work/MOSCA/MOSCA-1.2.2/COG_13_5 /media/zyshen/work/MOSCA/MOSCA-1.2.2/COG_13_6 /media/zyshen/work/MOSCA/MOSCA-1.2.2/COG_13_7 /media/zyshen/work/MOSCA/MOSCA-1.2.2/COG_13_8 /media/zyshen/work/MOSCA/MOSCA-1.2.2/COG_13_9 /media/zyshen/work/MOSCA/MOSCA-1.2.2/COG_13_10 /media/zyshen/work/MOSCA/MOSCA-1.2.2/COG_13_11 /media/zyshen/work/MOSCA/MOSCA-1.2.2/COG_13_12 -out output/Annotation/Sample/COG_aligned.blast -outfmt 6 -num_threads 13 -max_target_seqs 1
BLAST engine error: Cannot retrieve path to RPS database
2021-01-29 14:27:25: Running annotation with RPS-BLAST and KOG database as reference.
rpsblast -query output/Annotation/Sample/fgs.faa -db /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_0 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_1 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_2 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_3 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_4 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_5 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_6 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_7 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_8 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_9 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_10 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_11 /media/zyshen/work/MOSCA/MOSCA-1.2.2/KOG_13_12 -out output/Annotation/Sample/KOG_aligned.blast -outfmt 6 -num_threads 13 -max_target_seqs 1
BLAST engine error: Cannot retrieve path to RPS database
2021-01-29 14:27:27: Organizing annotation results
[1/8] Handling CDD identifications
Traceback (most recent call last):
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/bin/recognizer.py", line 462, in <module>
    main()
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/bin/recognizer.py", line 394, in main
    report = parse_blast('{}/{}_aligned.blast'.format(args.output, db))
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/bin/recognizer.py", line 198, in parse_blast
    blast = pd.read_csv(file, sep='\t', header=None)
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/pandas/io/parsers.py", line 686, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/pandas/io/parsers.py", line 452, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/pandas/io/parsers.py", line 946, in __init__
    self._make_engine(self.engine)
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/pandas/io/parsers.py", line 1178, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/pandas/io/parsers.py", line 2008, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 540, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
(mosca-1.2.2) [22:27 zyshen@gpuserver MOSCA-1.2.2] >
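
The repeated "INPUT ERROR: Input file contains no smp filnames" lines suggest the cd_13_*.pn lists handed to makeprofiledb -in named no profiles, so no RPS databases were built and every rpsblast call then failed. A hedged sketch for inspecting such a list, assuming a .pn file names one .smp profile per line, relative to its own folder. The helper is hypothetical and not part of reCOGnizer.

```python
import os
from typing import List, Tuple


def check_pn_file(pn_path: str) -> Tuple[List[str], List[str]]:
    """Return (listed .smp names, those missing on disk) for a .pn profile list.

    Assumption: one .smp filename per line, relative to the .pn file's folder.
    """
    base = os.path.dirname(os.path.abspath(pn_path))
    with open(pn_path) as handle:
        listed = [line.strip() for line in handle
                  if line.strip().endswith(".smp")]
    missing = [name for name in listed
               if not os.path.isfile(os.path.join(base, name))]
    return listed, missing
```

An empty first list corresponds to the INPUT ERROR above; a non-empty second list would point at profiles that were never downloaded or extracted.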

iquasere commented 3 years ago

Sorry, please run

recognizer.py -f output/Annotation/Sample/fgs.faa -t 13 -o output/Annotation/Sample -rd /media/zyshen/work/MOSCA/MOSCA-1.2.2 --remove-spaces --download-resources

and then re-run MOSCA's command.

Notice the --download-resources at the end. reCOGnizer got an update and now, by default, does not download CDD and the other resources. I'll fix this by adding the --download-resources parameter to the reCOGnizer command. This is only required once: the next time you run MOSCA or reCOGnizer, this option won't be needed, and that step will run just fine.
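
A hypothetical sketch of the one-time behaviour described above: add --download-resources only when the resources do not seem to be present yet. It uses cddid_all.tbl, which the earlier traceback shows reCOGnizer reading from the -rd folder, as a marker file. That is a simplifying assumption, since a partial download (such as a manually fetched cddid_all.tbl alone) would be mistaken for a complete one.

```python
import os
from typing import List


def recognizer_command(fasta: str, threads: int, output: str,
                       resources_dir: str) -> List[str]:
    """Build a reCOGnizer call like the one above (hypothetical helper)."""
    cmd = ["recognizer.py", "-f", fasta, "-t", str(threads), "-o", output,
           "-rd", resources_dir, "--remove-spaces"]
    # Assumption: cddid_all.tbl in the resources folder marks a finished download.
    marker = os.path.join(resources_dir, "cddid_all.tbl")
    if not os.path.isfile(marker):
        cmd.append("--download-resources")
    return cmd
```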

szypanther commented 3 years ago

Thanks iquasere, it seems to work now. I'll let you know when it's done :)

iquasere commented 3 years ago

Sorry for closing without notice, but the original issue presented here has now been handled in version 1.3. If further problems arise, please don't hesitate to open a new issue.