iquasere / MOSCA

Meta-Omics Software for Community Analysis
GNU General Public License v3.0

New problem encounter #12

Closed szypanther closed 3 years ago

szypanther commented 3 years ago

Hi iquasere, sorry to trouble you again. I have run into another new problem, after almost 2 days of running.

....
5500000 alignment record pairs processed.
5600000 alignment record pairs processed.
5700000 alignment record pairs processed.
5800000 alignment record pairs processed.
5834173 alignment pairs processed.
[Tue Feb 2 15:08:24 2021]
Finished job 9.
1 of 5 steps (20%) done

[Tue Feb 2 15:08:24 2021]
rule join_information:
    input: output/Annotation/uniprotinfo.tsv, output/Annotation/Sample/aligned.blast, output/Annotation/Sample/reCOGnizer_results.xlsx, output/Metatranscriptomics/mtname.readcounts, output/Annotation/mgname.readcounts
    output: output/MOSCA_Protein_Report.xlsx, output/MOSCA_Entry_Report.xlsx, output/Metatranscriptomics/expression_matrix.tsv
    jobid: 5
    threads: 12

Job counts:
    count    jobs
    1    join_information
    1
python /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/join_information.py -e output/experiments.tsv -t 12 -o output -if tsv -nm TMM
2021-02-02 07:08:28: Joining data for sample: Sample
Traceback (most recent call last):
  File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/join_information.py", line 147, in <module>
    Joiner().run()
  File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/join_information.py", line 58, in run
    sheet_names = pd.ExcelFile(recognizer_filename).sheet_names
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/pandas/io/excel/_base.py", line 867, in __init__
    self._reader = self._engines[engine](self._io)
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/pandas/io/excel/_xlrd.py", line 22, in __init__
    super().__init__(filepath_or_buffer)
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/pandas/io/excel/_base.py", line 353, in __init__
    self.book = self.load_workbook(filepath_or_buffer)
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/pandas/io/excel/_xlrd.py", line 37, in load_workbook
    return open_workbook(filepath_or_buffer)
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/__init__.py", line 170, in open_workbook
    raise XLRDError(FILE_FORMAT_DESCRIPTIONS[file_format]+'; not supported')
xlrd.biffh.XLRDError: Excel xlsx file; not supported
[Tue Feb 2 15:08:28 2021]
Error in rule join_information:
    jobid: 0
    output: output/MOSCA_Protein_Report.xlsx, output/MOSCA_Entry_Report.xlsx, output/Metatranscriptomics/expression_matrix.tsv

RuleException:
CalledProcessError in line 232 of /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile:
Command 'set -euo pipefail; python /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/join_information.py -e output/experiments.tsv -t 12 -o output -if tsv -nm TMM' returned non-zero exit status 1.
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2340, in run_wrapper
  File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile", line 232, in __rule_join_information
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 568, in _callback
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/concurrent/futures/thread.py", line 56, in run
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2352, in run_wrapper
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /media/zyshen/work/MOSCA/MOSCA-1.2.2/.snakemake/log/2021-02-02T143849.506822.snakemake.log

You can see that the size of mtname_bowtie2_report.txt is zero.
-rw-rw-r-- 1 zyshen zyshen 630 Feb 2 08:27 mtname.log
-rw-rw-r-- 1 zyshen zyshen 13M Feb 2 08:36 mtname.readcounts
-rw-rw-r-- 1 zyshen zyshen 4.5G Feb 2 08:27 mtname.sam
-rw-rw-r-- 1 zyshen zyshen 0 Feb 2 08:26 mtname_bowtie2_report.txt

Any suggestions? Thanks!

szypanther commented 3 years ago

I have resolved it now. It was an xlrd version problem!

(mosca-1.2.2) (15:08 zyshen@gpuserver MOSCA-1.2.2) > pip uninstall xlrd
Uninstalling xlrd-2.0.1:
  /media/zyshen/miniconda3/envs/mosca-1.2.2/bin/__pycache__/runxlrd.cpython-36.pyc
  /media/zyshen/miniconda3/envs/mosca-1.2.2/bin/runxlrd.py
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd-2.0.1.dist-info/INSTALLER
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd-2.0.1.dist-info/LICENSE
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd-2.0.1.dist-info/METADATA
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd-2.0.1.dist-info/RECORD
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd-2.0.1.dist-info/WHEEL
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd-2.0.1.dist-info/top_level.txt
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/__init__.py
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/__pycache__/__init__.cpython-36.pyc
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/__pycache__/biffh.cpython-36.pyc
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/__pycache__/book.cpython-36.pyc
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/__pycache__/compdoc.cpython-36.pyc
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/__pycache__/formatting.cpython-36.pyc
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/__pycache__/formula.cpython-36.pyc
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/__pycache__/info.cpython-36.pyc
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/__pycache__/sheet.cpython-36.pyc
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/__pycache__/timemachine.cpython-36.pyc
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/__pycache__/xldate.cpython-36.pyc
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/biffh.py
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/book.py
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/compdoc.py
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/formatting.py
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/formula.py
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/info.py
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/sheet.py
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/timemachine.py
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/xldate.py
Proceed (y/n)? y
  Successfully uninstalled xlrd-2.0.1
You are using pip version 9.0.1, however version 21.0 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
(mosca-1.2.2) (15:31 zyshen@gpuserver MOSCA-1.2.2) > pip install xlrd==1.2.0
Collecting xlrd==1.2.0
  Cache entry deserialization failed, entry ignored
  Downloading https://files.pythonhosted.org/packages/b0/16/63576a1a001752e34bf8ea62e367997530dc553b689356b9879339cf45a4/xlrd-1.2.0-py2.py3-none-any.whl (103kB)
    100% |████████████████████████████████| 112kB 521kB/s
Installing collected packages: xlrd
Successfully installed xlrd-1.2.0
You are using pip version 9.0.1, however version 21.0 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
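For readers hitting the same XLRDError: xlrd 2.0.0 removed support for everything except .xls files, which is why going back to xlrd 1.2.0 makes the .xlsx read work again. Below is a minimal sketch of a version guard. The function name `xlrd_reads_xlsx` is hypothetical and is not part of MOSCA or xlrd.

```python
def xlrd_reads_xlsx(version: str) -> bool:
    """Return True if this xlrd version can still open .xlsx files.

    xlrd 2.0.0 dropped every format except .xls, so 1.x releases can
    read .xlsx and 2.x releases raise XLRDError on it.
    """
    major = int(version.split(".")[0])
    return major < 2


if __name__ == "__main__":
    # The two versions that appear in the pip output above.
    for version in ("1.2.0", "2.0.1"):
        verdict = "can read .xlsx" if xlrd_reads_xlsx(version) else "cannot read .xlsx"
        print(version, "->", verdict)
```

An alternative that may avoid the pin, assuming openpyxl is installed in the environment, is to ask pandas for that engine explicitly, e.g. `pd.ExcelFile(path, engine="openpyxl")`.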

iquasere commented 3 years ago

Sorry, I am only checking now, and you have already lost a lot of time debugging MOSCA! You are right, I have 1.2.0 on my system, and I will update the meta.yaml so that the next versions of MOSCA use that version of xlrd.

No idea how this did not happen on my system when testing though...

szypanther commented 3 years ago

Hi iquasere, I can't finish this step:
Rscript /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/de_analysis.R --readcounts output/Metatranscriptomics/expression_matrix.tsv --conditions Mt --output output/Metatranscriptomics

..... The following objects are masked from ‘package:Biobase’:

anyMissing, rowMedians

Attaching package: ‘DelayedArray’

The following objects are masked from ‘package:matrixStats’:

colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges

The following objects are masked from ‘package:base’:

aperm, apply, rowsum

[1] "Readcounts: output/Metatranscriptomics/expression_matrix.tsv"
[1] "Conditions: Mt"
[1] "Method: differential"
[1] "Output: output/Metatranscriptomics"
Error in DESeqDataSet(se, design = design, ignoreRank) :
  design has a single variable, with all samples having the same value.
  use instead a design of '~ 1'. estimateSizeFactors, rlog and the VST can then be used
Calls: DESeqDataSetFromMatrix -> DESeqDataSet
In addition: Warning messages:
1: In DESeqDataSet(se, design = design, ignoreRank) :
  all genes have equal values for all samples. will not be able to perform differential analysis
2: In DESeqDataSet(se, design = design, ignoreRank) :
  some variables in design formula are characters, converting to factors
Execution halted

iquasere commented 3 years ago

I see you specified only one condition: Mt. This is set in the experiments file, and listing at least two different conditions there will fix it!
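A quick pre-flight check can catch this before a long run. The sketch below assumes the experiments file is tab-separated with the columns shown elsewhere in this thread (Files, Sample, Data type, Condition, Name); the function names are hypothetical and not part of MOSCA.

```python
import csv
import io


def mt_conditions(experiments_tsv: str) -> set:
    """Collect the distinct conditions of all 'mrna' rows of an experiments table."""
    reader = csv.DictReader(io.StringIO(experiments_tsv), delimiter="\t")
    return {row["Condition"] for row in reader if row["Data type"] == "mrna"}


def has_enough_conditions(experiments_tsv: str) -> bool:
    """Differential expression needs at least two different MT conditions."""
    return len(mt_conditions(experiments_tsv)) >= 2


if __name__ == "__main__":
    # Illustrative table with made-up file names and a single MT condition.
    table = (
        "Files\tSample\tData type\tCondition\tName\n"
        "mg_R1.fastq,mg_R2.fastq\tSample\tdna\tMG\tmgname\n"
        "mt_R1.fastq,mt_R2.fastq\tSample\tmrna\tMt\tmtname\n"
    )
    print(sorted(mt_conditions(table)), has_enough_conditions(table))
```

To check a real file, read it with `open(path).read()` and pass the text to `has_enough_conditions`.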

szypanther commented 3 years ago

Thanks iquasere, you reply so quickly! OK, I will add more MT data to the test. It may need more time to run. I will let you know the result when it is done. It's not easy to get it to run successfully on the first try, but of course it's valuable to do. I'm pretty sure this will save me a lot of time in my project once the test runs successfully. Thanks :)

iquasere commented 3 years ago

Yes, after this first bumpy ride it should be successful every time. Some problems were on my end, and have already been fixed. Let's hope they were the last. On another note, MOSCA currently requires more than one MT condition, but your test run shows me that this won't always fit: sometimes the user might only want to know which pathways are being expressed, and the relative proportions of expression of different genes in the same sample. I will change MOSCA so that, in the future, it bypasses the differential expression analysis step in that case and goes directly to reporting and KEGGCharter. For now, however, that is only possible by either adding another MT dataset or editing the Snakefile. If you do want this, I can offer a cut-down version of MOSCA that bypasses this step.

szypanther commented 3 years ago

Thanks iquasere, yes, it would be better if you could offer that special version of MOSCA. Many thanks for your help and good work! The timing is good: our lab will soon have a large amount of MT and MG data coming in, and your work will save me from spending more time on pipeline building. Thanks again!

best, zhiyong

szypanther commented 3 years ago

Hi iquasere, I found that the upimapi.py step sometimes can't finish and fails to generate the uniprotinfo.tsv file. I ran it manually twice and it then generated the file successfully. However, it failed again when I reran the whole thing after adding more MT data to the test.

(mosca-1.2.2) (13:28 zyshen@gpuserver MOSCA-1.2.2) > python workflow/mosca.py -c config.json
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 96
Rules claiming more threads will be scaled down.
Job counts:
    count    jobs
    1    all
    1    differential_expression
    1    join_information
    1    report
    1    upimapi
    5

[Wed Feb 3 13:28:29 2021]
rule upimapi:
    input: output/Annotation/Sample/aligned.blast
    output: output/Annotation/uniprotinfo.tsv
    jobid: 6

Job counts:
    count    jobs
    1    upimapi
    1
upimapi.py -i output/Annotation/Sample/aligned.blast -o output/Annotation/uniprotinfo --blast --full-id
output/Annotation/uniprotinfo.tsv not found or empty. Will perform mapping for all IDs.
IDs present in uniprotinfo file: 0
IDs missing: 181068
Information already gathered for 0 ids. Still missing for 181068.
Retrieving UniProt information from 181068 IDs.
Mapping failed at some point! | Could not map additional IDs for this mapping. There were probably some outdated IDs. For more questions, please contact through https://github.com/iquasere/UPIMAPI/issues
Maximum iterations were made. Results related to 181068 IDs were not obtained. IDs with missing information are available at output/Annotation/ids_unmapped.txt and information obtained is available at output/Annotation/uniprotinfo.tsv
echo 'done' > output/Annotation/{sample}.txt

zhiyong

szypanther commented 3 years ago

This step usually takes a very long time and breaks very easily. Do we need to download it every time?

(mosca-1.2.2) (16:26 zyshen@gpuserver MOSCA-1.2.2) > upimapi.py -i output/Annotation/Sample/aligned.blast -o output/Annotation/uniprotinfo --blast --full-id
output/Annotation/uniprotinfo.tsv not found or empty. Will perform mapping for all IDs.
IDs present in uniprotinfo file: 0
IDs missing: 181068
Information already gathered for 0 ids. Still missing for 181068.
Retrieving UniProt information from 181068 IDs.
Mapping failed at some point! | Failed to retrieve information for some IDs. Retrying request.
Information already gathered for 4000 ids. Still missing for 179068.
Retrieving UniProt information from 179068 IDs.
Mapping failed at some point! | Failed to retrieve information for some IDs. Retrying request.
Information already gathered for 16000 ids. Still missing for 173068.
Retrieving UniProt information from 173068 IDs.
Mapping failed at some point! | Failed to retrieve information for some IDs. Retrying request.
Information already gathered for 32000 ids. Still missing for 165068.
Retrieving UniProt information from 165068 IDs.
Mapping failed at some point! | Failed to retrieve information for some IDs. Retrying request.
Information already gathered for 42000 ids. Still missing for 160068.
Retrieving UniProt information from 160068 IDs.
19% |#######################################

iquasere commented 3 years ago

I faced this problem when applying the "no assembly" workflow on MOSCA. The problem is how I set the number of tries in UPIMAPI: it counts failed tries across the entire workflow, and once it reaches that number it stops. However, it makes much more sense to apply that limit to each individual interval of IDs it tries to obtain, not to the entire workflow. I will release a new version of UPIMAPI today with that change. Testing it with 22,000 IDs presented no problem, but 1,000,000 IDs begins to show the weakness of that approach.

iquasere commented 3 years ago

The next version of UPIMAPI will be available soon through Bioconda. About that step taking too long and having to run it every time: the time it takes depends on UniProt's servers. Having to resubmit it manually, as you did because of UPIMAPI's weak implementation, is a lot of unnecessary work. In this next version UPIMAPI is more robust, and will try 3 times for every interval before giving up. Also, even when an interval fails, it will now continue mapping the remaining IDs.
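The per-interval retry described above can be sketched as follows. This is not UPIMAPI's actual code: `fetch_in_intervals`, its parameters and the `fetch` callable are hypothetical names used only to illustrate the idea of limiting tries per interval and continuing after a failed interval.

```python
from typing import Callable, Dict, List, Tuple


def fetch_in_intervals(
    ids: List[str],
    fetch: Callable[[List[str]], Dict[str, str]],
    step: int = 1000,
    max_tries: int = 3,
) -> Tuple[Dict[str, str], List[str]]:
    """Fetch information interval by interval, retrying each interval on its own.

    `fetch` is a hypothetical callable that takes a list of IDs and returns
    {id: info}, raising an exception on failure (for example a network error).
    Returns the gathered information and the IDs that could not be mapped.
    """
    gathered: Dict[str, str] = {}
    unmapped: List[str] = []
    for start in range(0, len(ids), step):
        interval = ids[start:start + step]
        for attempt in range(1, max_tries + 1):
            try:
                gathered.update(fetch(interval))
                break  # this interval succeeded, move on to the next one
            except Exception:  # broad on purpose: this is only a sketch
                if attempt == max_tries:
                    # Give up on this interval only and keep mapping the rest.
                    unmapped.extend(interval)
    return gathered, unmapped
```

With this layout a long run of 181068 IDs is not aborted by a few bad intervals: each interval gets its own budget of tries, and whatever fails ends up in the unmapped list for a later attempt.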

But the time it takes will still be there, because the information has to be fetched over the web. I tried working on a local version, but never managed to do it - this information is stored in an xml that a few colleagues of mine were working on, but none of us managed to organize that.

On another note, in MOSCA this was the main driver to use snakemake. With snakemake, I can allocate one thread to run these requests, while the computationally intensive tasks run simultaneously. So you don't notice the UPIMAPI step, because it will likely always run at the same time as the functional annotation of reCOGnizer and the alignments of the quantification steps.

iquasere commented 3 years ago

Do we need to download it every time?

This is a good question. Maybe MOSCA should have an option to set the file where the ID information is saved? That way, results would always go to the same file, and UPIMAPI already checks which IDs are present in its output.

At this point, MOSCA saves all info to output/Annotation/uniprotinfo.tsv, so within a job it will not repeat IDs. But between jobs it might.
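The reuse idea, fetching only the IDs that an existing result file does not contain yet, can be sketched like this. It is an illustration under assumptions, not UPIMAPI's implementation: the function name and the `Entry` column name are hypothetical.

```python
import csv
import os
from typing import Iterable, List


def ids_still_missing(wanted: Iterable[str], info_tsv: str, id_column: str = "Entry") -> List[str]:
    """Return the wanted IDs that are not yet present in an existing info table.

    `info_tsv` is the path of a previously written TSV (it may not exist yet);
    `id_column` is an assumed name for the column that holds the IDs.
    """
    present = set()
    if os.path.isfile(info_tsv) and os.path.getsize(info_tsv) > 0:
        with open(info_tsv, newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                present.add(row[id_column])
    # Keep the original order so reruns request IDs deterministically.
    return [identifier for identifier in wanted if identifier not in present]
```

Pointing several jobs at the same shared file would then let each new job skip whatever an earlier job already downloaded.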

szypanther commented 3 years ago

Dear iquasere, thank you for the detailed explanation. It seems to work better after I updated the UPIMAPI version. After that, I encountered another problem, pasted below:

[1] "Readcounts: output/Metatranscriptomics/expression_matrix.tsv"
[1] "Conditions: Mt3,Mt2"
[1] "Method: differential"
[1] "Output: output/Metatranscriptomics"
Warning message:
In DESeqDataSet(se, design = design, ignoreRank) :
  some variables in design formula are characters, converting to factors
estimating size factors
estimating dispersions
Error in checkForExperimentalReplicates(object, modelMatrix) :

  The design matrix has the same number of samples and coefficients to fit, so estimation of dispersion is not possible. Treating samples as replicates was deprecated in v1.20 and no longer supported since v1.22.

Calls: DESeq ... estimateDispersions -> .local -> checkForExperimentalReplicates
Execution halted
[Thu Feb 4 14:44:56 2021]
Error in rule differential_expression:
    jobid: 0
    output: output/Metatranscriptomics/gene_expression.jpeg, output/Metatranscriptomics/sample_distances.jpeg, output/Metatranscriptomics/condition_treated_results.csv

RuleException:
CalledProcessError in line 246 of /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile:
Command 'set -euo pipefail; /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/../../../bin/Rscript /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/de_analysis.R --readcounts output/Metatranscriptomics/expression_matrix.tsv --conditions Mt3,Mt2 --output output/Metatranscriptomics' returned non-zero exit status 1.
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2340, in run_wrapper
  File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile", line 246, in __rule_differential_expression
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 568, in _callback
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/concurrent/futures/thread.py", line 56, in run
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2352, in run_wrapper
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /media/zyshen/work/MOSCA/MOSCA-1.2.2/.snakemake/log/2021-02-04T144448.543478.snakemake.log

The content of the experiments.tsv file is as follows:
cat output/experiments.tsv
Files Sample Data type Condition Name
/media/zyshen/MOSCA/20201023_L_QMK/mg_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mg_R2.fastq Sample dna MG mgname
/media/zyshen/MOSCA/20201023_L_QMK/mt3_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mt3_R2.fastq Sample mrna Mt3 mt3name
/media/zyshen/MOSCA/20201023_L_QMK/mt2_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mt2_R2.fastq Sample mrna Mt2 mt2name

best, zhiyong

szypanther commented 3 years ago

For the new version of MOSCA (v1.3.1), my test also encountered the following error:

usage: binning.py [-h] -c CONTIGS [-t THREADS] [-o OUTPUT] [-mset {40,107}] [-s SAMPLE] [-r READS]
binning.py: error: argument -mset/--markerset: invalid choice: '30' (choose from '40', '107')
[Thu Feb 4 10:37:48 2021]
Error in rule binning:
    jobid: 0
    output: output/Binning/Sample/checkm.tsv

RuleException:
CalledProcessError in line 162 of /media/zyshen/work/MOSCA/MOSCA-1.3.1/workflow/Snakefile:
Command 'set -euo pipefail; python /media/zyshen/work/MOSCA/MOSCA-1.3.1/workflow/binning.py -c output/Assembly/Sample/contigs.fasta -t 14 -o output/Binning/Sample -r output/Preprocess/Sample_forward.fastq,output/Preprocess/Sample_reverse.fastq -mset 30' returned non-zero exit status 2.
  File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2340, in run_wrapper
  File "/media/zyshen/work/MOSCA/MOSCA-1.3.1/workflow/Snakefile", line 162, in __rule_binning
  File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 568, in _callback
  File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/concurrent/futures/thread.py", line 56, in run
  File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
  File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2352, in run_wrapper
Exiting because a job execution failed. Look above for error message
output/Annotation/uniprotinfo.tsv not found or empty. Will perform mapping for all IDs.
IDs present in uniprotinfo file: 0
IDs missing: 174308
Information already gathered for 0 ids. Still missing for 174308.
Retrieving UniProt information from 174308 IDs.
100% |################################################################################|
Failed to retrieve information for some IDs. Retrying request.
Information already gathered for 344530 ids. Still missing for 2043.
Retrieving UniProt information from 2043 IDs.
100% |################################################################################|
Results for all IDs are available at output/Annotation/uniprotinfo.tsv
[Thu Feb 4 14:23:56 2021]
Finished job 2.
1 of 6 steps (17%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /media/zyshen/work/MOSCA/MOSCA-1.3.1/.snakemake/log/2021-02-04T103747.776804.snakemake.log

zhiyong

iquasere commented 3 years ago

For the binning error, it seems you set 30 for the markerset option. It can only be set to either 40 or 107. MOSGUITO only allows those two values, but once you edit the config file directly that limitation is not obvious. The 40 markerset contains marker genes that are common to both Archaea and Bacteria, while the 107 marker genes are specific to Bacteria. Therefore, if you are only interested in Bacteria, 107 will be better than 40; otherwise use 40.
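The rejection of 30 is standard argparse behaviour for an option declared with a fixed set of choices, and argparse exits with status 2 in that case, which matches the "non-zero exit status 2" in the log. A minimal sketch, not the real binning.py: the program name and the default value are assumptions, only the `-mset/--markerset` option and its two choices come from the usage message.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with only the markerset option of the usage message."""
    parser = argparse.ArgumentParser(prog="binning_sketch.py")
    parser.add_argument(
        "-mset", "--markerset",
        choices=["40", "107"],  # 40: Bacteria and Archaea, 107: Bacteria only
        default="40",           # assumed default, not confirmed by the thread
        help="set of marker genes to use",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-mset", "107"])
    print("markerset:", args.markerset)
```

Passing `-mset 30` to this parser prints an "invalid choice" error and raises `SystemExit(2)`.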

For the DESeq2 analysis, it turns out it needs replicates for the statistical analysis. One workaround is to list each MT line of your experiments.tsv twice. Likely, you won't need to repeat any preprocessing if you change it to this:

Files Sample Data type Condition Name
/media/zyshen/MOSCA/20201023_L_QMK/mg_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mg_R2.fastq Sample dna MG mgname
/media/zyshen/MOSCA/20201023_L_QMK/mt3_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mt3_R2.fastq Sample mrna Mt3 mt3name
/media/zyshen/MOSCA/20201023_L_QMK/mt3_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mt3_R2.fastq Sample mrna Mt3 mt3name
/media/zyshen/MOSCA/20201023_L_QMK/mt2_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mt2_R2.fastq Sample mrna Mt2 mt2name
/media/zyshen/MOSCA/20201023_L_QMK/mt2_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mt2_R2.fastq Sample mrna Mt2 mt2name

I am going to test this locally with my datasets, as I don't know whether the absence of variance between the duplicated datasets will break some of the statistics. You can try on your end as well. One solution for this problem in the future is for MOSCA to generate a heatmap in R itself, comparing just the log-transformed values of the two samples.

szypanther commented 3 years ago

Hi iquasere, the MOSCA-1.3.1 test is running now, after I changed the value to 107. For the earlier MOSCA-1.2.2 test, I have already changed experiments.tsv as you showed above.

The following objects are masked from ‘package:Biobase’:

anyMissing, rowMedians

Attaching package: ‘DelayedArray’

The following objects are masked from ‘package:matrixStats’:

colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges

The following objects are masked from ‘package:base’:

aperm, apply, rowsum

[1] "Readcounts: output/Metatranscriptomics/expression_matrix.tsv"
[1] "Conditions: Mt3,Mt3,Mt2,Mt2"
[1] "Method: differential"
[1] "Output: output/Metatranscriptomics"
Error in `.rowNamesDF<-`(x, value = value) : 'row.names' length does not match!
Calls: rownames<- ... row.names<- -> row.names<-.data.frame -> .rowNamesDF<-
Execution halted
[Thu Feb 4 19:51:25 2021]
Error in rule differential_expression:
    jobid: 0
    output: output/Metatranscriptomics/gene_expression.jpeg, output/Metatranscriptomics/sample_distances.jpeg, output/Metatranscriptomics/condition_treated_results.csv

RuleException:
CalledProcessError in line 246 of /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile:
Command 'set -euo pipefail; /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/../../../bin/Rscript /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/de_analysis.R --readcounts output/Metatranscriptomics/expression_matrix.tsv --conditions Mt3,Mt3,Mt2,Mt2 --output output/Metatranscriptomics' returned non-zero exit status 1.
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2340, in run_wrapper
  File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile", line 246, in __rule_differential_expression
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 568, in _callback
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/concurrent/futures/thread.py", line 56, in run
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2352, in run_wrapper
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /media/zyshen/work/MOSCA/MOSCA-1.2.2/.snakemake/log/2021-02-04T195118.815607.snakemake.log
(mosca-1.2.2) (19:51 zyshen@gpuserver MOSCA-1.2.2) > cat output/experiments.tsv
Files Sample Data type Condition Name
/media/zyshen/MOSCA/20201023_L_QMK/mg_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mg_R2.fastq Sample dna MG mgname
/media/zyshen/MOSCA/20201023_L_QMK/mt3_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mt3_R2.fastq Sample mrna Mt3 mt3name
/media/zyshen/MOSCA/20201023_L_QMK/mt3_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mt3_R2.fastq Sample mrna Mt3 mt3name
/media/zyshen/MOSCA/20201023_L_QMK/mt2_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mt2_R2.fastq Sample mrna Mt2 mt2name
/media/zyshen/MOSCA/20201023_L_QMK/mt2_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mt2_R2.fastq Sample mrna Mt2 mt2name

Any suggestions? Thanks.

zhiyong

iquasere commented 3 years ago

I tested with my datasets, replicating the columns, and got

estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
Error in estimateDispersionsFit(object, fitType = fitType, quiet = quiet) :
  all gene-wise dispersion estimates are within 2 orders of magnitude
  from the minimum value, and so the standard curve fitting techniques will not work.
  One can instead use the gene-wise estimates as final estimates:
  dds <- estimateDispersionsGeneEst(dds)
  dispersions(dds) <- mcols(dds)$dispGeneEst
  ...then continue with testing using nbinomWaldTest or nbinomLRT
Calls: DESeq ... estimateDispersions -> .local -> estimateDispersionsFit
Execution halted

So DESeq2 will likely always require true replicates to run the differential expression (which makes sense). You can end your analysis here; I can only give instructions on how to hack MOSCA so that it finishes without differential expression.

At /media/zyshen/miniconda3/envs/mosca-1.2.2/share/MOSCA/scripts/Snakefile, comment out line 345, so it becomes

        #expand("{output}/Metatranscriptomics/condition_treated_results.csv", output = config["output"])

At /media/zyshen/miniconda3/envs/mosca-1.2.2/share/MOSCA/scripts/report.py, comment out line 238, so it becomes

            #self.info_from_differential_expression(args.output, sample)

Of course this isn't usual, and likely MOSCA will have to be adapted to accept requests without replicates. But for now, it's the quickest way you can get your work going again.
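Since copied rows did not work as replicates, a pre-flight check for true replicates could look like the sketch below. It assumes a tab-separated experiments file with the columns shown in this thread and treats rows that repeat the same Files entry as a single replicate; the function names are hypothetical and not part of MOSCA.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Set


def replicates_per_condition(experiments_tsv: str) -> Dict[str, int]:
    """Count distinct input file sets per MT condition.

    Rows that repeat the same 'Files' entry are counted once, because a
    copied dataset is not a true replicate.
    """
    files_by_condition: Dict[str, Set[str]] = defaultdict(set)
    reader = csv.DictReader(io.StringIO(experiments_tsv), delimiter="\t")
    for row in reader:
        if row["Data type"] == "mrna":
            files_by_condition[row["Condition"]].add(row["Files"])
    return {condition: len(files) for condition, files in files_by_condition.items()}


def can_run_differential_expression(experiments_tsv: str) -> bool:
    """Require at least 2 conditions, each with at least 2 distinct replicates."""
    counts = replicates_per_condition(experiments_tsv)
    return len(counts) >= 2 and all(n >= 2 for n in counts.values())
```

A workflow could use the result to skip the differential expression step instead of failing, which is the behaviour the manual edits above emulate.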

szypanther commented 3 years ago

Thanks iquasere, it finally finished successfully after commenting out the specified lines!

(mosca-1.2.2) (12:28 zyshen@gpuserver MOSCA-1.2.2) > python workflow/mosca.py -c config.json
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 96
Rules claiming more threads will be scaled down.
Job counts:
    count    jobs
    1    all
    1    report
    2

[Fri Feb 5 12:28:42 2021]
rule report:
    input: output/MOSCA_Protein_Report.xlsx
    output: output/technical_report.tsv, output/MOSCA_General_Report.xlsx, output/MOSCA_results.zip
    jobid: 12

Job counts:
    count    jobs
    1    report
    1
python /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/report.py -e output/experiments.tsv -o output -ldir /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/../resources -if tsv
conda list
Initializing Report
Retrieving preprocessing information for dataset: mgname
Retrieving preprocessing information for dataset: mt3name
Retrieving preprocessing information for dataset: mt3name
Retrieving preprocessing information for dataset: mt2name
Retrieving preprocessing information for dataset: mt2name

Retrieving assembly information for sample Sample
Retrieving annotation information for sample Sample
cat output/Metatranscriptomics/mt3name.sam | cut -f 3 | sort | uniq -c | awk '{printf("%s\t%s\n", $2, $1)}' | awk '{sum+=$2} END {print sum}'
cat output/Metatranscriptomics/mt3name.sam | cut -f 3 | sort | uniq -c | awk '{printf("%s\t%s\n", $2, $1)}' | awk '{sum+=$2} END {print sum}'
cat output/Metatranscriptomics/mt2name.sam | cut -f 3 | sort | uniq -c | awk '{printf("%s\t%s\n", $2, $1)}' | awk '{sum+=$2} END {print sum}'
cat output/Metatranscriptomics/mt2name.sam | cut -f 3 | sort | uniq -c | awk '{printf("%s\t%s\n", $2, $1)}' | awk '{sum+=$2} END {print sum}'
[Fri Feb 5 13:17:23 2021]
Finished job 12.
1 of 2 steps (50%) done

[Fri Feb 5 13:17:23 2021]
localrule all:
    input: output/Binning/Sample/checkm.tsv, output/MOSCA_Protein_Report.xlsx, output/MOSCA_Entry_Report.xlsx, output/technical_report.tsv, output/MOSCA_General_Report.xlsx, output/MOSCA_results.zip
    jobid: 0

[Fri Feb 5 13:17:23 2021]
Finished job 0.
2 of 2 steps (100%) done
Complete log: /media/zyshen/work/MOSCA/MOSCA-1.2.2/.snakemake/log/2021-02-05T122842.016404.snakemake.log
(mosca-1.2.2) (13:17 zyshen@gpuserver MOSCA-1.2.2) >

iquasere commented 3 years ago

Oh man, happy to hear that ahah. This means MOSCA is fully functional at this point, however it must take as input at least two different conditions with at least duplicates. I tested it with two real conditions of duplicates and three simulated conditions of triplicates. There are these little nuances that I must work upon. As the tool grows so will the different needs, and if the users are as helpful as you were, MOSCA will grow with them!

At this point, only KEGGCharter is missing. If you still want to run it, this command installs the correct version:

conda install -c conda-forge -c bioconda keggcharter=0.1.3

and this runs it for your datasets:

kegg_charter.py -f output/MOSCA_Entry_Report.xlsx -gcol MG -tcol Mt3,Mt2 -keggc "Cross-reference (KEGG)" -o output/KEGGCharter_results -tc "Taxonomic lineage (GENUS)"

Thank you very much for your patience. Hope it was worth it ^^

szypanther commented 3 years ago

Thanks iquasere, I will use true duplicate data to run the whole thing, and hopefully we won't need to comment out any lines to obtain the differential expression results. I wonder how DESeq2 can recognize that my data are not true replicates if different conditions can still have the same gene expression values :) I checked my test data: it really does have different values in some rows, so I can't understand why DESeq2 didn't work. :)

iquasere commented 3 years ago

If we just copy the columns, we will get

Error in estimateDispersionsFit(object, fitType = fitType, quiet = quiet) :
  all gene-wise dispersion estimates are within 2 orders of magnitude
  from the minimum value, and so the standard curve fitting techniques will not work.
  One can instead use the gene-wise estimates as final estimates:
  dds <- estimateDispersionsGeneEst(dds)
  dispersions(dds) <- mcols(dds)$dispGeneEst
  ...then continue with testing using nbinomWaldTest or nbinomLRT
Calls: DESeq ... estimateDispersions -> .local -> estimateDispersionsFit
Execution halted

so some problems may happen in the future if some datasets have no significant differential expression. In that case, the DE package of MOSCA may require a variance check: if the variance is below some threshold, the data might have to undergo a different statistical analysis...
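A pre-check for the copied-columns case could be sketched roughly as follows. This is a hypothetical helper, not part of MOSCA or DESeq2: the name `replicates_are_identical` and the sample names are made up for illustration. It only flags conditions whose replicate columns are exact copies, which is the situation described above; it does not estimate dispersion.

```python
from typing import Dict, List


def replicates_are_identical(counts: Dict[str, List[int]],
                             conditions: Dict[str, List[str]]) -> Dict[str, bool]:
    """For each condition, report whether all replicate columns are exact copies.

    counts: sample name -> per-gene read counts (all lists the same length).
    conditions: condition name -> names of the samples that belong to it.
    Hypothetical helper for illustration; not MOSCA's or DESeq2's own check.
    A condition with a single sample is also reported as True (no variance).
    """
    result = {}
    for condition, samples in conditions.items():
        first = counts[samples[0]]
        result[condition] = all(counts[s] == first for s in samples[1:])
    return result


# Made-up counts: condition "c1" holds a copied column, "c2" holds real replicates.
counts = {
    "mt1_a": [10, 0, 35, 7],
    "mt1_b": [10, 0, 35, 7],   # exact copy of mt1_a
    "mt2_a": [12, 3, 30, 9],
    "mt2_b": [15, 1, 28, 11],
}
conditions = {"c1": ["mt1_a", "mt1_b"], "c2": ["mt2_a", "mt2_b"]}

flags = replicates_are_identical(counts, conditions)
for condition, identical in flags.items():
    if identical:
        print(f"{condition}: replicates are identical, no within-condition variance")
```

A real variance check would probably need a tolerance rather than exact equality, but that threshold is not specified anywhere in this thread.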

szypanther commented 3 years ago

Hi iquasere,
Sorry to trouble you again. This time I tried to test my own true MT and MG data together and encountered the error below. Does this mean my fastq data has some problem? I checked that none of the reads in my raw data is shorter than 18 nucleotides. With MOSCA 1.2.2 it still keeps running, but MOSCA 1.3.1 exits.

............ WARNING: At least one of the reads is shorter than 18 nucleotides, by default it will not be searched

WARNING: At least one of the reads is shorter than 18 nucleotides, by default it will not be searched

WARNING: At least one of the reads is shorter than 18 nucleotides, by default it will not be searched bash /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/unmerge-paired-reads.sh output/Preprocess/SortMeRNA/mt1C_interleaved.fastq output/Preprocess/SortMeRNA/mt1C_forward.fastq output/Preprocess/SortMeRNA/mt1C_reverse.fastq Processing output/Preprocess/SortMeRNA/mt1C_forward.fastq .. Processing output/Preprocess/SortMeRNA/mt1C_reverse.fastq .. Done. Removed: output/Preprocess/SortMeRNA/mt1C_interleaved.fastq fastqc --outdir output/Preprocess/FastQC --threads 14 --extract output/Preprocess/SortMeRNA/mt1C_forward.fastq output/Preprocess/SortMeRNA/mt1C_reverse.fastq Started analysis of mt1C_forward.fastq Started analysis of mt1C_reverse.fastq Approx 5% complete for mt1C_forward.fastq Approx 5% complete for mt1C_reverse.fastq Approx 10% complete for mt1C_forward.fastq Approx 10% complete for mt1C_reverse.fastq Approx 15% complete for mt1C_forward.fastq Approx 15% complete for mt1C_reverse.fastq Approx 20% complete for mt1C_forward.fastq Approx 20% complete for mt1C_reverse.fastq Approx 25% complete for mt1C_forward.fastq Approx 25% complete for mt1C_reverse.fastq Approx 30% complete for mt1C_forward.fastq Approx 30% complete for mt1C_reverse.fastq Approx 35% complete for mt1C_forward.fastq Approx 35% complete for mt1C_reverse.fastq Approx 40% complete for mt1C_forward.fastq Approx 40% complete for mt1C_reverse.fastq Approx 45% complete for mt1C_forward.fastq Approx 45% complete for mt1C_reverse.fastq Approx 50% complete for mt1C_forward.fastq Approx 50% complete for mt1C_reverse.fastq Approx 55% complete for mt1C_forward.fastq Approx 55% complete for mt1C_reverse.fastq Approx 60% complete for mt1C_forward.fastq Approx 60% complete for mt1C_reverse.fastq Approx 65% complete for mt1C_forward.fastq Approx 65% complete for mt1C_reverse.fastq Approx 70% complete for mt1C_forward.fastq Approx 70% complete for mt1C_reverse.fastq Approx 75% complete for mt1C_forward.fastq Approx 
75% complete for mt1C_reverse.fastq Approx 80% complete for mt1C_forward.fastq Approx 80% complete for mt1C_reverse.fastq Approx 85% complete for mt1C_forward.fastq Approx 85% complete for mt1C_reverse.fastq Approx 90% complete for mt1C_forward.fastq Approx 90% complete for mt1C_reverse.fastq Approx 95% complete for mt1C_forward.fastq Approx 95% complete for mt1C_reverse.fastq Failed to process file mt1C_forward.fastq uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@' at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:158) at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:125) at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:77) at java.base/java.lang.Thread.run(Thread.java:834) Failed to process file mt1C_reverse.fastq uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@' at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:158) at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:125) at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:77) at java.base/java.lang.Thread.run(Thread.java:834) Traceback (most recent call last): File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/preprocess.py", line 376, in Preprocesser().run() File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/preprocess.py", line 364, in run minlen=args.minlen, original_files=True if args.input == original_input else False) File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/preprocess.py", line 269, in quality_trimming data = parse_fastqc(report) File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/mosca_tools.py", line 53, in parse_fastqc file = open(filename).read().split('\n') FileNotFoundError: [Errno 2] No such file or directory: 'output/Preprocess/FastQC/mt1C_forward_fastqc/fastqc_data.txt' [Mon Feb 8 12:10:41 2021] Error in rule preprocess: jobid: 0 output: 
output/Preprocess/Trimmomatic/quality_trimmed_mt1C_forward_paired.fq, output/Preprocess/Trimmomatic/quality_trimmed_mt1C_reverse_paired.fq

RuleException: CalledProcessError in line 79 of /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile:
Command 'set -euo pipefail; python /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/preprocess.py -i /media/zyshen/MOSCA/20201023_L_QMK/mt1C_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mt1C_R2.fastq -t 14 -o output/Preprocess -adaptdir /media/zyshen/work/MOSCA/MOSCA-1.2.2/adapters -rrnadbs /media/zyshen/work/MOSCA/MOSCA-1.2.2/rRNA_databases -d mrna -rd /media/zyshen/work/MOSCA/MOSCA-1.2.2 -n mt1C --minlen 100 --avgqual 20' returned non-zero exit status 1.
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2340, in run_wrapper
File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile", line 79, in rule_preprocess
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 568, in _callback
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/concurrent/futures/thread.py", line 56, in run
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2352, in run_wrapper
Exiting because a job execution failed. Look above for error message
0:13:19.610 4G / 4G INFO K-mer Counting (kmer_data.cpp : 321) Processed 21034020 reads
0:13:19.616 4G / 4G INFO K-mer Counting (kmer_data.cpp : 326) Total 21034020 reads processed
0:13:19.617 4G / 4G INFO K-mer Index Building (kmer_index_builder.hpp : 301) Building kmer index
0:13:19.617 4G / 4G INFO General (kmer_index_builder.hpp : 117) Splitting kmer instances into 22

iquasere commented 3 years ago

I check that none of the reads is shorter than 18 nucleotides in my raw data.

This WARNING: At least one of the reads is shorter than 18 nucleotides, by default it will not be searched comes from SortMeRNA and is normal; it happens to me every time I have adapters. In the future I will likely remove those reads with the MINLEN step of Trimmomatic.
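To double-check whether a given FASTQ file really contains reads shorter than 18 nucleotides, a quick count like the one below can help. It is a minimal sketch that assumes plain four-line FASTQ records (no wrapped sequences); `count_short_reads` is a hypothetical name, and this is not how SortMeRNA performs its own check.

```python
import io
from typing import TextIO, Tuple


def count_short_reads(handle: TextIO, min_len: int = 18) -> Tuple[int, int]:
    """Return (total_reads, reads_shorter_than_min_len) for a four-line FASTQ stream."""
    total = short = 0
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record is the sequence
            total += 1
            if len(line.strip()) < min_len:
                short += 1
    return total, short


# Tiny made-up example: one 20 nt read and one 5 nt read.
example = io.StringIO(
    "@read1\nACGTACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIIIIIII\n"
    "@read2\nACGTA\n+\nIIIII\n"
)
total, short = count_short_reads(example)
print(total, short)
```

On real data the same function can be given an open file handle, for example one opened on output/Preprocess/SortMeRNA/mt1C_forward.fastq.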

ID line didn't start with '@'

Now this is very weird. Can you please tell me what are the outputs of these commands?

wc -l output/Preprocess/SortMeRNA/mt1C_forward.fastq
grep '@' -c output/Preprocess/SortMeRNA/mt1C_forward.fastq
wc -l output/Preprocess/SortMeRNA/mt1C_reverse.fastq
grep '@' -c output/Preprocess/SortMeRNA/mt1C_reverse.fastq

szypanther commented 3 years ago

217089 queries aligned. The host system is detected to have 1081 GB of RAM. It is recommended to increase the block size for better performance using these parameters : -b12 -c1
[Mon Feb 8 17:24:12 2021] Finished job 7. 2 of 10 steps (20%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /media/zyshen/work/MOSCA/MOSCA-1.2.2/.snakemake/log/2021-02-08T160825.230213.snakemake.log
(mosca-1.2.2) (17:24 zyshen@gpuserver MOSCA-1.2.2) > wc -l output/Preprocess/SortMeRNA/mt1C_forward.fastq
12168464 output/Preprocess/SortMeRNA/mt1C_forward.fastq
(mosca-1.2.2) (11:20 zyshen@gpuserver MOSCA-1.2.2) > grep '@' -c output/Preprocess/SortMeRNA/mt1C_forward.fastq
3038087
(mosca-1.2.2) (11:20 zyshen@gpuserver MOSCA-1.2.2) > wc -l output/Preprocess/SortMeRNA/mt1C_reverse.fastq
12168460 output/Preprocess/SortMeRNA/mt1C_reverse.fastq
(mosca-1.2.2) (11:21 zyshen@gpuserver MOSCA-1.2.2) > grep '@' -c output/Preprocess/SortMeRNA/mt1C_reverse.fastq
3038085

iquasere commented 3 years ago

OK, your quality scores use a Phred encoding in which @ is a valid character, so it can also appear in quality lines. Instead, please run

grep '^@' -c output/Preprocess/SortMeRNA/mt1C_forward.fastq
grep '^@' -c output/Preprocess/SortMeRNA/mt1C_reverse.fastq

so it only counts @ at the beginning of a line. But I think I know the problem here: you have an orphan read. MOSCA handled this in the past, but I thought SortMeRNA had fixed it... What you need to do is edit /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/preprocess.py and change lines 227-233 to

        for fr in ['forward', 'reverse']:
            self.remove_messed_reads('{}_{}.fastq'.format(basename, fr))

            self.remove_orphans(basename + '_forward.fastq', 
                               basename + '_reverse.fastq')

Basically, remove the quote marks and fix the indentation. In the next version I am going to reimplement this in MOSCA. Such a shame: this step takes a long time just to remove a single read, but it fixes the problem...
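For reference, the `wc -l` outputs above (12168464 and 12168460 lines) would correspond to 3042116 forward and 3042115 reverse records if every record occupied exactly four lines, a difference of one read. The sketch below shows one way to list read IDs that are present in only one of the two files. It is an illustration under that four-line assumption, not MOSCA's `remove_orphans`; `read_ids` and `find_orphans` are hypothetical names, and IDs are compared on the first whitespace-delimited token of the header with a trailing /1 or /2 stripped (an assumed naming scheme).

```python
import io
from typing import Set, TextIO, Tuple


def read_ids(handle: TextIO) -> Set[str]:
    """Collect read IDs from the header line of each four-line FASTQ record."""
    ids = set()
    for i, line in enumerate(handle):
        if i % 4 == 0 and line.strip():
            name = line.split()[0]
            if name.startswith("@"):
                name = name[1:]
            # Drop a trailing /1 or /2 mate suffix if present (assumed naming scheme).
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            ids.add(name)
    return ids


def find_orphans(forward: TextIO, reverse: TextIO) -> Tuple[Set[str], Set[str]]:
    """Return (IDs only in forward, IDs only in reverse)."""
    fwd, rev = read_ids(forward), read_ids(reverse)
    return fwd - rev, rev - fwd


# Made-up example: read r2 has no mate in the reverse file.
forward = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nGGCC\n+\nIIII\n")
reverse = io.StringIO("@r1/2\nTTAA\n+\nIIII\n")
only_fwd, only_rev = find_orphans(forward, reverse)
print(only_fwd, only_rev)
```

Holding all IDs in memory is simple but costs RAM on files with millions of reads; a streaming comparison of sorted IDs would be the lighter alternative.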

szypanther commented 3 years ago

Hi iquasere, thanks. I edited the script as you suggested and encountered another new error.

.......... WARNING: At least one of the reads is shorter than 18 nucleotides, by default it will not be searched

WARNING: At least one of the reads is shorter than 18 nucleotides, by default it will not be searched

WARNING: At least one of the reads is shorter than 18 nucleotides, by default it will not be searched
bash /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/unmerge-paired-reads.sh output/Preprocess/SortMeRNA/mt1C_interleaved.fastq output/Preprocess/SortMeRNA/mt1C_forward.fastq output/Preprocess/SortMeRNA/mt1C_reverse.fastq
Processing output/Preprocess/SortMeRNA/mt1C_forward.fastq ..
Processing output/Preprocess/SortMeRNA/mt1C_reverse.fastq ..
Done.
Traceback (most recent call last):
  File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/preprocess.py", line 376, in <module>
    Preprocesser().run()
  File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/preprocess.py", line 352, in run
    original_files=True if args.input == original_input else False)
  File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/preprocess.py", line 227, in rrna_removal
    self.remove_messed_reads('{}_{}.fastq'.format(basename, fr))
NameError: name 'basename' is not defined
[Wed Feb 10 22:10:18 2021] Error in rule preprocess: jobid: 0 output: output/Preprocess/Trimmomatic/quality_trimmed_mt1C_forward_paired.fq, output/Preprocess/Trimmomatic/quality_trimmed_mt1C_reverse_paired.fq

RuleException: CalledProcessError in line 79 of /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile:
Command 'set -euo pipefail; python /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/preprocess.py -i /media/zyshen/MOSCA/20201023_L_QMK/mt1C_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mt1C_R2.fastq -t 14 -o output/Preprocess -adaptdir /media/zyshen/work/MOSCA/MOSCA-1.2.2/adapters -rrnadbs /media/zyshen/work/MOSCA/MOSCA-1.2.2/rRNA_databases -d mrna -rd /media/zyshen/work/MOSCA/MOSCA-1.2.2 -n mt1C --minlen 100 --avgqual 20' returned non-zero exit status 1.
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2340, in run_wrapper
File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile", line 79, in rule_preprocess
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 568, in _callback
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/concurrent/futures/thread.py", line 56, in run
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2352, in run_wrapper
Exiting because a job execution failed. Look above for error message

best regards, zhiyong

iquasere commented 3 years ago

Reimplemented this properly in 1.3.2. Tested and functional. You'll have to wait a little bit before running

conda install -c conda-forge -c bioconda -c anaconda -y mosca=1.3.2

as it takes some time for it to be available through Bioconda.

szypanther commented 3 years ago

Thanks iquasere, I downloaded your new version of MOSCA and ran my data again. I paste the new error report below.

You can also download my data from the following address to run your pipeline on it: http://143.89.25.148/labcloud/ account: guest passwd: guest I put all six gz files on the desktop.

[Fri Feb 12 18:35:20 2021] rule join_information: input: output/Annotation/uniprotinfo.tsv, output/Annotation/Sample/aligned.blast, output/Annotation/Sample/reCOGnizer_results.xlsx, output/Metatranscriptomics/mt3A.readcounts, output/Metatranscriptomics/mt1C.readcounts, output/Annotation/mgname.readcounts output: output/MOSCA_Protein_Report.xlsx, output/MOSCA_Entry_Report.xlsx, output/Metatranscriptomics/expression_matrix.tsv jobid: 1 threads: 22

Job counts:
count jobs
1 join_information
1
python /media/zyshen/work/MOSCA/MOSCA-1.3.1/workflow/join_information.py -e output/experiments.tsv -t 22 -o output -if tsv -nm TMM
sys:1: DtypeWarning: Columns (19) have mixed types.Specify dtype option on import or set low_memory=False.
2021-02-12 10:35:22: Joining data for sample: Sample
head -n -5 output/Annotation/mgname.readcounts
seqkit fx2tab output/Assembly/Sample/contigs.fasta | sort | awk '{print $1"\t"length($2)}' | join - output/Annotation/mgname_no_tail.readcounts | awk '{print $1"\t"$3/$2}'
Finding consensus COG for each Entry of Sample: Sample
100%|██████████| 174314/174314 [02:45<00:00, 1055.10it/s]
Traceback (most recent call last):
  File "/media/zyshen/work/MOSCA/MOSCA-1.3.1/workflow/join_information.py", line 147, in <module>
    Joiner().run()
  File "/media/zyshen/work/MOSCA/MOSCA-1.3.1/workflow/join_information.py", line 108, in run
    data = data.groupby('Entry')[abundance_analysed + expression_analysed].sum().reset_index()
  File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/pandas/core/groupby/generic.py", line 1610, in __getitem__
    return super().__getitem__(key)
  File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/pandas/core/base.py", line 218, in __getitem__
    raise KeyError(f"Columns not found: {str(bad_keys)[1:-1]}")
KeyError: "Columns not found: 'mt1C', 'mt3A'"
[Fri Feb 12 18:50:03 2021] Error in rule join_information: jobid: 0 output: output/MOSCA_Protein_Report.xlsx, output/MOSCA_Entry_Report.xlsx, output/Metatranscriptomics/expression_matrix.tsv

RuleException: CalledProcessError in line 282 of /media/zyshen/work/MOSCA/MOSCA-1.3.1/workflow/Snakefile:
Command 'set -euo pipefail; python /media/zyshen/work/MOSCA/MOSCA-1.3.1/workflow/join_information.py -e output/experiments.tsv -t 22 -o output -if tsv -nm TMM' returned non-zero exit status 1.
File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2340, in run_wrapper
File "/media/zyshen/work/MOSCA/MOSCA-1.3.1/workflow/Snakefile", line 282, in rule_join_information
File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 568, in _callback
File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/concurrent/futures/thread.py", line 56, in run
File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2352, in run_wrapper
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /media/zyshen/work/MOSCA/MOSCA-1.3.1/.snakemake/log/2021-02-12T183520.570545.snakemake.log

Happy lunar new year ! best, zhiyong

iquasere commented 3 years ago

I'm going to tackle this now, maybe at the end of the day will have some news!

I am downloading your files now, will inform when it is over!

Hope you had a good holiday! And have a great lunar year! ^^

szypanther commented 3 years ago

Hi iquasere, did you manage to run my data successfully, and what is the problem with it? Why does it generate this error when running the MOSCA pipeline? BTW, I am using version 1.3.3 now, and it still stops at the same place with the same error (KeyError: "Columns not found: 'mt1C', 'mt3A'"). Thanks!

Zhiyong

iquasere commented 3 years ago

Hi there! I had to postpone this problem because I am now on a tight deadline for a paper related to other tools. I am aiming to finish it this weekend, and this question will be the next thing to solve. So I will likely have news next week, but not before that.

I did download the datasets, so you can close the access to your server. Let me see if in the next days I can replicate this annoying bug...

iquasere commented 3 years ago

Can you please share your config and experiments file as well? You can share those directly here

iquasere commented 3 years ago

OK, this problem is reproducible on my end. It seems the final reporter script still does not consider the names given as input. I'm going to publish a new release, likely today.
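The `KeyError` reported above is the kind of failure that a name check before the `groupby` would surface earlier and more readably. A minimal sketch, assuming the sample names come from the experiments file and must match column names of the joined table; `missing_columns` is a hypothetical helper, and the column list below is made up, not MOSCA's actual layout.

```python
from typing import Iterable, List


def missing_columns(expected: Iterable[str], available: Iterable[str]) -> List[str]:
    """Return the expected names that are absent from the available columns, in order."""
    available_set = set(available)
    return [name for name in expected if name not in available_set]


# Made-up table header and sample names, for illustration only.
table_columns = ["Entry", "mgname", "mt_generic_1", "mt_generic_2"]
sample_names = ["mgname", "mt1C", "mt3A"]

missing = missing_columns(sample_names, table_columns)
if missing:
    print("Columns not found:", ", ".join(missing))
```

Failing fast with the list of missing names, before any heavy computation, would at least point straight at a mismatch between configured and reported sample names.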

About running MOSCA with no replicates: I think differential expression is a vital part of the workflow, and it would be more hassle than it is worth to adapt it to every need. Therefore, I am working on a page in MOSCA's wiki (https://github.com/iquasere/MOSCA/wiki/Partial-runs) to describe some useful runs of only parts of MOSCA, and this is going to be one of the alternative workflows.

szypanther commented 3 years ago

Hi iquasere, Thanks for your help! I have recently been running more of my raw data; enclosed please find my config and experiments files. .........................

Analysis complete for mt3A_reverse.fastq
join output/Preprocess/SortMeRNA/read1.txt output/Preprocess/SortMeRNA/read2.txt | awk '{print $1" "$2"\n"$3"\n+\n"$4 > "output/Preprocess/SortMeRNA/mt1A_forward.fastq";print $1" "$5"\n"$6"\n+\n"$7 > "output/Preprocess/SortMeRNA/mt1A_reverse.fastq"}'
join: output/Preprocess/SortMeRNA/read1.txt: No such file or directory
Traceback (most recent call last):
  File "/media/zyshen/work/MOSCA/MOSCA-1.3.4/workflow/preprocess.py", line 379, in <module>
    Preprocesser().run()
  File "/media/zyshen/work/MOSCA/MOSCA-1.3.4/workflow/preprocess.py", line 349, in run
    original_files=True if args.input == original_input else False)
  File "/media/zyshen/work/MOSCA/MOSCA-1.3.4/workflow/preprocess.py", line 226, in rrna_removal
    '{}/{}_reverse.fastq'.format(out_dir, name), out_dir)
  File "/media/zyshen/work/MOSCA/MOSCA-1.3.4/workflow/preprocess.py", line 191, in remove_orphans
    os.remove(file)
FileNotFoundError: [Errno 2] No such file or directory: 'output/Preprocess/SortMeRNA/read1.txt'
[Mon Mar 1 14:06:09 2021] Error in rule preprocess: jobid: 0 output: output/Preprocess/Trimmomatic/quality_trimmed_mt1A_forward_paired.fq, output/Preprocess/Trimmomatic/quality_trimmed_mt1A_reverse_paired.fq

RuleException: CalledProcessError in line 111 of /media/zyshen/work/MOSCA/MOSCA-1.3.4/workflow/Snakefile:
Command 'set -euo pipefail; python /media/zyshen/work/MOSCA/MOSCA-1.3.4/workflow/preprocess.py -i /media/zyshen/MOSCA/20201023_L_QMK/mt1A_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mt1A_R2.fastq -t 6 -o output/Preprocess -adaptdir /media/zyshen/work/MOSCA/MOSCA-1.3.4/adapters -rrnadbs /media/zyshen/work/MOSCA/MOSCA-1.3.4/rRNA_databases -d mrna -rd /media/zyshen/work/MOSCA/MOSCA-1.3.4 -n mt1A --minlen 100 --avgqual 20' returned non-zero exit status 1.
File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2340, in run_wrapper
File "/media/zyshen/work/MOSCA/MOSCA-1.3.4/workflow/Snakefile", line 111, in rule_preprocess
File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 568, in _callback
File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/concurrent/futures/thread.py", line 56, in run
File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2352, in run_wrapper
Exiting because a job execution failed. Look above for error message
...........
Analysis complete for quality_trimmed_mt1C_forward_paired.fq
Analysis complete for quality_trimmed_mt1C_reverse_paired.fq
[Mon Mar 1 14:25:04 2021] Finished job 13. 6 of 17 steps (35%) done

The pipeline can't move on to the next step and has been running the following task for a very long time!

48713 zyshen 20 0 2092084 1.275g 4848 R 599.0 0.1 3975:35 /media/zyshen/miniconda3/envs/snakemake/bin/spades-hammer /media/zyshen/work/MOSCA/MOSCA-1.3.4/output/Assembly/Sample/corrected/confi+

any suggestion?

best wishes, zhiyong
