Closed szypanther closed 3 years ago
I have resolved it now. It was an xlrd version problem!
(mosca-1.2.2) (15:08 zyshen@gpuserver MOSCA-1.2.2) > pip uninstall xlrd
Uninstalling xlrd-2.0.1:
  /media/zyshen/miniconda3/envs/mosca-1.2.2/bin/__pycache__/runxlrd.cpython-36.pyc
  /media/zyshen/miniconda3/envs/mosca-1.2.2/bin/runxlrd.py
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd-2.0.1.dist-info/INSTALLER
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd-2.0.1.dist-info/LICENSE
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd-2.0.1.dist-info/METADATA
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd-2.0.1.dist-info/RECORD
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd-2.0.1.dist-info/WHEEL
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd-2.0.1.dist-info/top_level.txt
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/__init__.py
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/__pycache__/__init__.cpython-36.pyc
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/__pycache__/biffh.cpython-36.pyc
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/__pycache__/book.cpython-36.pyc
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/__pycache__/compdoc.cpython-36.pyc
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/__pycache__/formatting.cpython-36.pyc
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/__pycache__/formula.cpython-36.pyc
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/__pycache__/info.cpython-36.pyc
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/__pycache__/sheet.cpython-36.pyc
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/__pycache__/timemachine.cpython-36.pyc
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/__pycache__/xldate.cpython-36.pyc
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/biffh.py
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/book.py
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/compdoc.py
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/formatting.py
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/formula.py
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/info.py
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/sheet.py
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/timemachine.py
  /media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/xldate.py
Proceed (y/n)? y
  Successfully uninstalled xlrd-2.0.1
You are using pip version 9.0.1, however version 21.0 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
(mosca-1.2.2) (15:31 zyshen@gpuserver MOSCA-1.2.2) > pip install xlrd==1.2.0
Collecting xlrd==1.2.0
  Cache entry deserialization failed, entry ignored
  Downloading https://files.pythonhosted.org/packages/b0/16/63576a1a001752e34bf8ea62e367997530dc553b689356b9879339cf45a4/xlrd-1.2.0-py2.py3-none-any.whl (103kB)
    100% |████████████████████████████████| 112kB 521kB/s
Installing collected packages: xlrd
Successfully installed xlrd-1.2.0
You are using pip version 9.0.1, however version 21.0 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Sorry, I am only checking now, and you are already losing a lot of time debugging MOSCA! You are right, I have 1.2.0 on my system, and I will update the meta.yaml so that the next versions of MOSCA use that version of xlrd.
No idea how this did not happen on my system when testing though...
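The downgrade above presumably works because xlrd 2.0.0 and later read only legacy .xls files, while the 1.x series could also open .xlsx. A minimal sketch of a guard for that, assuming the workflow needs .xlsx support from xlrd; `xlrd_supports_xlsx` is a hypothetical helper, not part of MOSCA:

```python
def xlrd_supports_xlsx(version: str) -> bool:
    """Return True if an xlrd release with this version string can read .xlsx.

    Assumption: only the major version matters, since .xlsx support was
    removed in xlrd 2.0.0.
    """
    major = int(version.split(".")[0])
    return major < 2


# The two versions that appear in the pip output above.
for version in ("1.2.0", "2.0.1"):
    print(version, xlrd_supports_xlsx(version))
```

In an environment the version string could be taken from `xlrd.__version__` before any report is written.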
Hi iquasere, I can't finish this step.
Rscript /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/de_analysis.R --readcounts output/Metatranscriptomics/expression_matrix.tsv --conditions Mt --output output/Metatranscriptomics
.....
The following objects are masked from ‘package:Biobase’:
    anyMissing, rowMedians
Attaching package: ‘DelayedArray’
The following objects are masked from ‘package:matrixStats’:
    colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges
The following objects are masked from ‘package:base’:
    aperm, apply, rowsum
[1] "Readcounts: output/Metatranscriptomics/expression_matrix.tsv"
[1] "Conditions: Mt"
[1] "Method: differential"
[1] "Output: output/Metatranscriptomics"
Error in DESeqDataSet(se, design = design, ignoreRank) :
  design has a single variable, with all samples having the same value.
  use instead a design of '~ 1'. estimateSizeFactors, rlog and the VST can then be used
Calls: DESeqDataSetFromMatrix -> DESeqDataSet
In addition: Warning messages:
1: In DESeqDataSet(se, design = design, ignoreRank) :
  all genes have equal values for all samples. will not be able to perform differential analysis
2: In DESeqDataSet(se, design = design, ignoreRank) :
  some variables in design formula are characters, converting to factors
Execution halted
I see you specified only one condition: Mt. This is set in the experiments file, and will be fixed by putting at least two different conditions there!
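The requirement can be checked before launching the workflow. A minimal sketch, assuming the experiments file is tab-separated with the `Data type` and `Condition` columns that appear later in this thread; `mt_conditions` and the example table are hypothetical:

```python
import csv
import io


def mt_conditions(experiments_tsv: str) -> set:
    """Return the distinct Condition values of the mrna rows."""
    reader = csv.DictReader(io.StringIO(experiments_tsv), delimiter="\t")
    return {row["Condition"] for row in reader if row["Data type"] == "mrna"}


# Hypothetical table using the same column names as experiments.tsv.
EXAMPLE = (
    "Files\tSample\tData type\tCondition\tName\n"
    "mg_R1.fastq,mg_R2.fastq\tSample\tdna\tMG\tmgname\n"
    "mt_R1.fastq,mt_R2.fastq\tSample\tmrna\tMt\tmtname\n"
)

conditions = mt_conditions(EXAMPLE)
if len(conditions) < 2:
    print(f"Only {len(conditions)} MT condition(s) found: {sorted(conditions)}")
```

With a single `Mt` condition, as in the failing run, the check reports the problem before DESeq2 is ever called.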
Thanks iquasere, you reply so quickly! OK, I will add more MT data to the test. It may need more time to run. I will let you know the result when it is done. It's not easy to get it to run successfully the first time, but of course it's valuable to do it. I'm pretty sure this will save me a lot of time in my project once the test runs successfully. Thanks :)
Yes, after this first bumpy ride it should be successful every time. Some problems were on my end, and have already been fixed. Let's hope they were the last. On another note, MOSCA requires more than one MT condition, but because of your test run I see that users won't always have that - sometimes the user might only want to know which pathways are being expressed, and the relative proportions of expression of different genes in the same sample. I will change MOSCA so that, in the future, it bypasses the differential expression analysis step and proceeds directly to reporting and KEGGCharter. For now, however, that is only possible by either adding another MT dataset or editing the Snakefile. If you do want this, I can offer a cut-down version of MOSCA that bypasses this step.
Thanks iquasere. Yes, it would be better if you could offer that special version of MOSCA. Many thanks for your help and good work! It comes just in time: our lab will soon have a huge amount of MT and MG data coming, and your good work can help me avoid spending more time on pipeline building! Thanks again!
best, zhiyong
Hi iquasere, I found that the upimapi.py step sometimes can't finish and fails to generate the uniprotinfo.tsv file. I ran it manually two times and it then generated the file successfully. However, it failed again when I reran the whole workflow after adding more MT data to the test.
(mosca-1.2.2) (13:28 zyshen@gpuserver MOSCA-1.2.2) > python workflow/mosca.py -c config.json
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 96
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1       all
    1       differential_expression
    1       join_information
    1       report
    1       upimapi
    5
[Wed Feb 3 13:28:29 2021]
rule upimapi:
    input: output/Annotation/Sample/aligned.blast
    output: output/Annotation/uniprotinfo.tsv
    jobid: 6
Job counts:
    count   jobs
    1       upimapi
    1
upimapi.py -i output/Annotation/Sample/aligned.blast -o output/Annotation/uniprotinfo --blast --full-id
output/Annotation/uniprotinfo.tsv not found or empty. Will perform mapping for all IDs.
IDs present in uniprotinfo file: 0
IDs missing: 181068
Information already gathered for 0 ids. Still missing for 181068.
Retrieving UniProt information from 181068 IDs.
Mapping failed at some point! | Could not map additional IDs for this mapping. There were probably some outdated IDs. For more questions, please contact through https://github.com/iquasere/UPIMAPI/issues
Maximum iterations were made. Results related to 181068 IDs were not obtained. IDs with missing information are available at output/Annotation/ids_unmapped.txt and information obtained is available at output/Annotation/uniprotinfo.tsv
echo 'done' > output/Annotation/{sample}.txt
zhiyong
This step usually takes a very long time and breaks very easily. Do we need to download it every time?
(mosca-1.2.2) (16:26 zyshen@gpuserver MOSCA-1.2.2) > upimapi.py -i output/Annotation/Sample/aligned.blast -o output/Annotation/uniprotinfo --blast --full-id
output/Annotation/uniprotinfo.tsv not found or empty. Will perform mapping for all IDs.
IDs present in uniprotinfo file: 0
IDs missing: 181068
Information already gathered for 0 ids. Still missing for 181068.
Retrieving UniProt information from 181068 IDs.
Mapping failed at some point! | Failed to retrieve information for some IDs. Retrying request.
Information already gathered for 4000 ids. Still missing for 179068.
Retrieving UniProt information from 179068 IDs.
Mapping failed at some point! | Failed to retrieve information for some IDs. Retrying request.
Information already gathered for 16000 ids. Still missing for 173068.
Retrieving UniProt information from 173068 IDs.
Mapping failed at some point! | Failed to retrieve information for some IDs. Retrying request.
Information already gathered for 32000 ids. Still missing for 165068.
Retrieving UniProt information from 165068 IDs.
Mapping failed at some point! | Failed to retrieve information for some IDs. Retrying request.
Information already gathered for 42000 ids. Still missing for 160068.
Retrieving UniProt information from 160068 IDs.
19% |#######################################
I faced this problem when applying the "no assembly" workflow in MOSCA. The problem is how I set the number of tries in UPIMAPI: it counts failed attempts across the entire workflow, and once that number is reached, it finishes. However, it makes much more sense to apply such a limit to each individual interval of IDs it tries to obtain, not to the entire workflow. I will launch a new version of UPIMAPI today with that updated. Testing it with 22,000 IDs presented no problem, but 1,000,000 IDs begin to show the weakness of that approach.
The next version of UPIMAPI will be available soon through Bioconda. About that step taking too long and having to run it every time: the time it takes depends on UniProt's servers. Having to submit it manually, as you had to because of UPIMAPI's weak implementation, is a lot of unnecessary work. In this next version UPIMAPI is more robust, and will try each interval 3 times before giving up. Also, even with such a hiccup, it will now continue to map the remaining IDs.
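The per-interval retry described above can be sketched as follows. This is only an illustration of the idea, not UPIMAPI's actual code: `fetch_in_chunks`, its parameters, and the `fetch` callable (standing in for one request to UniProt) are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def fetch_in_chunks(
    ids: List[str],
    fetch: Callable[[List[str]], Dict[str, str]],
    chunk_size: int = 1000,
    max_tries: int = 3,
) -> Tuple[Dict[str, str], List[str]]:
    """Fetch information chunk by chunk, retrying each chunk on its own.

    A chunk that fails max_tries times is recorded as failed, and the loop
    moves on to the remaining chunks instead of aborting the whole run.
    """
    results: Dict[str, str] = {}
    failed: List[str] = []
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        for _ in range(max_tries):
            try:
                results.update(fetch(chunk))
                break  # this chunk succeeded, stop retrying it
            except Exception:
                continue  # try the same chunk again
        else:
            # The retry budget for this chunk is exhausted.
            failed.extend(chunk)
    return results, failed
```

Because the retry counter is reset for every chunk, one bad interval cannot use up the budget for the rest of the IDs, which is the weakness the workflow-wide limit showed on large inputs.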
But the time it takes will still be there, because it has to access through the web. I tried working on a local version, but never managed to do it - this information is stored in an xml that a few colleagues of mine were working on, but none of us managed to organize that.
On another note, in MOSCA this was the main driver to use snakemake. With snakemake, I can allocate one thread to run these requests, while the computationally intensive tasks run simultaneously. So you don't notice the UPIMAPI step, because it will likely always run at the same time as the functional annotation of reCOGnizer and the alignments of the quantification steps.
Do we need to download it every time?
This is a good question. Maybe MOSCA should have an option to set the file to which ID information is downloaded? That way, results would go to the same file, and UPIMAPI already checks which IDs are present in the output.
At this point, MOSCA saves all info to output/Annotation/uniprotinfo.tsv, so within each job it will not repeat IDs. But between jobs it might.
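The check for IDs that are already present can be sketched like this. It is an assumption-laden illustration rather than UPIMAPI's implementation: the helper names are hypothetical, and the ID column is assumed to be called `Entry`, which may not match the real uniprotinfo.tsv.

```python
import csv
import io
from typing import Iterable, List


def ids_in_table(tsv_text: str, id_column: str = "Entry") -> set:
    """Collect the IDs already stored in a tab-separated results table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[id_column] for row in reader if row.get(id_column)}


def ids_to_fetch(wanted: Iterable[str], tsv_text: str,
                 id_column: str = "Entry") -> List[str]:
    """Return the wanted IDs that are not yet in the table, in input order."""
    present = ids_in_table(tsv_text, id_column)
    return [i for i in wanted if i not in present]
```

Pointing several jobs at one shared table would then let each job request only the IDs that no earlier job has retrieved.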
Dear iquasere, thank you for the detailed explanations. Things seem better after I updated the UPIMAPI version. After that, I encountered another problem, pasted below:
[1] "Readcounts: output/Metatranscriptomics/expression_matrix.tsv"
[1] "Conditions: Mt3,Mt2"
[1] "Method: differential"
[1] "Output: output/Metatranscriptomics"
Warning message:
In DESeqDataSet(se, design = design, ignoreRank) :
  some variables in design formula are characters, converting to factors
estimating size factors
estimating dispersions
Error in checkForExperimentalReplicates(object, modelMatrix) :
  The design matrix has the same number of samples and coefficients to fit, so estimation of dispersion is not possible. Treating samples as replicates was deprecated in v1.20 and no longer supported since v1.22.
Calls: DESeq ... estimateDispersions -> .local -> checkForExperimentalReplicates
Execution halted
[Thu Feb 4 14:44:56 2021]
Error in rule differential_expression:
    jobid: 0
    output: output/Metatranscriptomics/gene_expression.jpeg, output/Metatranscriptomics/sample_distances.jpeg, output/Metatranscriptomics/condition_treated_results.csv
RuleException:
CalledProcessError in line 246 of /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile:
Command 'set -euo pipefail; /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/../../../bin/Rscript /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/de_analysis.R --readcounts output/Metatranscriptomics/expression_matrix.tsv --conditions Mt3,Mt2 --output output/Metatranscriptomics' returned non-zero exit status 1.
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2340, in run_wrapper
  File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile", line 246, in rule_differential_expression
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 568, in _callback
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/concurrent/futures/thread.py", line 56, in run
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2352, in run_wrapper
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /media/zyshen/work/MOSCA/MOSCA-1.2.2/.snakemake/log/2021-02-04T144448.543478.snakemake.log
The content of the experiments.tsv file is as follows:
cat output/experiments.tsv
Files Sample Data type Condition Name
/media/zyshen/MOSCA/20201023_L_QMK/mg_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mg_R2.fastq Sample dna MG mgname
/media/zyshen/MOSCA/20201023_L_QMK/mt3_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mt3_R2.fastq Sample mrna Mt3 mt3name
/media/zyshen/MOSCA/20201023_L_QMK/mt2_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mt2_R2.fastq Sample mrna Mt2 mt2name
best, zhiyong
With the new version of MOSCA (v1.3.1), my test also encountered the following error:
usage: binning.py [-h] -c CONTIGS [-t THREADS] [-o OUTPUT] [-mset {40,107}] [-s SAMPLE] [-r READS]
binning.py: error: argument -mset/--markerset: invalid choice: '30' (choose from '40', '107')
[Thu Feb 4 10:37:48 2021]
Error in rule binning:
    jobid: 0
    output: output/Binning/Sample/checkm.tsv
RuleException:
CalledProcessError in line 162 of /media/zyshen/work/MOSCA/MOSCA-1.3.1/workflow/Snakefile:
Command 'set -euo pipefail; python /media/zyshen/work/MOSCA/MOSCA-1.3.1/workflow/binning.py -c output/Assembly/Sample/contigs.fasta -t 14 -o output/Binning/Sample -r output/Preprocess/Sample_forward.fastq,output/Preprocess/Sample_reverse.fastq -mset 30' returned non-zero exit status 2.
  File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2340, in run_wrapper
  File "/media/zyshen/work/MOSCA/MOSCA-1.3.1/workflow/Snakefile", line 162, in rule_binning
  File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 568, in _callback
  File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/concurrent/futures/thread.py", line 56, in run
  File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
  File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2352, in run_wrapper
Exiting because a job execution failed. Look above for error message
output/Annotation/uniprotinfo.tsv not found or empty. Will perform mapping for all IDs.
IDs present in uniprotinfo file: 0
IDs missing: 174308
Information already gathered for 0 ids. Still missing for 174308.
Retrieving UniProt information from 174308 IDs.
100% |#####################################################################################################################################################################################################|
Failed to retrieve information for some IDs. Retrying request.
Information already gathered for 344530 ids. Still missing for 2043.
Retrieving UniProt information from 2043 IDs.
100% |#####################################################################################################################################################################################################|
Results for all IDs are available at output/Annotation/uniprotinfo.tsv
[Thu Feb 4 14:23:56 2021]
Finished job 2.
1 of 6 steps (17%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /media/zyshen/work/MOSCA/MOSCA-1.3.1/.snakemake/log/2021-02-04T103747.776804.snakemake.log
zhiyong
For the binning error, it seems you set 30 for the markerset option. This can only be set to either 40 or 107 - MOSGUITO only allows setting those two values, but when working directly with the config file that limitation is not evident. The 40 markerset contains marker genes that are common to both Archaea and Bacteria, while the 107 marker genes are specific to Bacteria. Therefore, if you are only interested in Bacteria, 107 will be better than 40; otherwise use 40.
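Since the limitation is invisible in a hand-edited config file, the value could be validated before the workflow starts. A minimal sketch under that assumption; `validate_markerset` is a hypothetical helper, and the only facts taken from the thread are the two allowed values:

```python
ALLOWED_MARKERSETS = ("40", "107")


def validate_markerset(value) -> str:
    """Return the markerset as a string, or raise if it is not allowed.

    40 covers marker genes shared by Archaea and Bacteria;
    107 covers marker genes specific to Bacteria.
    """
    text = str(value)
    if text not in ALLOWED_MARKERSETS:
        allowed = ", ".join(ALLOWED_MARKERSETS)
        raise ValueError(f"invalid markerset {text!r}: choose one of {allowed}")
    return text


print(validate_markerset(107))
```

Failing early with a message that lists the valid choices avoids discovering the problem only when the binning rule runs.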
For the DESeq2 analysis, it turns out it needs replicates for the statistical analysis. One workaround is to list the MT lines of your experiments.tsv twice. You likely won't need to repeat any preprocessing if you change it to this:
Files Sample Data type Condition Name
/media/zyshen/MOSCA/20201023_L_QMK/mg_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mg_R2.fastq Sample dna MG mgname
/media/zyshen/MOSCA/20201023_L_QMK/mt3_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mt3_R2.fastq Sample mrna Mt3 mt3name
/media/zyshen/MOSCA/20201023_L_QMK/mt3_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mt3_R2.fastq Sample mrna Mt3 mt3name
/media/zyshen/MOSCA/20201023_L_QMK/mt2_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mt2_R2.fastq Sample mrna Mt2 mt2name
/media/zyshen/MOSCA/20201023_L_QMK/mt2_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mt2_R2.fastq Sample mrna Mt2 mt2name
I am going to test this locally with my datasets, as I don't know if the absence of variance between the datasets will invalidate some of the statistics. You can try on your end as well. One solution for this problem in the future is for me to generate a heatmap in R, comparing just the log-transformed values of the two samples.
Hi iquasere, the MOSCA-1.3.1 test is running now after I changed the value to 107. For the earlier MOSCA-1.2.2 test, I have already changed experiments.tsv as you showed above.
The following objects are masked from ‘package:Biobase’:
anyMissing, rowMedians
Attaching package: ‘DelayedArray’
The following objects are masked from ‘package:matrixStats’:
colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges
The following objects are masked from ‘package:base’:
aperm, apply, rowsum
[1] "Readcounts: output/Metatranscriptomics/expression_matrix.tsv"
[1] "Conditions: Mt3,Mt3,Mt2,Mt2"
[1] "Method: differential"
[1] "Output: output/Metatranscriptomics"
Error in .rowNamesDF<-(x, value = value) : 'row.names' length does not match!
Calls: rownames<- ... row.names<- -> row.names<-.data.frame -> .rowNamesDF<-
Execution halted
[Thu Feb 4 19:51:25 2021]
Error in rule differential_expression:
jobid: 0
output: output/Metatranscriptomics/gene_expression.jpeg, output/Metatranscriptomics/sample_distances.jpeg, output/Metatranscriptomics/condition_treated_results.csv
RuleException:
CalledProcessError in line 246 of /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile:
Command 'set -euo pipefail; /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/../../../bin/Rscript /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/de_analysis.R --readcounts output/Metatranscriptomics/expression_matrix.tsv --conditions Mt3,Mt3,Mt2,Mt2 --output output/Metatranscriptomics' returned non-zero exit status 1.
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2340, in run_wrapper
  File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile", line 246, in rule_differential_expression
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 568, in _callback
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/concurrent/futures/thread.py", line 56, in run
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2352, in run_wrapper
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /media/zyshen/work/MOSCA/MOSCA-1.2.2/.snakemake/log/2021-02-04T195118.815607.snakemake.log
(mosca-1.2.2) (19:51 zyshen@gpuserver MOSCA-1.2.2) > cat output/experiments.tsv
Files Sample Data type Condition Name
/media/zyshen/MOSCA/20201023_L_QMK/mg_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mg_R2.fastq Sample dna MG mgname
/media/zyshen/MOSCA/20201023_L_QMK/mt3_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mt3_R2.fastq Sample mrna Mt3 mt3name
/media/zyshen/MOSCA/20201023_L_QMK/mt3_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mt3_R2.fastq Sample mrna Mt3 mt3name
/media/zyshen/MOSCA/20201023_L_QMK/mt2_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mt2_R2.fastq Sample mrna Mt2 mt2name
/media/zyshen/MOSCA/20201023_L_QMK/mt2_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mt2_R2.fastq Sample mrna Mt2 mt2name
Any suggestions? Thanks.
zhiyong
I tested with my datasets, replicating the columns, and got
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
Error in estimateDispersionsFit(object, fitType = fitType, quiet = quiet) :
all gene-wise dispersion estimates are within 2 orders of magnitude
from the minimum value, and so the standard curve fitting techniques will not work.
One can instead use the gene-wise estimates as final estimates:
dds <- estimateDispersionsGeneEst(dds)
dispersions(dds) <- mcols(dds)$dispGeneEst
...then continue with testing using nbinomWaldTest or nbinomLRT
Calls: DESeq ... estimateDispersions -> .local -> estimateDispersionsFit
Execution halted
so DESeq2 will likely always require true replicates to run the differential expression (which makes sense). You can end your analysis here; I can only give instructions on how to hack MOSCA so that it finishes without differential expression.
At /media/zyshen/miniconda3/envs/mosca-1.2.2/share/MOSCA/scripts/Snakefile, comment out line 345, so it becomes
#expand("{output}/Metatranscriptomics/condition_treated_results.csv", output = config["output"])
At /media/zyshen/miniconda3/envs/mosca-1.2.2/share/MOSCA/scripts/report.py, comment out line 238, so it becomes
#self.info_from_differential_expression(args.output, sample)
Of course this isn't usual, and likely MOSCA will have to be adapted to accept requests without replicates. But for now, it's the quickest way you can get your work going again.
Thanks iquasere, it finally finished successfully after commenting out the specified lines!
(mosca-1.2.2) (12:28 zyshen@gpuserver MOSCA-1.2.2) > python workflow/mosca.py -c config.json
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 96
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1       all
    1       report
    2
[Fri Feb 5 12:28:42 2021]
rule report:
    input: output/MOSCA_Protein_Report.xlsx
    output: output/technical_report.tsv, output/MOSCA_General_Report.xlsx, output/MOSCA_results.zip
    jobid: 12
Job counts:
    count   jobs
    1       report
    1
python /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/report.py -e output/experiments.tsv -o output -ldir /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/../resources -if tsv
conda list
Initializing Report
Retrieving preprocessing information for dataset: mgname
Retrieving preprocessing information for dataset: mt3name
Retrieving preprocessing information for dataset: mt3name
Retrieving preprocessing information for dataset: mt2name
Retrieving preprocessing information for dataset: mt2name
Retrieving assembly information for sample Sample
Retrieving annotation information for sample Sample
cat output/Metatranscriptomics/mt3name.sam | cut -f 3 | sort | uniq -c | awk '{printf("%s\t%s\n", $2, $1)}' | awk '{sum+=$2} END {print sum}'
cat output/Metatranscriptomics/mt3name.sam | cut -f 3 | sort | uniq -c | awk '{printf("%s\t%s\n", $2, $1)}' | awk '{sum+=$2} END {print sum}'
cat output/Metatranscriptomics/mt2name.sam | cut -f 3 | sort | uniq -c | awk '{printf("%s\t%s\n", $2, $1)}' | awk '{sum+=$2} END {print sum}'
cat output/Metatranscriptomics/mt2name.sam | cut -f 3 | sort | uniq -c | awk '{printf("%s\t%s\n", $2, $1)}' | awk '{sum+=$2} END {print sum}'
[Fri Feb 5 13:17:23 2021]
Finished job 12.
1 of 2 steps (50%) done
[Fri Feb 5 13:17:23 2021]
localrule all:
    input: output/Binning/Sample/checkm.tsv, output/MOSCA_Protein_Report.xlsx, output/MOSCA_Entry_Report.xlsx, output/technical_report.tsv, output/MOSCA_General_Report.xlsx, output/MOSCA_results.zip
    jobid: 0
[Fri Feb 5 13:17:23 2021]
Finished job 0.
2 of 2 steps (100%) done
Complete log: /media/zyshen/work/MOSCA/MOSCA-1.2.2/.snakemake/log/2021-02-05T122842.016404.snakemake.log
(mosca-1.2.2) (13:17 zyshen@gpuserver MOSCA-1.2.2) >
Oh man, happy to hear that ahah. This means MOSCA is fully functional at this point, however it must take as input at least two different conditions with at least duplicates. I tested it with two real conditions of duplicates and three simulated conditions of triplicates. There are these little nuances that I must work upon. As the tool grows so will the different needs, and if the users are as helpful as you were, MOSCA will grow with them!
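The input requirement stated above (at least two conditions, each with at least duplicates) can be expressed as a small pre-flight check. A minimal sketch; `check_de_design` is a hypothetical helper, and it only counts condition labels, so it cannot tell true replicates from the same files listed twice:

```python
from collections import Counter
from typing import List


def check_de_design(conditions: List[str]) -> List[str]:
    """Return a list of problems that would block differential expression."""
    counts = Counter(conditions)
    problems = []
    if len(counts) < 2:
        problems.append("at least two different conditions are required")
    for condition, n in sorted(counts.items()):
        if n < 2:
            problems.append(f"condition {condition!r} has no replicates")
    return problems


# The designs tried in this thread, given as one label per MT dataset.
for design in (["Mt"], ["Mt3", "Mt2"], ["Mt3", "Mt3", "Mt2", "Mt2"]):
    print(design, check_de_design(design))
```

An empty list means the labels satisfy the requirement; whether DESeq2 can then fit dispersions still depends on the data having real variation between replicates.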
At this point, only KEGGCharter is missing. If you still want to run it, this command installs the correct version:
conda install -c conda-forge -c bioconda keggcharter=0.1.3
and this runs it for your datasets:
kegg_charter.py -f output/MOSCA_Entry_Report.xlsx -gcol MG -tcol Mt3,Mt2 -keggc "Cross-reference (KEGG)" -o output/KEGGCharter_results -tc "Taxonomic lineage (GENUS)"
Thank you very much for your patience. Hope it was worth it ^^
Thanks iquasere, I will use true duplicate data to run the whole thing, and hope we won't need to comment out any lines to obtain the differential expression results. I wonder how DESeq2 can recognize that my data are not true replicates, if different conditions can still give the same gene expression values :) I checked my test data: it really has different values in some rows, so I can't understand why DESeq2 didn't work. :)
If we just copy the columns, we will get
Error in estimateDispersionsFit(object, fitType = fitType, quiet = quiet) :
all gene-wise dispersion estimates are within 2 orders of magnitude
from the minimum value, and so the standard curve fitting techniques will not work.
One can instead use the gene-wise estimates as final estimates:
dds <- estimateDispersionsGeneEst(dds)
dispersions(dds) <- mcols(dds)$dispGeneEst
...then continue with testing using nbinomWaldTest or nbinomLRT
Calls: DESeq ... estimateDispersions -> .local -> estimateDispersionsFit
Execution halted
so some problems may happen in the future if some datasets have no significant differential expression. In that case, the DE step of MOSCA may require a variance check: datasets whose variance falls below that threshold might have to undergo a different statistical analysis...
Hi iquasere,
Sorry to trouble you again. This time I tried to test my own real MT and MG data together,
and encountered the error below. Does this mean my fastq data has some problem?
I checked that none of the reads in my raw data is shorter than 18 nucleotides.
With MOSCA 1.2.2 it still keeps running, but MOSCA 1.3.1 exits.
............
WARNING: At least one of the reads is shorter than 18 nucleotides, by default it will not be searched
WARNING: At least one of the reads is shorter than 18 nucleotides, by default it will not be searched
WARNING: At least one of the reads is shorter than 18 nucleotides, by default it will not be searched
bash /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/unmerge-paired-reads.sh output/Preprocess/SortMeRNA/mt1C_interleaved.fastq output/Preprocess/SortMeRNA/mt1C_forward.fastq output/Preprocess/SortMeRNA/mt1C_reverse.fastq
Processing output/Preprocess/SortMeRNA/mt1C_forward.fastq ..
Processing output/Preprocess/SortMeRNA/mt1C_reverse.fastq ..
Done.
Removed: output/Preprocess/SortMeRNA/mt1C_interleaved.fastq
fastqc --outdir output/Preprocess/FastQC --threads 14 --extract output/Preprocess/SortMeRNA/mt1C_forward.fastq output/Preprocess/SortMeRNA/mt1C_reverse.fastq
Started analysis of mt1C_forward.fastq
Started analysis of mt1C_reverse.fastq
Approx 5% complete for mt1C_forward.fastq
Approx 5% complete for mt1C_reverse.fastq
Approx 10% complete for mt1C_forward.fastq
Approx 10% complete for mt1C_reverse.fastq
Approx 15% complete for mt1C_forward.fastq
Approx 15% complete for mt1C_reverse.fastq
Approx 20% complete for mt1C_forward.fastq
Approx 20% complete for mt1C_reverse.fastq
Approx 25% complete for mt1C_forward.fastq
Approx 25% complete for mt1C_reverse.fastq
Approx 30% complete for mt1C_forward.fastq
Approx 30% complete for mt1C_reverse.fastq
Approx 35% complete for mt1C_forward.fastq
Approx 35% complete for mt1C_reverse.fastq
Approx 40% complete for mt1C_forward.fastq
Approx 40% complete for mt1C_reverse.fastq
Approx 45% complete for mt1C_forward.fastq
Approx 45% complete for mt1C_reverse.fastq
Approx 50% complete for mt1C_forward.fastq
Approx 50% complete for mt1C_reverse.fastq
Approx 55% complete for mt1C_forward.fastq
Approx 55% complete for mt1C_reverse.fastq
Approx 60% complete for mt1C_forward.fastq
Approx 60% complete for mt1C_reverse.fastq
Approx 65% complete for mt1C_forward.fastq
Approx 65% complete for mt1C_reverse.fastq
Approx 70% complete for mt1C_forward.fastq
Approx 70% complete for mt1C_reverse.fastq
Approx 75% complete for mt1C_forward.fastq
Approx 75% complete for mt1C_reverse.fastq
Approx 80% complete for mt1C_forward.fastq
Approx 80% complete for mt1C_reverse.fastq
Approx 85% complete for mt1C_forward.fastq
Approx 85% complete for mt1C_reverse.fastq
Approx 90% complete for mt1C_forward.fastq
Approx 90% complete for mt1C_reverse.fastq
Approx 95% complete for mt1C_forward.fastq
Approx 95% complete for mt1C_reverse.fastq
Failed to process file mt1C_forward.fastq
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'
at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:158)
at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:125)
at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:77)
at java.base/java.lang.Thread.run(Thread.java:834)
Failed to process file mt1C_reverse.fastq
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'
at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:158)
at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:125)
at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:77)
at java.base/java.lang.Thread.run(Thread.java:834)
Traceback (most recent call last):
File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/preprocess.py", line 376, in
RuleException: CalledProcessError in line 79 of /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile:
Command 'set -euo pipefail; python /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/preprocess.py -i /media/zyshen/MOSCA/20201023_L_QMK/mt1C_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mt1C_R2.fastq -t 14 -o output/Preprocess -adaptdir /media/zyshen/work/MOSCA/MOSCA-1.2.2/adapters -rrnadbs /media/zyshen/work/MOSCA/MOSCA-1.2.2/rRNA_databases -d mrna -rd /media/zyshen/work/MOSCA/MOSCA-1.2.2 -n mt1C --minlen 100 --avgqual 20' returned non-zero exit status 1.
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2340, in run_wrapper
File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile", line 79, in rule_preprocess
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 568, in _callback
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/concurrent/futures/thread.py", line 56, in run
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2352, in run_wrapper
Exiting because a job execution failed. Look above for error message
0:13:19.610 4G / 4G INFO K-mer Counting (kmer_data.cpp : 321) Processed 21034020 reads
0:13:19.616 4G / 4G INFO K-mer Counting (kmer_data.cpp : 326) Total 21034020 reads processed
0:13:19.617 4G / 4G INFO K-mer Index Building (kmer_index_builder.hpp : 301) Building kmer index
0:13:19.617 4G / 4G INFO General (kmer_index_builder.hpp : 117) Splitting kmer instances into 22
I checked, and none of the reads in my raw data is shorter than 18 nucleotides.
The message "WARNING: At least one of the reads is shorter than 18 nucleotides, by default it will not be searched" comes from SortMeRNA and is normal; it happens to me every time I have adapters. In the future I will likely remove those reads with the MINLEN tool of Trimmomatic.
ID line didn't start with '@'
Now this is very weird. Can you please tell me what the outputs of these commands are?
wc -l output/Preprocess/SortMeRNA/mt1C_forward.fastq
grep '@' -c output/Preprocess/SortMeRNA/mt1C_forward.fastq
wc -l output/Preprocess/SortMeRNA/mt1C_reverse.fastq
grep '@' -c output/Preprocess/SortMeRNA/mt1C_reverse.fastq
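For anyone who prefers to check the file structure directly, here is a minimal Python sketch of the same idea. It assumes plain four-line FASTQ records (an assumption, wrapped sequences would break it), and the function names are hypothetical; it is not part of MOSCA or FastQC. It reports the first record whose ID line does not start with '@', which is the condition FastQC complained about.

```python
import io


def first_bad_header(handle):
    """Return (record_number, text) for the first record whose ID line
    does not start with '@', or None if every ID line looks fine.
    Assumes four lines per record."""
    for index, line in enumerate(handle):
        if index % 4 == 0 and not line.startswith('@'):
            return index // 4 + 1, line.rstrip('\n')
    return None


def count_records(handle):
    """Return (complete_records, leftover_lines), assuming four lines per record."""
    lines = sum(1 for _ in handle)
    return divmod(lines, 4)


# Tiny in-memory examples; the quality string of r1 deliberately starts with '@'.
good = "@r1\nACGT\n+\n@III\n@r2\nTTGA\n+\nIIII\n"
broken = "@r1\nACGT\n+\n@III\nACGT\n+\nIIII\n"  # second record lost its ID line

print(first_bad_header(io.StringIO(good)))
print(first_bad_header(io.StringIO(broken)))
print(count_records(io.StringIO(broken)))
```

On a real file, the same functions can be called on `open('output/Preprocess/SortMeRNA/mt1C_forward.fastq')`.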
217089 queries aligned.
The host system is detected to have 1081 GB of RAM. It is recommended to increase the block size for better performance using these parameters : -b12 -c1
[Mon Feb 8 17:24:12 2021]
Finished job 7.
2 of 10 steps (20%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /media/zyshen/work/MOSCA/MOSCA-1.2.2/.snakemake/log/2021-02-08T160825.230213.snakemake.log
(mosca-1.2.2) (17:24 zyshen@gpuserver MOSCA-1.2.2) > wc -l output/Preprocess/SortMeRNA/mt1C_forward.fastq
12168464 output/Preprocess/SortMeRNA/mt1C_forward.fastq
(mosca-1.2.2) (11:20 zyshen@gpuserver MOSCA-1.2.2) > grep '@' -c output/Preprocess/SortMeRNA/mt1C_forward.fastq
3038087
(mosca-1.2.2) (11:20 zyshen@gpuserver MOSCA-1.2.2) > wc -l output/Preprocess/SortMeRNA/mt1C_reverse.fastq
12168460 output/Preprocess/SortMeRNA/mt1C_reverse.fastq
(mosca-1.2.2) (11:21 zyshen@gpuserver MOSCA-1.2.2) > grep '@' -c output/Preprocess/SortMeRNA/mt1C_reverse.fastq
3038085
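As a quick sanity check on the `wc -l` figures reported above, the line counts can be converted to read counts. This is only arithmetic on the numbers in this thread, assuming every FASTQ record occupies exactly four lines (an assumption); the variable names are illustrative.

```python
LINES_PER_RECORD = 4  # assumption: standard four-line FASTQ records

# Line counts reported by `wc -l` in this thread.
line_counts = {
    "mt1C_forward.fastq": 12168464,
    "mt1C_reverse.fastq": 12168460,
}

# divmod gives (complete records, leftover lines) for each file.
records = {name: divmod(n, LINES_PER_RECORD) for name, n in line_counts.items()}

for name, (complete, leftover) in records.items():
    print(name, complete, leftover)

difference = records["mt1C_forward.fastq"][0] - records["mt1C_reverse.fastq"][0]
print("forward minus reverse:", difference)  # prints "forward minus reverse: 1"
```

Under that assumption the forward file holds 3042116 records and the reverse file 3042115, so the two files differ by exactly one read.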
OK, your quality scores use a Phred encoding that includes the @ character, so a plain grep for '@' also matches quality lines. Instead, please run
grep '^@' -c output/Preprocess/SortMeRNA/mt1C_forward.fastq
grep '^@' -c output/Preprocess/SortMeRNA/mt1C_reverse.fastq
so that it only counts matches at the beginning of a line. But I think I know the problem here: you have an orphan read. MOSCA handled this in the past, but I thought SortMeRNA had fixed it... What you need to do is edit /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/preprocess.py
and, in lines 227-233, change the code to
for fr in ['forward', 'reverse']:
    self.remove_messed_reads('{}_{}.fastq'.format(basename, fr))
self.remove_orphans(basename + '_forward.fastq',
                    basename + '_reverse.fastq')
Basically, remove the quote marks and fix the indentation; in the next version I am going to reimplement this in MOSCA. It is a shame that this step takes a long time just to remove a single read, but it fixes the problem...
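To illustrate what removing an orphan read means, here is a minimal sketch that keeps only reads whose ID appears in both files. It is not MOSCA's `remove_orphans` implementation; the helper names are hypothetical, it assumes four-line records and read IDs that match between files once an optional trailing `/1` or `/2` is dropped, and it holds both files in memory, which may be too much for very large datasets.

```python
def read_fastq(lines):
    """Yield (read_id, record) for four-line FASTQ records.
    The ID is the first word of the header without '@' and without
    a trailing /1 or /2 mate suffix."""
    record = []
    for line in lines:
        record.append(line.rstrip('\n'))
        if len(record) == 4:
            read_id = record[0][1:].split()[0]
            if read_id.endswith(('/1', '/2')):
                read_id = read_id[:-2]
            yield read_id, record
            record = []


def drop_unpaired_reads(forward_lines, reverse_lines):
    """Return (forward_records, reverse_records) restricted to reads
    present in both inputs, in the order of the forward file."""
    forward = dict(read_fastq(forward_lines))
    reverse = dict(read_fastq(reverse_lines))
    shared = [read_id for read_id in forward if read_id in reverse]
    return [forward[i] for i in shared], [reverse[i] for i in shared]


# Toy example: r2 has no mate in the reverse file, so it is an orphan.
fwd = ["@r1/1", "ACGT", "+", "IIII",
       "@r2/1", "GGCC", "+", "IIII",
       "@r3/1", "TTAA", "+", "IIII"]
rev = ["@r1/2", "TGCA", "+", "IIII",
       "@r3/2", "AATT", "+", "IIII"]

kept_fwd, kept_rev = drop_unpaired_reads(fwd, rev)
print([r[0] for r in kept_fwd])  # prints ['@r1/1', '@r3/1']
print([r[0] for r in kept_rev])  # prints ['@r1/2', '@r3/2']
```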
Hi iquasere, thanks. I edited the script as you suggested and encountered another new error.
.......... WARNING: At least one of the reads is shorter than 18 nucleotides, by default it will not be searched
WARNING: At least one of the reads is shorter than 18 nucleotides, by default it will not be searched
WARNING: At least one of the reads is shorter than 18 nucleotides, by default it will not be searched
bash /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/unmerge-paired-reads.sh output/Preprocess/SortMeRNA/mt1C_interleaved.fastq output/Preprocess/SortMeRNA/mt1C_forward.fastq output/Preprocess/SortMeRNA/mt1C_reverse.fastq
Processing output/Preprocess/SortMeRNA/mt1C_forward.fastq ..
Processing output/Preprocess/SortMeRNA/mt1C_reverse.fastq ..
Done.
Traceback (most recent call last):
File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/preprocess.py", line 376, in
RuleException: CalledProcessError in line 79 of /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile:
Command 'set -euo pipefail; python /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/preprocess.py -i /media/zyshen/MOSCA/20201023_L_QMK/mt1C_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mt1C_R2.fastq -t 14 -o output/Preprocess -adaptdir /media/zyshen/work/MOSCA/MOSCA-1.2.2/adapters -rrnadbs /media/zyshen/work/MOSCA/MOSCA-1.2.2/rRNA_databases -d mrna -rd /media/zyshen/work/MOSCA/MOSCA-1.2.2 -n mt1C --minlen 100 --avgqual 20' returned non-zero exit status 1.
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2340, in run_wrapper
File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile", line 79, in rule_preprocess
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 568, in _callback
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/concurrent/futures/thread.py", line 56, in run
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2352, in run_wrapper
Exiting because a job execution failed. Look above for error message
best regards, zhiyong
Reimplemented this properly in 1.3.2. Tested and functional. You'll have to wait a little bit before running conda install -c conda-forge -c bioconda -c anaconda -y mosca=1.3.2, as it takes some time for it to become available through Bioconda.
Thanks iquasere, I downloaded your new version of MOSCA and ran my data again. I have pasted the new error report below.
You can also download my data from the following address to help you run your pipeline on it: http://143.89.25.148/labcloud/ account: guest passwd: guest. I put all six gz files on the desktop.
[Fri Feb 12 18:35:20 2021]
rule join_information:
    input: output/Annotation/uniprotinfo.tsv, output/Annotation/Sample/aligned.blast, output/Annotation/Sample/reCOGnizer_results.xlsx, output/Metatranscriptomics/mt3A.readcounts, output/Metatranscriptomics/mt1C.readcounts, output/Annotation/mgname.readcounts
    output: output/MOSCA_Protein_Report.xlsx, output/MOSCA_Entry_Report.xlsx, output/Metatranscriptomics/expression_matrix.tsv
    jobid: 1
    threads: 22
Job counts:
count jobs
1 join_information
1
python /media/zyshen/work/MOSCA/MOSCA-1.3.1/workflow/join_information.py -e output/experiments.tsv -t 22 -o output -if tsv -nm TMM
sys:1: DtypeWarning: Columns (19) have mixed types.Specify dtype option on import or set low_memory=False.
2021-02-12 10:35:22: Joining data for sample: Sample
head -n -5 output/Annotation/mgname.readcounts
seqkit fx2tab output/Assembly/Sample/contigs.fasta | sort | awk '{print $1"\t"length($2)}' | join - output/Annotation/mgname_no_tail.readcounts | awk '{print $1"\t"$3/$2}'
Finding consensus COG for each Entry of Sample: Sample
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 174314/174314 [02:45<00:00, 1055.10it/s]
Traceback (most recent call last):
File "/media/zyshen/work/MOSCA/MOSCA-1.3.1/workflow/join_information.py", line 147, in
RuleException: CalledProcessError in line 282 of /media/zyshen/work/MOSCA/MOSCA-1.3.1/workflow/Snakefile:
Command 'set -euo pipefail; python /media/zyshen/work/MOSCA/MOSCA-1.3.1/workflow/join_information.py -e output/experiments.tsv -t 22 -o output -if tsv -nm TMM' returned non-zero exit status 1.
File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2340, in run_wrapper
File "/media/zyshen/work/MOSCA/MOSCA-1.3.1/workflow/Snakefile", line 282, in rule_join_information
File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 568, in _callback
File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/concurrent/futures/thread.py", line 56, in run
File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2352, in run_wrapper
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /media/zyshen/work/MOSCA/MOSCA-1.3.1/.snakemake/log/2021-02-12T183520.570545.snakemake.log
Happy lunar new year ! best, zhiyong
I'm going to tackle this now; maybe by the end of the day I will have some news!
I am downloading your files now, and will let you know when it is done!
Hope you had a good holiday! And have a great lunar year! ^^
Hi iquasere, did you manage to run my data successfully, and what is the problem with it? Why does it produce this error report when running the MOSCA pipeline? BTW, I am using version 1.3.3 now, and it still stops at the same place with the same error (KeyError: "Columns not found: 'mt1C', 'mt3A'"). Thanks!
Zhiyong
Hi there! I had to postpone this problem because I am now on a tight deadline for a paper related to other tools. I am aiming to finish it this weekend, and this question will be the next thing to solve. So I will likely have news next week, but not before that.
I did download the datasets, so you can close the access to your server. Let me see if I can replicate this annoying bug in the next few days...
Can you please share your config and experiments file as well? You can share those directly here
OK, this problem is reproducible on my end. It seems the final reporter script still does not take the inputted names into account. I'm going to send a new release, likely today.
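The KeyError quoted earlier in the thread (Columns not found: 'mt1C', 'mt3A') is the kind of error raised when sample names are requested as columns that a table does not contain. A minimal sketch of a pre-flight check follows; the function name and the example column lists are hypothetical and are not taken from MOSCA's reporter script.

```python
def missing_sample_columns(sample_names, table_columns):
    """Return the sample names that are not present among the table's columns,
    preserving the order in which the samples were given."""
    available = set(table_columns)
    return [name for name in sample_names if name not in available]


# Hypothetical example: the experiments file names two samples, but the
# table only carries a generic placeholder column.
samples = ["mt1C", "mt3A"]
columns = ["Entry", "mtname"]

missing = missing_sample_columns(samples, columns)
if missing:
    print("Columns not found:", missing)  # prints "Columns not found: ['mt1C', 'mt3A']"
```

Failing early with the list of missing names makes it easier to see that the table was built with different sample names than the ones in the experiments file.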
About running MOSCA with no replicates: I think differential expression is a vital part of the workflow, and it would be more hassle than it is worth to adapt it to every use case. Therefore, I am working on a page in MOSCA's wiki (https://github.com/iquasere/MOSCA/wiki/Partial-runs) that covers some helpful runs of only parts of MOSCA, and this is going to be one of the alternative workflows.
Hi iquasere, thanks for your help! I have recently been running more of my raw data; enclosed please find my config and experiments files. .........................
Analysis complete for mt3A_reverse.fastq
join output/Preprocess/SortMeRNA/read1.txt output/Preprocess/SortMeRNA/read2.txt | awk '{print $1" "$2"\n"$3"\n+\n"$4 > "output/Preprocess/SortMeRNA/mt1A_forward.fastq";print $1" "$5"\n"$6"\n+\n"$7 > "output/Preprocess/SortMeRNA/mt1A_reverse.fastq"}'
join: output/Preprocess/SortMeRNA/read1.txt: No such file or directory
Traceback (most recent call last):
File "/media/zyshen/work/MOSCA/MOSCA-1.3.4/workflow/preprocess.py", line 379, in
Preprocesser().run()
File "/media/zyshen/work/MOSCA/MOSCA-1.3.4/workflow/preprocess.py", line 349, in run
original_files=True if args.input == original_input else False)
File "/media/zyshen/work/MOSCA/MOSCA-1.3.4/workflow/preprocess.py", line 226, in rrna_removal
'{}/{}_reverse.fastq'.format(out_dir, name), out_dir)
File "/media/zyshen/work/MOSCA/MOSCA-1.3.4/workflow/preprocess.py", line 191, in remove_orphans
os.remove(file)
FileNotFoundError: [Errno 2] No such file or directory: 'output/Preprocess/SortMeRNA/read1.txt'
[Mon Mar 1 14:06:09 2021]
Error in rule preprocess:
jobid: 0
output: output/Preprocess/Trimmomatic/quality_trimmed_mt1A_forward_paired.fq, output/Preprocess/Trimmomatic/quality_trimmed_mt1A_reverse_paired.fq
RuleException: CalledProcessError in line 111 of /media/zyshen/work/MOSCA/MOSCA-1.3.4/workflow/Snakefile:
Command 'set -euo pipefail; python /media/zyshen/work/MOSCA/MOSCA-1.3.4/workflow/preprocess.py -i /media/zyshen/MOSCA/20201023_L_QMK/mt1A_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mt1A_R2.fastq -t 6 -o output/Preprocess -adaptdir /media/zyshen/work/MOSCA/MOSCA-1.3.4/adapters -rrnadbs /media/zyshen/work/MOSCA/MOSCA-1.3.4/rRNA_databases -d mrna -rd /media/zyshen/work/MOSCA/MOSCA-1.3.4 -n mt1A --minlen 100 --avgqual 20' returned non-zero exit status 1.
File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2340, in run_wrapper
File "/media/zyshen/work/MOSCA/MOSCA-1.3.4/workflow/Snakefile", line 111, in rule_preprocess
File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 568, in _callback
File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/concurrent/futures/thread.py", line 56, in run
File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2352, in run_wrapper
Exiting because a job execution failed. Look above for error message
...........
Analysis complete for quality_trimmed_mt1C_forward_paired.fq
Analysis complete for quality_trimmed_mt1C_reverse_paired.fq
[Mon Mar 1 14:25:04 2021]
Finished job 13.
6 of 17 steps (35%) done
The pipeline can't move on to the next step and has been running the following task for a very long time!
48713 zyshen 20 0 2092084 1.275g 4848 R 599.0 0.1 3975:35 /media/zyshen/miniconda3/envs/snakemake/bin/spades-hammer /media/zyshen/work/MOSCA/MOSCA-1.3.4/output/Assembly/Sample/corrected/confi+
Any suggestions?
best wishes, zhiyong
Hi iquasere, sorry to trouble you again. I have hit another new problem after the pipeline had been running for almost 2 days.
....
5500000 alignment record pairs processed.
5600000 alignment record pairs processed.
5700000 alignment record pairs processed.
5800000 alignment record pairs processed.
5834173 alignment pairs processed.
[Tue Feb 2 15:08:24 2021]
Finished job 9.
1 of 5 steps (20%) done
[Tue Feb 2 15:08:24 2021]
rule join_information:
    input: output/Annotation/uniprotinfo.tsv, output/Annotation/Sample/aligned.blast, output/Annotation/Sample/reCOGnizer_results.xlsx, output/Metatranscriptomics/mtname.readcounts, output/Annotation/mgname.readcounts
    output: output/MOSCA_Protein_Report.xlsx, output/MOSCA_Entry_Report.xlsx, output/Metatranscriptomics/expression_matrix.tsv
    jobid: 5
    threads: 12
Job counts:
count jobs
1 join_information
1
python /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/join_information.py -e output/experiments.tsv -t 12 -o output -if tsv -nm TMM
2021-02-02 07:08:28: Joining data for sample: Sample
Traceback (most recent call last):
File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/join_information.py", line 147, in
Joiner().run()
File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/join_information.py", line 58, in run
sheet_names = pd.ExcelFile(recognizer_filename).sheet_names
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/pandas/io/excel/_base.py", line 867, in __init__
self._reader = self._engines[engine](self._io)
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/pandas/io/excel/_xlrd.py", line 22, in __init__
super().__init__(filepath_or_buffer)
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/pandas/io/excel/_base.py", line 353, in __init__
self.book = self.load_workbook(filepath_or_buffer)
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/pandas/io/excel/_xlrd.py", line 37, in load_workbook
return open_workbook(filepath_or_buffer)
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/__init__.py", line 170, in open_workbook
raise XLRDError(FILE_FORMAT_DESCRIPTIONS[file_format]+'; not supported')
xlrd.biffh.XLRDError: Excel xlsx file; not supported
[Tue Feb 2 15:08:28 2021]
Error in rule join_information:
jobid: 0
output: output/MOSCA_Protein_Report.xlsx, output/MOSCA_Entry_Report.xlsx, output/Metatranscriptomics/expression_matrix.tsv
RuleException: CalledProcessError in line 232 of /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile:
Command 'set -euo pipefail; python /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/join_information.py -e output/experiments.tsv -t 12 -o output -if tsv -nm TMM' returned non-zero exit status 1.
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2340, in run_wrapper
File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile", line 232, in rule_join_information
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 568, in _callback
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/concurrent/futures/thread.py", line 56, in run
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2352, in run_wrapper
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /media/zyshen/work/MOSCA/MOSCA-1.2.2/.snakemake/log/2021-02-02T143849.506822.snakemake.log
You can see that the size of mtname_bowtie2_report.txt is zero:
-rw-rw-r-- 1 zyshen zyshen 630 2月 2 08:27 mtname.log
-rw-rw-r-- 1 zyshen zyshen 13M 2月 2 08:36 mtname.readcounts
-rw-rw-r-- 1 zyshen zyshen 4.5G 2月 2 08:27 mtname.sam
-rw-rw-r-- 1 zyshen zyshen 0 2月 2 08:26 mtname_bowtie2_report.txt
Any suggestions? Thanks!
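For context on the `XLRDError: Excel xlsx file; not supported` message above: xlrd 2.0 and later read only the old `.xls` format, so a `.xlsx` file has to be opened with another engine such as openpyxl, or with an older xlrd. The sketch below picks an engine name from the file extension and the installed xlrd version; `excel_engine_for` is a hypothetical helper, not something MOSCA or pandas provides.

```python
import os


def excel_engine_for(path, xlrd_version):
    """Suggest a pandas Excel engine name for `path`.

    .xls files go to xlrd. Other Excel files go to openpyxl when the
    installed xlrd is 2.0 or newer, because those releases only read .xls."""
    extension = os.path.splitext(path)[1].lower()
    if extension == '.xls':
        return 'xlrd'
    major = int(xlrd_version.split('.')[0])
    return 'openpyxl' if major >= 2 else 'xlrd'


print(excel_engine_for('output/Annotation/Sample/reCOGnizer_results.xlsx', '2.0.1'))  # prints openpyxl
print(excel_engine_for('output/Annotation/Sample/reCOGnizer_results.xlsx', '1.2.0'))  # prints xlrd
```

With pandas and openpyxl installed, the result can be passed on as `pd.ExcelFile(path, engine=excel_engine_for(path, xlrd.__version__))`; `engine` is an existing parameter of `pandas.ExcelFile`.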