nanoporetech / pipeline-transcriptome-de

Pipeline for differential gene expression (DGE) and differential transcript usage (DTU) analysis using long reads

Rule exception when running pipeline-transcriptome-de/scripts/R #5

Closed callumparr closed 5 years ago

callumparr commented 5 years ago

Hey @bsipos, sorry about posting in the wrong repository yesterday. I still had problems writing out such a large xlsx file, so I switched to running my data through the pipeline directly, outside the tutorial R Markdown.

But I ran into an issue with the R scripts at the end.

Error in rule de_analysis:
    jobid: 10
    output: de_analysis/results_dge.tsv, de_analysis/results_dge.pdf, de_analysis/results_dtu_gene.tsv, de_analysis/results_dtu_transcript.tsv, de_analysis/results_dtu_stageR.tsv, merged/all_counts_filtered.tsv, merged/all_gene_counts.tsv
    conda-env: /Users/callum/pipeline-transcriptome-de/Workspaces/pipeline-transcriptome-de_phe/.snakemake/conda/b39bc3b2

RuleException:
CalledProcessError in line 128 of /Users/callum/pipeline-transcriptome-de/Snakefile:
Command 'source activate /Users/callum/pipeline-transcriptome-de/Workspaces/pipeline-transcriptome-de_phe/.snakemake/conda/b39bc3b2; set -euo pipefail;  /Users/callum/pipeline-transcriptome-de/scripts/de_analysis.R ' returned non-zero exit status 1.
  File "/Users/callum/pipeline-transcriptome-de/Snakefile", line 128, in __rule_de_analysis
  File "/Users/callum/miniconda3/lib/python3.6/concurrent/futures/thread.py", line 56, in run

Do I need to do anything other than install snakemake and pandas via conda and then run snakemake?

bsipos commented 5 years ago

Hi,

Could you please activate the conda environment created by snakemake and try to run the de_analysis.R script manually. I am curious what the output would be in that case.

Botond

callumparr commented 5 years ago

Sorry, after activating the environment, how do I run the R script manually? Do you mean opening it in RStudio?

Also, after listing the environments I can find the env generated under .snakemake, but unlike the others it doesn't have a name associated with it:

base                  *  /Users/callum/miniconda3
BasicQC                  /Users/callum/miniconda3/envs/BasicQC
Pinfish                  /Users/callum/miniconda3/envs/Pinfish
cDNA_DESeq2              /Users/callum/miniconda3/envs/cDNA_DESeq2
transcriptome_tutorial     /Users/callum/miniconda3/envs/transcriptome_tutorial
                         /Users/callum/pipeline-transcriptome-de/Workspaces/pipeline-transcriptome-de_phe/.snakemake/conda/b39bc3b2

bsipos commented 5 years ago

To activate the env, just run "source activate /Users/callum/pipeline-transcriptome-de/Workspaces/pipeline-transcriptome-de_phe/.snakemake/conda/b39bc3b2".

Then run the script directly from the shell: "/Users/callum/pipeline-transcriptome-de/scripts/de_analysis.R".

Botond

callumparr commented 5 years ago

I tried deleting the git repository (rm -rvf) and downloading it again, but still got the same problem. So I did as you suggested. I edited the config file to use the synthetic downsampled libraries included in the Transcriptome tutorial, just so I could get through it more quickly.

I activated the environment and ran the de_analysis.R script, and it says it cannot find merged/all_counts.tsv:

(base) tm1612s-MacBook-Pro:pipeline-transcriptome-de callum$ source activate /Users/callum/pipeline-transcriptome-de/Workspaces/pipeline-transcriptome-de_phe/.snakemake/conda/b39bc3b2
(/Users/callum/pipeline-transcriptome-de/Workspaces/pipeline-transcriptome-de_phe/.snakemake/conda/b39bc3b2) tm1612s-MacBook-Pro:pipeline-transcriptome-de callum$ /Users/callum/pipeline-transcriptome-de/scripts/de_analysis.R 
Loading counts, conditions and parameters.
Error in file(file, "rt") : cannot open the connection
Calls: as.matrix -> read.csv -> read.table -> file
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'merged/all_counts.tsv': No such file or directory
Execution halted

I checked for the file and it seems to be there. Maybe this is a permission issue with the created file?

I checked its contents and they seem to be OK:

Reference       IR3     C1      IR1     C2      C3      IR2
1       9278.0  7732.0  9336.0  7726.0  7699.0  9141.0
9       218.0   271.0   225.0   288.0   279.0   234.0
14      180.0   269.0   195.0   285.0   274.0   196.0
15      85.0    138.0   93.0    135.0   138.0   93.0
7       84.0    98.0    84.0    112.0   105.0   96.0
17      67.0    66.0    70.0    70.0    69.0    72.0
2       66.0    63.0    68.0    71.0    79.0    65.0
X       59.0    67.0    68.0    61.0    62.0    59.0
10      58.0    40.0    54.0    43.0    43.0    65.0
3       50.0    65.0    56.0    66.0    58.0    50.0
5       48.0    53.0    47.0    52.0    45.0    54.0
12      26.0    50.0    27.0    48.0    47.0    20.0
22      26.0    15.0    18.0    14.0    19.0    21.0
11      26.0    39.0    18.0    36.0    39.0    23.0
8       25.0    27.0    18.0    27.0    23.0    18.0
4       21.0    16.0    19.0    14.0    14.0    21.0
16      19.0    16.0    26.0    15.0    13.0    23.0
18      12.0    3.0     12.0    8.0     7.0     10.0
6       12.0    26.0    22.0    21.0    25.0    14.0
13      7.0     14.0    5.0     10.0    13.0    4.0
MT      5.0     0.0     4.0     0.0     0.0     3.0
19      4.0     7.0     6.0     5.0     4.0     3.0
KI270727.1      3.0     1.0     4.0     2.0     1.0     1.0
21      1.0     1.0     1.0     1.0     0.0     0.0
20      1.0     4.0     1.0     3.0     3.0     3.0
Y       0.0     3.0     0.0     3.0     3.0     0.0

bsipos commented 5 years ago

Yeah, you have to run it from the snakemake working directory. But the other problem seems to be that you are using the genome reference rather than the transcriptome reference for mapping!

Botond
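The "No such file or directory" error above is consistent with the script opening merged/all_counts.tsv by a relative path, which resolves against the directory the command is launched from. The following is only a minimal sketch of that behaviour: `demo_workdir` and `read_counts.sh` are hypothetical stand-ins created in a temporary directory, not files from the pipeline.

```shell
#!/bin/sh
# Hypothetical stand-ins: demo_workdir and read_counts.sh are NOT part of the pipeline.
demo_workdir=$(mktemp -d)
mkdir -p "$demo_workdir/merged"
printf 'Reference\tC1\tIR1\n' > "$demo_workdir/merged/all_counts.tsv"

# A tiny script that, like de_analysis.R, opens its input by a relative path.
cat > "$demo_workdir/read_counts.sh" <<'EOF'
#!/bin/sh
head -n 1 merged/all_counts.tsv
EOF

# Launched from an unrelated directory, the relative path does not resolve.
if (cd / && sh "$demo_workdir/read_counts.sh" >/dev/null 2>&1); then
    outside_status=ok
else
    outside_status=failed
fi

# Launched from the working directory, the same script succeeds.
if (cd "$demo_workdir" && sh ./read_counts.sh >/dev/null 2>&1); then
    inside_status=ok
else
    inside_status=failed
fi

echo "launched from /: $outside_status"
echo "launched from the working directory: $inside_status"
```

In this thread the equivalent step is to cd into the snakemake working directory before calling the script by its absolute path.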

callumparr commented 5 years ago

Ah sorry, yes, that is true. I was working on two different systems and trying different things, and must have copied in the wrong reference data. I changed the config file, cleared the Workspaces directory, and reran snakemake. It got further but still halted.

Activating conda environment: /Users/callum/pipeline-transcriptome-de/Workspaces/pipeline-transcriptome-de_phe/.snakemake/conda/b39bc3b2
Loading counts, conditions and parameters.
Loading annotation database.
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .get_cds_IDX(type, phase) :
  The "phase" metadata column contains non-NA values for features of type
  stop_codon. This information was ignored.
'select()' returned 1:many mapping between keys and columns
Filtering counts using DRIMSeq.
Building model matrix.
Sum transcript counts into gene counts.
Warning message:
funs() is soft deprecated as of dplyr 0.8.0
please use list() instead

# Before:
funs(name = f(.)

# After: 
list(name = ~f(.))
This warning is displayed once per session. 
Running differential gene expression analysis using edgeR.
Running differential transcript usage analysis using DEXSeq.
converting counts to integer mode
null device 
          1 
Installing stageR.
Downloading GitHub repo statOmics/stageR@master
sh: /bin/tar: No such file or directory
sh: /bin/tar: No such file or directory
Error in length(file_list) > 0 : error in running command
Calls: install_github ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
In addition: Warning messages:
1: In system(paste(TAR, "--version >", tf, "2>&1")) :
  error in running command
2: In system(cmd) : error in running command
3: In utils::untar(tarfile, ...) :
  ‘/usr/bin/gzip -dc '/var/folders/33/1qxhvnh56wl6g7qr6lm9nwxm0000gp/T//RtmpmcmKSu/file5323497d0fe.tar.gz' | /bin/tar -xf '-' -C '/var/folders/33/1qxhvnh56wl6g7qr6lm9nwxm0000gp/T//RtmpmcmKSu/remotes5323b19d60d'’ returned error code 127
4: In system(paste(TAR, "--version >", tf, "2>&1")) :
  error in running command
Execution halted
[Thu Mar  7 23:59:48 2019]
Error in rule de_analysis:
    jobid: 10
    output: de_analysis/results_dge.tsv, de_analysis/results_dge.pdf, de_analysis/results_dtu_gene.tsv, de_analysis/results_dtu_transcript.tsv, de_analysis/results_dtu_stageR.tsv, merged/all_counts_filtered.tsv, merged/all_gene_counts.tsv
    conda-env: /Users/callum/pipeline-transcriptome-de/Workspaces/pipeline-transcriptome-de_phe/.snakemake/conda/b39bc3b2

RuleException:
CalledProcessError in line 128 of /Users/callum/pipeline-transcriptome-de/Snakefile:
Command 'source activate '/Users/callum/pipeline-transcriptome-de/Workspaces/pipeline-transcriptome-de_phe/.snakemake/conda/b39bc3b2'; set -euo pipefail;  /Users/callum/pipeline-transcriptome-de/scripts/de_analysis.R' returned non-zero exit status 1.
  File "/Users/callum/pipeline-transcriptome-de/Snakefile", line 128, in __rule_de_analysis
  File "/Users/callum/miniconda3/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Removing output files of failed job de_analysis since they might be corrupted:
de_analysis/results_dge.tsv, de_analysis/results_dge.pdf, de_analysis/results_dtu_gene.tsv, de_analysis/results_dtu_transcript.tsv, merged/all_counts_filtered.tsv, merged/all_gene_counts.tsv
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /Users/callum/pipeline-transcriptome-de/.snakemake/log/2019-03-07T235124.804046.snakemake.log

If I activate the conda env, cd to Workspaces/pipeline-transcriptome-de_phe, and run the script, I get the same error:

(base) tm1612s-MacBook-Pro:pipeline-transcriptome-de callum$ source activate /Users/callum/pipeline-transcriptome-de/Workspaces/pipeline-transcriptome-de_phe/.snakemake/conda/b39bc3b2
(/Users/callum/pipeline-transcriptome-de/Workspaces/pipeline-transcriptome-de_phe/.snakemake/conda/b39bc3b2) tm1612s-MacBook-Pro:pipeline-transcriptome-de callum$ cd Workspaces/pipeline-transcriptome-de_phe/
(/Users/callum/pipeline-transcriptome-de/Workspaces/pipeline-transcriptome-de_phe/.snakemake/conda/b39bc3b2) tm1612s-MacBook-Pro:pipeline-transcriptome-de_phe callum$ /Users/callum/pipeline-transcriptome-de/scripts/de_analysis.R 
Loading counts, conditions and parameters.
Loading annotation database.
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .get_cds_IDX(type, phase) :
  The "phase" metadata column contains non-NA values for features of type
  stop_codon. This information was ignored.
'select()' returned 1:many mapping between keys and columns
Filtering counts using DRIMSeq.
Building model matrix.
Sum transcript counts into gene counts.
Warning message:
funs() is soft deprecated as of dplyr 0.8.0
please use list() instead

# Before:
funs(name = f(.)

# After: 
list(name = ~f(.))
This warning is displayed once per session. 
Running differential gene expression analysis using edgeR.
Running differential transcript usage analysis using DEXSeq.
converting counts to integer mode
null device 
          1 
Installing stageR.
Downloading GitHub repo statOmics/stageR@master
sh: /bin/tar: No such file or directory
sh: /bin/tar: No such file or directory
Error in length(file_list) > 0 : error in running command
Calls: install_github ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
In addition: Warning messages:
1: In system(paste(TAR, "--version >", tf, "2>&1")) :
  error in running command
2: In system(cmd) : error in running command
3: In utils::untar(tarfile, ...) :
  ‘/usr/bin/gzip -dc '/var/folders/33/1qxhvnh56wl6g7qr6lm9nwxm0000gp/T//RtmpbzUfJE/file55c934de7a5b.tar.gz' | /bin/tar -xf '-' -C '/var/folders/33/1qxhvnh56wl6g7qr6lm9nwxm0000gp/T//RtmpbzUfJE/remotes55c97a33077a'’ returned error code 127
4: In system(paste(TAR, "--version >", tf, "2>&1")) :
  error in running command
Execution halted

bsipos commented 5 years ago

It seems the issue is that /bin/tar is missing.
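A quick way to confirm this from the shell is to check the path named in the error and then see whether a tar binary exists elsewhere on PATH. This is a minimal sketch; the variable name `tar_path` is only illustrative.

```shell
#!/bin/sh
# Check the location named in the error first, then fall back to searching PATH.
if [ -x /bin/tar ]; then
    tar_path=/bin/tar
else
    # command -v prints the resolved path, or nothing if tar is not installed.
    tar_path=$(command -v tar || true)
fi

if [ -n "$tar_path" ]; then
    echo "tar found at: $tar_path"
else
    echo "tar not found on PATH" >&2
fi
```

On macOS the system tar normally lives at /usr/bin/tar rather than /bin/tar, which would explain the "No such file or directory" messages in the log.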

callumparr commented 5 years ago

With the conda environment active and from within the Workspace, I started R, installed the various R packages, saved the R image, and then ran the script again. But I got the same error.

In this case, should I manually place a downloaded tarball for stageR into /bin?

bsipos commented 5 years ago

Yes, that could work. Alternatively, replace "/bin/tar" in "scripts/de_analysis.R" with the location of your tar binary.
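The path replacement could be sketched as below, assuming the script contains a line of the form Sys.setenv(TAR = "/bin/tar"). The demo works on a hypothetical two-line stand-in file in a temporary directory (`demo_dir`), not on the real script, and writes the result to a new file so the original is left untouched.

```shell
#!/bin/sh
# Hypothetical stand-in for scripts/de_analysis.R, created only for this demo.
demo_dir=$(mktemp -d)
printf '%s\n' 'cat("Installing stageR.\n")' 'Sys.setenv(TAR = "/bin/tar")' \
    > "$demo_dir/de_analysis.R"

# Location of tar on this machine; keeps the original value if none is found.
tar_path=$(command -v tar || echo /bin/tar)

# Write the patched copy to a new file, which avoids the differences
# between GNU and BSD "sed -i".
sed "s|\"/bin/tar\"|\"$tar_path\"|" "$demo_dir/de_analysis.R" \
    > "$demo_dir/de_analysis.patched.R"

# Show the rewritten line.
grep 'Sys.setenv' "$demo_dir/de_analysis.patched.R"
```

After checking the patched copy, the same substitution can be applied to the real script, or the line can simply be edited by hand.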

callumparr commented 5 years ago

So I downloaded the tarball from statOmics/stageR, then moved the file to /Users/callum/pipeline-transcriptome-de/Workspaces/pipeline-transcriptome-de_phe/.snakemake/conda/b39bc3b2/bin

Then I opened the de_analysis.R script in R.

And changed it to this:

# stageR analysis of DEXSeq results:
cat("Installing stageR.\n")
Sys.setenv(TAR = "/bin/tar")
library(devtools)
install.packages("/Users/callum/pipeline-transcriptome-de/Workspaces/pipeline-transcriptome-de_phe/.snakemake/conda/b39bc3b2/bin/stageR-1.0.tar", repos = NULL, type = "source")
library(stageR)

I went back to the top level of the repository and ran snakemake; it ran the last few jobs and finished successfully.

Are the files written to the de_analysis/ folder supposed to be moved to the results directory, or should they stay within Workspaces?

bsipos commented 5 years ago

Okay, so it worked in the end. You can keep the results within Workspaces, just make sure you do not invoke the snakemake rule which cleans up the working directory.