vanheeringen-lab / seq2science

Automated and customizable preprocessing of Next-Generation Sequencing data, including full (sc)ATAC-seq, ChIP-seq, and (sc)RNA-seq workflows. Works equally well with public and local data.
https://vanheeringen-lab.github.io/seq2science
MIT License
155 stars, 27 forks

BUG: Error when running rna-seq if dataset contains multiple conditions with biological replicates #644

Closed: tbret closed this issue 3 years ago

tbret commented 3 years ago

Describe the bug

When running seq2science rna-seq on a dataset consisting of two conditions, each with two biological replicates, I get an error toward the end of the run. The samples.tsv file contains the columns sample, assembly and descriptive_name. When I split the dataset and ran seq2science separately for each condition, it completed without any errors.

```
Error in rule blind_clustering:
    jobid: 28
    output: /scratch/thirsa/mouse_data/rna/GSE65322/results_GSE65322/qc/clustering/GRCm38.p6-Sample_clustering_mqc.png
    log: /scratch/thirsa/mouse_data/rna/GSE65322/results_GSE65322/log/deseq2/GRCm38.p6-clustering.log (check log file(s) for error message)
    conda-env: /mbshome/tbrethouwer/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/16132cbc

RuleException:
CalledProcessError in line 92 of /mbshome/tbrethouwer/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/rules/DGE_analysis.smk:
Command 'source /mbshome/tbrethouwer/miniconda3/envs/seq2science/bin/activate '/mbshome/tbrethouwer/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/16132cbc'; set -euo pipefail;  Rscript --vanilla /scratch/thirsa/mouse_data/rna/GSE65322/.snakemake/scripts/tmpfq41gllh.deseq2_clustering.R' returned non-zero exit status 1.
  File "/mbshome/tbrethouwer/miniconda3/envs/seq2science/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2114, in run_wrapper
  File "/mbshome/tbrethouwer/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/rules/DGE_analysis.smk", line 92, in __rule_blind_clustering
  File "/mbshome/tbrethouwer/miniconda3/envs/seq2science/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 529, in _callback
  File "/mbshome/tbrethouwer/miniconda3/envs/seq2science/lib/python3.8/concurrent/futures/thread.py", line 57, in run
  File "/mbshome/tbrethouwer/miniconda3/envs/seq2science/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 515, in cached_or_run
  File "/mbshome/tbrethouwer/miniconda3/envs/seq2science/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2199, in run_wrapper
```

The log file contains this error:

```
Error in dyn.load(file, DLLpath = DLLpath, ...) : 
  unable to load shared object '/mbshome/tbrethouwer/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/16132cbc/lib/R/library/pdftools/libs/pdftools.so':
  libiconv.so.2: cannot open shared object file: No such file or directory
Calls: :: ... getNamespace -> loadNamespace -> library.dynam -> dyn.load
Execution halted
```
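
The log points at the dynamic linker: R's dyn.load could not open pdftools.so because its dependency libiconv.so.2 was not found. A minimal diagnostic sketch, assuming Python 3 is available on the same machine and assuming the conda convention that an environment's shared libraries live in `<env>/lib`, checks whether that library is present and loadable in the environment Snakemake created (the `.snakemake/16132cbc` prefix from the error). The helpers `try_load` and `check_env_lib` are hypothetical and not part of seq2science. Because ctypes opens the library by absolute path, this does not reproduce the exact search R performs; a negative result is a strong hint, while a positive one does not rule out other causes.

```python
import ctypes
import sys
from pathlib import Path


def try_load(lib_path):
    """Return True if the dynamic loader can open lib_path, False otherwise."""
    try:
        ctypes.CDLL(lib_path)
    except OSError:
        return False
    return True


def check_env_lib(env_prefix, soname="libiconv.so.2"):
    """Report whether `soname` exists in <env_prefix>/lib and can be loaded.

    Assumption: the conda layout, where shared libraries sit in <prefix>/lib.
    """
    lib = Path(env_prefix) / "lib" / soname
    exists = lib.is_file()  # follows symlinks such as libiconv.so.2 -> libiconv.so.2.x
    return {"path": str(lib), "exists": exists, "loadable": exists and try_load(str(lib))}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_env_lib.py <conda-env-prefix> [soname]
    print(check_env_lib(*sys.argv[1:3]))
```

Run it with the conda-env path from the error message as the first argument; `"exists": False` would mean the environment was built without libiconv.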

And this is the config file:

```yaml
# tab-separated file of the samples
samples: samples_GSE78761.tsv

# pipeline file locations
result_dir: ./results_GSE78761  # where to store results
genome_dir: /scratch/thirsa/mouse_data/genomes/  # where to look for or download the genomes
# fastq_dir: ./results/fastq  # where to look for or download the fastqs

# contact info for multiqc report and trackhub
#email:

# produce a UCSC trackhub?
create_trackhub: True

#which provider to use
provider: Ensembl

# how to handle replicates
technical_replicates: keep    # change to "keep" to not combine them

# which trimmer to use
trimmer: fastp

# which quantifier to use
quantifier: htseq  # or salmon or featurecounts

##### aligner and filter options are not used for the gene counts matrix if the quantifier is Salmon

# which aligner to use
aligner: hisat2

# filtering after alignment
remove_blacklist: True
min_mapping_quality: 30  # (only keep uniquely mapped reads from STAR alignments)
only_primary_align: True

##### differential gene expression analysis (optional) #####

#deseq2:
#  multiple_testing_procedure: BH
#  alpha_value: 0.1
#  shrinkage_estimator: apeglm

#contrasts:
#  - 'stage_2_1'
#  - 'stage_all_1'
```
Maarten-vd-Sande commented 3 years ago

Thanks for making the issue :smile:

samples.tsv

| sample | assembly | descriptive_name |
| --- | --- | --- |
| GSM1594065 | GRCm38.p6 | ISC_WT_rep1 |
| GSM1594066 | GRCm38.p6 | ISC_WT_rep2 |
| GSM1594067 | GRCm38.p6 | ISC_RingKO_rep1 |
| GSM1594068 | GRCm38.p6 | ISC_RingKO_rep2 |
| GSM1847817 | GRCm38.p6 | villi_WT |
| GSM1847818 | GRCm38.p6 | villi_RingKO |
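
To make the replicate structure of this samples.tsv explicit, here is a small sketch that groups the sample IDs by descriptive_name with a trailing `_repN` stripped off. This grouping rule is a hypothetical convention chosen for illustration; it is not necessarily how seq2science itself identifies replicates. The helper `group_by_condition` and the inlined `SAMPLES_TSV` string are illustrative only, with the rows copied from the table.

```python
import csv
import io
import re
from collections import defaultdict

# The samples.tsv rows from the table, written out tab-separated.
SAMPLES_TSV = """sample\tassembly\tdescriptive_name
GSM1594065\tGRCm38.p6\tISC_WT_rep1
GSM1594066\tGRCm38.p6\tISC_WT_rep2
GSM1594067\tGRCm38.p6\tISC_RingKO_rep1
GSM1594068\tGRCm38.p6\tISC_RingKO_rep2
GSM1847817\tGRCm38.p6\tvilli_WT
GSM1847818\tGRCm38.p6\tvilli_RingKO
"""


def group_by_condition(tsv_text):
    """Group sample IDs by descriptive_name minus a trailing '_repN' (hypothetical rule)."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        condition = re.sub(r"_rep\d+$", "", row["descriptive_name"])
        groups[condition].append(row["sample"])
    return dict(groups)


if __name__ == "__main__":
    for condition, samples in group_by_condition(SAMPLES_TSV).items():
        print(f"{condition}: {len(samples)} sample(s) -> {', '.join(samples)}")
```

Under this rule the table splits into ISC_WT and ISC_RingKO with two samples each, and villi_WT and villi_RingKO with one sample each.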