vanheeringen-lab / seq2science

Automated and customizable preprocessing of Next-Generation Sequencing data, including full (sc)ATAC-seq, ChIP-seq, and (sc)RNA-seq workflows. Works equally easily with public and local data.
https://vanheeringen-lab.github.io/seq2science
MIT License

BUG: Workflow not running #681

Open JihedC opened 3 years ago

JihedC commented 3 years ago

Hi,

I have been trying to run the chip-seq workflow of seq2science. It starts but stops when 7% of the jobs are done.

seq2science --version
seq2science: v0.5.1

To Reproduce: Please include your config.yaml, your samples.tsv, and the complete/relevant output.

Both config.yaml and samples.tsv were generated from seq2science init chip-seq

- config.yaml:

# pipeline file locations
result_dir: ./results       # where to store results
genome_dir: ./genomes       # where to look for or download the genomes
fastq_dir: ./results/fastq  # where to look for or download the fastqs

# contact info for multiqc report and trackhub
email: yourmail@here.com

# produce a UCSC trackhub?
create_trackhub: true

# how to handle replicates
biological_replicates: fisher  # change to "keep" to not combine them
technical_replicates: merge    # change to "keep" to not combine them

# which trimmer to use
trimmer: fastp

# which aligner to use
aligner: bwa-mem2

# filtering after alignment
remove_blacklist: true
min_mapping_quality: 30
only_primary_align: true

# peak caller
peak_caller:
  macs2:
      --keep-dup 1 --buffer-size 10000

# differential gene expression analysis
contrasts:
  - 'descriptive_name_all_HEL'


- samples.tsv:

# for help with filling out the samples.tsv:
# https://vanheeringen-lab.github.io/seq2science/content/workflows/chip_seq.html#filling-out-the-samples-tsv
# also make sure that you use tab as a delimiter
sample  assembly        descriptive_name
GSM4404624      hg38    HEL


I get several error messages, I include the complete log file:
[seq2science.2021-04-13T103059.065792.log](https://github.com/vanheeringen-lab/seq2science/files/6302389/seq2science.2021-04-13T103059.065792.log)

The log file in `seq2science/results/log/bwa-mem2_index/hg38.log`:

Looking to launch executable "/exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/7fa92a1c/bin/bwa-mem2.avx", simd = .avx
Launching executable "/exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/7fa92a1c/bin/bwa-mem2.avx"
[bwa_index] Pack FASTA... 18.78 sec

Those are the files I got in the genome folder:

(seq2science) jchouaref@res-hpc-exe028:/exports/humgen/jihed/seq2science/genomes/hg38$ tree
.
├── hg38.annotation.bed.gz
├── hg38.annotation.gtf.gz
├── hg38.fa
├── hg38.fa.fai
├── hg38.fa.sizes
├── hg38.gaps.bed
├── index
├── README.txt
└── tmpevip0jtt

Do you think the problem comes from there?

Maarten-vd-Sande commented 3 years ago

Is it possible a shell message is being printed? Those are not captured in stdout/stderr, so they won't end up in the log, but will be printed in your terminal. Segfaults and out-of-memory errors are examples of this.

Maybe setting aligner: bwa-mem instead of bwa-mem2 helps in this case? bwa-mem2 is extremely memory hungry.

JihedC commented 3 years ago

The cluster was a bit busy today; I hope it will run during the night.

JihedC commented 3 years ago

Hi Maarten,

I have good news: I tried the atac-seq workflow yesterday as well and it worked just fine. I'll try it today with my own samples.

Concerning the chip-seq workflow, I tried the modification you suggested: the genome is now hg38 and the aligner is bwa-mem, and this time it produced the bwa index. So that's at least one thing we know; I'll ask for more memory next time I try bwa-mem2. But the jobs are still blocked at rule complement_blacklist, where the error is:

Error: The genome file /exports/humgen/jihed/seq2science_rna/genomes/hg38/hg38.fa.sizes has no valid entries (are you sure it's a 2-column bedtools genome file). Exiting.

The file hg38.fa.sizes is empty.

Maybe the problem comes from this:

[Tue Apr 13 18:48:33 2021]
localrule get_genome_support_files:
    input: /exports/humgen/jihed/seq2science_rna/genomes/hg38/hg38.fa
    output: /exports/humgen/jihed/seq2science_rna/genomes/hg38/hg38.fa.fai, /exports/humgen/jihed/seq2science_rna/genomes/hg38/hg38.fa.sizes, /exports/humgen/jihed/seq2science_rna/genomes/hg38/hg38.gaps.bed
    jobid: 45
    wildcards: assembly=hg38

Warning: the following output files of rule get_genome_support_files were not present when the DAG was created:
{'/exports/humgen/jihed/seq2science_rna/genomes/hg38/hg38.fa.sizes'}

I tried to dig into the rules to find out how hg38.fa.sizes is created from hg38.fa, but I can't find it in the Python scripts.

Do you have any idea what the problem could be? For now I will try to use this hg38.fa.sizes, assuming that the file contains the chromosome sizes.

Here are attached:

Maarten-vd-Sande commented 3 years ago

Good news, I am happy at least some of it is working for you!

We just got a "freshly" installed server this morning, and I, unfortunately, can not reproduce this error there :disappointed: ...

The warning is indeed suspicious; however, it also happened on my successful run. I made an issue for this (#682), but I don't think it's causing the problem.

One thing I noticed in the terminal output is the line:

Chromosome "chr1" undefined in /exports/humgen/jihed/seq2science_rna/genomes/hg38/hg38.fa.sizes

as stdout/stderr that is not captured by our rule. However, that also just seems to indicate that the .fa.sizes file is empty...

The ATAC-seq workflow is practically a copy of the chip-seq workflow, except that some defaults are set differently, so this is quite surprising to me. :thinking: @siebrenf I remember we had some file-latency-ish error in the past with genomepy. The rule was registered as finished successfully, but it was still running in the background somehow. Are we sure this was "solved"?

Perhaps @JihedC you could try adding a long sleep (e.g. 1 minute) at the end of this script? https://github.com/vanheeringen-lab/seq2science/blob/master/seq2science/scripts/genome_support.py. Maybe the cluster somehow needs some time to sync updates to files?
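As a stop-gap, the .fa.sizes file that bedtools complains about is just a two-column chromosome/length table. A minimal sketch (not the code seq2science itself runs; the paths are illustrative) that rebuilds it from the .fai index:

# Minimal sketch: rebuild a 2-column "genome file" (chromosome<TAB>length),
# as bedtools expects, from an existing .fai index. Paths are illustrative.
fai = "/exports/humgen/jihed/seq2science_rna/genomes/hg38/hg38.fa.fai"
sizes = "/exports/humgen/jihed/seq2science_rna/genomes/hg38/hg38.fa.sizes"

with open(fai) as f_in, open(sizes, "w") as f_out:
    for line in f_in:
        # .fai columns: name, length, offset, linebases, linewidth
        name, length = line.split("\t")[:2]
        f_out.write(f"{name}\t{length}\n")

Of course that only helps if the .fai itself is complete; if the .fai is empty too, the problem sits earlier in the chain.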

Maarten-vd-Sande commented 3 years ago

P.S. Depending on whether or not you are used to conda/python packaging, adding the sleep might be extremely trivial or quite complicated. Let me know if you don't know how to do it; I can type it out for you :smile:

JihedC commented 3 years ago

Hi Maarten,

I don't know how to do it; could you help me? I am using conda, and I can't find where the scripts are saved in the environment.

I could find what I think is genome.py in /exports/humgen/jihed/miniconda3/envs/seq2science/bin, but it doesn't look like the one you mentioned:

#!/bin/sh
'''exec' /exports/humgen/jihed/miniconda3/envs/seq2science/bin/python "$0" "$@"
' '''
# -*- coding: utf-8 -*-
import re
import sys
from genomepy.cli import cli
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(cli())

Maarten-vd-Sande commented 3 years ago

It should be in /exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/scripts/genome_support.py

add

import time
time.sleep(60)

at the bottom
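
If a fixed sleep is not enough, a variant that waits until the files actually show up non-empty might be worth trying instead (only a sketch, untested on your cluster; the paths are placeholders for the files the rule writes):

import os
import time

# Sketch of a latency workaround: instead of a fixed sleep, wait (up to ~60 s)
# until the expected support files exist and are non-empty.
expected_files = [
    "/exports/humgen/jihed/seq2science_rna/genomes/hg38/hg38.fa.fai",
    "/exports/humgen/jihed/seq2science_rna/genomes/hg38/hg38.fa.sizes",
]
for _ in range(60):
    if all(os.path.isfile(p) and os.path.getsize(p) > 0 for p in expected_files):
        break
    time.sleep(1)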

JihedC commented 3 years ago

Ok, great, thanks for the quick reply!

siebrenf commented 3 years ago

@siebrenf I remember we had some file-latency-ish error in the past with genomepy. The rule was registered as finished successfully, but it was still running in the background somehow. Are we sure this was "solved"?

It was a latency/communication issue with scripts in general, and it sounds like a plausible cause for this error!

JihedC commented 3 years ago

So I tried two things:

And I got the same issue with the .fa.sizes file; it's also empty with another genome:

Error: The genome file /exports/humgen/jihed/seq2science_rna_Seq/genomes/mm10/mm10.fa.sizes has no valid entries (are you sure it's a 2-column bedtools genome file). Exiting.

I also got a similar error with the zebrafish genome. You said it worked fine on your computer? Maybe there is something wrong with our cluster for the download of this file?

Maarten-vd-Sande commented 3 years ago

Yeah, I honestly don't know what is going on, and it would be best if this could be fixed somehow...

One thing to try is to download the genome directly through genomepy, and see if you can use that .fa.sizes. Genomepy comes with seq2science, so you do not have to install anything:

genomepy install [genome name] -g [location]

Let's hope you can just copy the freshly downloaded .fa.sizes over the corrupt seq2science one, and you can at least run the workflows from there...
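
Once copied, a quick sanity check of the file never hurts (just a sketch; the path is illustrative): it should contain one chromosome name and one integer length per tab-separated line.

# Sanity-check a .fa.sizes file: non-empty, two tab-separated columns,
# and an integer length in the second column. The path is illustrative.
sizes = "/exports/humgen/jihed/seq2science_rna_Seq/genomes/mm10/mm10.fa.sizes"

with open(sizes) as fh:
    rows = [line.rstrip("\n").split("\t") for line in fh if line.strip()]

assert rows, f"{sizes} is empty"
assert all(len(r) == 2 and r[1].isdigit() for r in rows), "not a 2-column genome file"
print(f"{len(rows)} contigs, e.g. {rows[0][0]}: {rows[0][1]} bp")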

Maarten-vd-Sande commented 3 years ago

Let me know if you get it working (or not)

JihedC commented 3 years ago

Yes, I will update you as soon as I can. I had a little issue with memory space which slowed me down a bit. Now it should be okay.

JihedC commented 3 years ago

Hi Maarten,

Here is what I did to try to make the chip-seq workflow run. My plan was to try to align ChIP-seq SE data from mouse to mm10 using bowtie2. Here are the samples.tsv and the config.yaml:

samples.tsv:

# for help with filling out the samples.tsv:
# https://vanheeringen-lab.github.io/seq2science/content/workflows/chip_seq.html#filling-out-the-samples-tsv
# also make sure that you use tab as a delimiter
sample  assembly        descriptive_name
GSM1555120      mm10    Kap1_a

config.yaml:

# tab-separated file of the samples
samples: samples.tsv

# pipeline file locations
result_dir: ./results  # where to store results
genome_dir: ./genomes  # where to look for or download the genomes
# fastq_dir: ./results/fastq  # where to look for or download the fastqs

# contact info for multiqc report and trackhub
email: j.chouaref@lumc.nl

# produce a UCSC trackhub?
create_trackhub: true

# how to handle replicates
biological_replicates: fisher  # change to "keep" to not combine them
technical_replicates: merge    # change to "keep" to not combine them

# which trimmer to use
trimmer: fastp

# which aligner to use
aligner: bowtie2

# filtering after alignment
remove_blacklist: true
min_mapping_quality: 30
only_primary_align: true

# peak caller
peak_caller:
  macs2:
      --keep-dup 1 --buffer-size 10000

## differential gene expression analysis
#contrasts:
#  - 'descriptive_name_all_HEL'

Since I got an empty mm10.fa.sizes file, I downloaded mm10 with genomepy (a great discovery, btw 😊). I think I got everything I need to run the pipeline with this:

drwx--S--- 2 jchouaref 5-A-SHARK_hg_bioinf          0 Apr 19 14:40 tmp883uf8d8
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf 2730872818 Apr 19 14:41 mm10.fa
drwxr-sr-x 3 jchouaref 5-A-SHARK_hg_bioinf         25 Apr 19 15:00 index
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf       3082 Apr 19 15:02 mm10.fa.fai
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf          0 Apr 19 15:02 mm10.gaps.bed
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf        435 Apr 19 15:02 README.txt
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf   18410093 Apr 19 15:02 mm10.annotation.gtf.gz
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf    5658076 Apr 19 15:02 mm10.annotation.bed.gz
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf       1405 Apr 19 15:20 mm10.fa.sizes

Note that mm10.gaps.bed is empty.

I ran the workflow on Slurm and got the following problem:

             ____  ____   __
            / ___)(  __) /  \
            \___ \ ) _) (  O )
            (____/(____) \__\)
                   ____
                  (___ \
                   / __/
                  (____)
   ____   ___  __  ____  __ _   ___  ____
  / ___) / __)(  )(  __)(  ( \ / __)(  __)
  \___ \( (__  )(  ) _) /    /( (__  ) _)
  (____/ \___)(__)(____)\_)__) \___)(____)

version: 0.5.1
docs: https://vanheeringen-lab.github.io/seq2science

Checking if seq2science was run already, if something in the configuration was changed, and if so, if seq2science needs to re-run any jobs.
Checking if samples are available online...
This can take some time.
Done!

CONFIGURATION VARIABLES:
samples                : /exports/humgen/jihed/seq2science/samples.tsv
bigwig_dir             : /exports/humgen/jihed/seq2science/results/bigwigs
counts_dir             : /exports/humgen/jihed/seq2science/results/counts
fastq_dir              : /exports/humgen/jihed/seq2science/results/fastq
final_bam_dir          : /exports/humgen/jihed/seq2science/results/final_bam
genome_dir             : /exports/humgen/jihed/seq2science/genomes
log_dir                : /exports/humgen/jihed/seq2science/results/log
qc_dir                 : /exports/humgen/jihed/seq2science/results/qc
result_dir             : /exports/humgen/jihed/seq2science/results
sra_dir                : /exports/humgen/jihed/seq2science/results/sra
trimmed_dir            : /exports/humgen/jihed/seq2science/results/fastq_trimmed
aligner                : bowtie2
cli_call               : ['/exports/humgen/jihed/miniconda3/envs/seq2science/bin/seq2science', 'run', 'chip-seq', '--cores', '20']
cores                  : 20
create_qc_report       : True
create_trackhub        : True
deeptools_flags        : --normalizeUsing BPM
deeptools_multibamsummary: --distanceBetweenBins 9000 --binSize 1000
deeptools_plotcorrelation: --colorMap RdYlBu_r --plotNumbers
deeptools_qc           : True
email                  : j.chouaref@lumc.nl
fqext                  : ['R1', 'R2']
fqsuffix               : fastq
logbase                : 2
markduplicates         : REMOVE_DUPLICATES=true -Xms4G -Xmx6G MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=999
min_mapping_quality    : 30
only_primary_align     : True
peak_caller            : {'macs2': '--keep-dup 1 --buffer-size 10000'}
peak_windowsize        : 100
remove_blacklist       : True
slop                   : 100
trimmer                : fastp
layout:                : {'GSM1555120': 'SINGLE'}

Building DAG of jobs...
Done. Now starting the real run.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Provided resources: parallel_downloads=3, deeptools_limit=16, R_scripts=1, mem_gb=94
Job counts:
        count   jobs
        1       bedgraph_bigwig
        1       bedtools_slop
        1       bowtie2_align
        1       bowtie2_index
        1       chipseeker
        1       combine_peaks
        1       combine_qc_files
        1       complement_blacklist
        1       computeMatrix
        1       coverage_table
        3       edgeR_normalization
        1       fastp_SE
        1       featureCounts
        1       get_genome_annotation
        1       get_genome_support_files
        4       log_normalization
        1       macs2_callpeak
        1       mark_duplicates
        4       mean_center
        1       mt_nuc_ratio_calculator
        1       multiqc
        1       multiqc_explain
        1       multiqc_header_info
        1       multiqc_rename_buttons
        1       multiqc_samplesconfig
        1       multiqc_schema
        1       onehot_peaks
        1       peak_bigpeak
        1       plotFingerprint
        1       plotProfile
        1       quantile_normalization
        1       run2sra
        1       runs2sample
        1       samtools_index
        1       samtools_presort
        2       samtools_stats
        1       seq2science
        1       setup_blacklist
        1       sieve_bam
        1       sra2fastq_SE
        1       trackhub
        1       unzip_annotation
        51

[Mon Apr 19 15:00:38 2021]
localrule multiqc_rename_buttons:
    output: /exports/humgen/jihed/seq2science/results/qc/sample_names_mm10.tsv
    jobid: 41
    wildcards: assembly=mm10

[Mon Apr 19 15:00:38 2021]
localrule get_genome_support_files:
    input: /exports/humgen/jihed/seq2science/genomes/mm10/mm10.fa
    output: /exports/humgen/jihed/seq2science/genomes/mm10/mm10.fa.fai, /exports/humgen/jihed/seq2science/genomes/mm10/mm10.fa.sizes, /exports/humgen/jihed/seq2science/genomes/mm10/mm10.gaps.bed
    jobid: 39
    wildcards: assembly=mm10

[Mon Apr 19 15:00:39 2021]
localrule multiqc_schema:
    output: /exports/humgen/jihed/seq2science/results/qc/schema.yaml
    jobid: 42

[Mon Apr 19 15:00:39 2021]
localrule multiqc_header_info:
    output: /exports/humgen/jihed/seq2science/results/qc/header_info.yaml
    jobid: 40

[Mon Apr 19 15:00:39 2021]
localrule multiqc_samplesconfig:
    output: /exports/humgen/jihed/seq2science/results/qc/samplesconfig_mqc.html
    jobid: 43

[Mon Apr 19 15:00:39 2021]
localrule setup_blacklist:
    input: /exports/humgen/jihed/seq2science/genomes/mm10/mm10.fa
    output: /exports/humgen/jihed/seq2science/genomes/mm10/mm10.customblacklist.bed
    jobid: 31
    wildcards: assembly=mm10

[Mon Apr 19 15:00:39 2021]
rule multiqc_explain:
    output: /exports/humgen/jihed/seq2science/results/log/workflow_explanation_mqc.html
    jobid: 45

[Mon Apr 19 15:00:39 2021]
rule get_genome_annotation:
    input: /exports/humgen/jihed/seq2science/genomes/mm10/mm10.fa
    output: /exports/humgen/jihed/seq2science/genomes/mm10/mm10.annotation.gtf.gz, /exports/humgen/jihed/seq2science/genomes/mm10/mm10.annotation.bed.gz
    log: /exports/humgen/jihed/seq2science/results/log/get_annotation/mm10.genome.log
    jobid: 49
    benchmark: /exports/humgen/jihed/seq2science/results/benchmark/get_annotation/mm10.genome.benchmark.txt
    wildcards: raw_assembly=mm10
    priority: 1
    resources: parallel_downloads=1

[Mon Apr 19 15:00:39 2021]
rule run2sra:
    output: /exports/humgen/jihed/seq2science/results/sra/SRR2014796/SRR2014796/SRR2014796.sra
    log: /exports/humgen/jihed/seq2science/results/log/run2sra/SRR2014796.log
    jobid: 51
    benchmark: /exports/humgen/jihed/seq2science/results/benchmark/run2sra/SRR2014796.benchmark.txt
    wildcards: run=SRR2014796
    resources: parallel_downloads=1

[Mon Apr 19 15:00:39 2021]
rule bowtie2_index:
    input: /exports/humgen/jihed/seq2science/genomes/mm10/mm10.fa
    output: /exports/humgen/jihed/seq2science/genomes/mm10/index/bowtie2/
    log: /exports/humgen/jihed/seq2science/results/log/bowtie2_index/mm10.log
    jobid: 14
    benchmark: /exports/humgen/jihed/seq2science/results/benchmark/bowtie2_index/mm10.benchmark.txt
    wildcards: assembly=mm10
    priority: 1
    threads: 4

[Mon Apr 19 15:01:04 2021]
Finished job 40.
1 of 51 steps (2%) done
[Mon Apr 19 15:01:04 2021]
Finished job 41.
2 of 51 steps (4%) done
[Mon Apr 19 15:01:04 2021]
Finished job 42.
3 of 51 steps (6%) done
[Mon Apr 19 15:01:29 2021]
Finished job 45.
4 of 51 steps (8%) done
Activating conda environment: /exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/2969b8b6
Activating conda environment: /exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/323808ca
Activating conda environment: /exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/b8363b14
Activating conda environment: /exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/b8363b14
[Mon Apr 19 15:02:10 2021]
Finished job 43.
5 of 51 steps (10%) done
[Mon Apr 19 15:02:49 2021]
Finished job 39.
6 of 51 steps (12%) done
[Mon Apr 19 15:02:54 2021]
Finished job 49.
7 of 51 steps (14%) done
[Mon Apr 19 15:03:47 2021]
Finished job 51.
8 of 51 steps (16%) done

It was stuck at job 8 for 2 days and then stopped due to the time limit I set for the Slurm job. The problem was that the bowtie2 index files were incomplete, and for some reason this was not communicated to me:

             ____  ____   __
            / ___)(  __) /  \
            \___ \ ) _) (  O )
            (____/(____) \__\)
                   ____
                  (___ \
                   / __/
                  (____)
   ____   ___  __  ____  __ _   ___  ____
  / ___) / __)(  )(  __)(  ( \ / __)(  __)
  \___ \( (__  )(  ) _) /    /( (__  ) _)
  (____/ \___)(__)(____)\_)__) \___)(____)

version: 0.5.1
docs: https://vanheeringen-lab.github.io/seq2science

Checking if seq2science was run already, if something in the configuration was changed, and if so, if seq2science needs to re-run any jobs.
Checking if samples are available online...
This can take some time.
Done!

CONFIGURATION VARIABLES:
samples                : /exports/humgen/jihed/seq2science/samples.tsv
bigwig_dir             : /exports/humgen/jihed/seq2science/results/bigwigs
counts_dir             : /exports/humgen/jihed/seq2science/results/counts
fastq_dir              : /exports/humgen/jihed/seq2science/results/fastq
final_bam_dir          : /exports/humgen/jihed/seq2science/results/final_bam
genome_dir             : /exports/humgen/jihed/seq2science/genomes
log_dir                : /exports/humgen/jihed/seq2science/results/log
qc_dir                 : /exports/humgen/jihed/seq2science/results/qc
result_dir             : /exports/humgen/jihed/seq2science/results
sra_dir                : /exports/humgen/jihed/seq2science/results/sra
trimmed_dir            : /exports/humgen/jihed/seq2science/results/fastq_trimmed
aligner                : bowtie2
cli_call               : ['/exports/humgen/jihed/miniconda3/envs/seq2science/bin/seq2science', 'run', 'chip-seq', '--cores', '20']
cores                  : 20
create_qc_report       : True
create_trackhub        : True
deeptools_flags        : --normalizeUsing BPM
deeptools_multibamsummary: --distanceBetweenBins 9000 --binSize 1000
deeptools_plotcorrelation: --colorMap RdYlBu_r --plotNumbers
deeptools_qc           : True
email                  : j.chouaref@lumc.nl
fqext                  : ['R1', 'R2']
fqsuffix               : fastq
logbase                : 2
markduplicates         : REMOVE_DUPLICATES=true -Xms4G -Xmx6G MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=999
min_mapping_quality    : 30
only_primary_align     : True
peak_caller            : {'macs2': '--keep-dup 1 --buffer-size 10000'}
peak_windowsize        : 100
remove_blacklist       : True
slop                   : 100
trimmer                : fastp
layout:                : {'GSM1555120': 'SINGLE'}

Building DAG of jobs...
IncompleteFilesException:
The files below seem to be incomplete. If you are sure that certain files are not incomplete, mark them as complete with

    snakemake --cleanup-metadata <filenames>

To re-generate the files rerun your command with the --rerun-incomplete flag.
Incomplete files:
/exports/humgen/jihed/seq2science/genomes/mm10/index/bowtie2/

I am going to try with bwa again.

JihedC commented 3 years ago

The issue is still the same, now with mm10.customblacklist.bed:

Error in rule setup_blacklist:
    jobid: 0
    output: /exports/humgen/jihed/seq2science/genomes/mm10/mm10.customblacklist.bed

RuleException:
FileNotFoundError in line 38 of /exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/rules/bam_cleaning.smk:
[Errno 2] No such file or directory: '/exports/humgen/jihed/seq2science/genomes/mm10/mm10.blacklist.bed'
  File "/exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2168, in run_wrapper
  File "/exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/rules/bam_cleaning.smk", line 38, in __rule_setup_blacklist
  File "/exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 529, in _callback
  File "/exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/concurrent/futures/thread.py", line 57, in run
  File "/exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 515, in cached_or_run
  File "/exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2199, in run_wrapper
Exiting because a job execution failed. Look above for error message

Do you maybe have this file, or an example of its format?

Maarten-vd-Sande commented 3 years ago

I made a mm10 folder for you.

http://ocimum.science.ru.nl/mm10/
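
For reference, the blacklist file is plain BED: tab-separated chromosome, start, and end per line. A tiny illustrative sketch (the regions are made up, not a real mm10 blacklist):

# Illustrative only: a blacklist .bed file is tab-separated
# chromosome, start, end per line (0-based, half-open intervals).
# These regions are made up and are NOT a real mm10 blacklist.
example_regions = [
    ("chr1", 3000000, 3002000),
    ("chrM", 0, 16299),
]
with open("mm10.blacklist.example.bed", "w") as out:
    for chrom, start, end in example_regions:
        out.write(f"{chrom}\t{start}\t{end}\n")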

When running seq2science the first time with these files I think you need to use something like:

seq2science run chip-seq --skip-rerun --cores 24 --snakemakeOptions touch=True

This is necessary because the timestamps will be messed up from downloading the files, and otherwise snakemake/seq2science will try to re-create them.

JihedC commented 3 years ago

Thank you so much for these files! I have added them to my genomes/mm10 folder.

Unfortunately, it still does not work. Here are the log and the Slurm output:

seq2science.2021-04-21T100027.917233.log slurm-2426180.txt

The run goes so fast that I doubt it's doing anything. Here is the content of the bwa index:

jchouaref@res-hpc-lo01:/exports/humgen/jihed/seq2science/genomes/mm10/index/bwa-mem$ ls -ltr
total 5616152
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf 2730871864 Apr 20 12:41 mm10.bwt
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf  682717945 Apr 20 12:41 mm10.pac
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf       2857 Apr 20 12:41 mm10.ann
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf      11032 Apr 20 12:41 mm10.amb
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf 1365435936 Apr 20 12:56 mm10.sa

So the index was created, but the results folder is almost empty:

jchouaref@res-hpc-lo01:/exports/humgen/jihed/seq2science/results$ ls -l
total 76
drwxr-sr-x  9 jchouaref 5-A-SHARK_hg_bioinf 203 Apr 20 13:03 benchmark
drwxr-sr-x  3 jchouaref 5-A-SHARK_hg_bioinf  59 Apr 20 13:01 fastq
drwxr-sr-x  2 jchouaref 5-A-SHARK_hg_bioinf   0 Apr 20 17:10 fastq_trimmed
drwxr-sr-x 28 jchouaref 5-A-SHARK_hg_bioinf 921 Apr 20 17:13 log
drwxr-sr-x  3 jchouaref 5-A-SHARK_hg_bioinf 168 Apr 20 13:03 qc
drwxr-sr-x  3 jchouaref 5-A-SHARK_hg_bioinf  28 Apr 19 15:00 sra

Do you think it is because I am using an sbatch command to distribute the job on the cluster? Then snakemake doesn't actually know which jobs are done or not?

Maarten-vd-Sande commented 3 years ago

I honestly have no clue what is going on here... Sorry, I don't think I can help you :sob:

siebrenf commented 3 years ago

If the ATAC-seq run did work, but you get timeouts and unexplained errors, then maybe the issue is the server occupancy/load? This is a long shot, but you could try to run the workflow with few cores (so it runs only one job at a time), and run at a quiet moment. If Maarten's genome folder works it shouldn't re-index, so the RAM is spared.

seq2science run chip-seq --skip-rerun --cores 5

JihedC commented 3 years ago

No worries @Maarten-vd-Sande, I will try that @siebrenf.