Open JihedC opened 3 years ago
Is it possible there is a shell message being output? Those are not captured in stdout/stderr, and won't end up in the log but will be printed in your terminal. Segfaults and memory issues are examples of this.
Maybe setting aligner: bwa-mem instead of bwa-mem2 helps in this case? bwa-mem2 is extremely memory hungry.
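A segfault or an out-of-memory kill usually shows up as a signal rather than as a line in the log. The sketch below is not part of seq2science; it only illustrates how a Python wrapper could tell "killed by a signal" apart from a normal non-zero exit. The helper names `describe_exit` and `run_and_report` are made up for this example.

```python
import signal
import subprocess
import sys
from typing import List


def describe_exit(returncode: int) -> str:
    """Translate a subprocess return code into a human-readable status."""
    if returncode == 0:
        return "finished normally"
    if returncode < 0:
        # On POSIX, a negative return code -N means "terminated by signal N".
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_and_report(cmd: List[str]) -> str:
    """Run cmd with stdout/stderr captured and describe how it ended."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return describe_exit(result.returncode)
```

When the command runs through a shell instead, the same event is typically reported as exit status 128 + N (for example 139 for SIGSEGV), so a wrapper may want to check for both forms.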
The cluster was a bit busy today, I hope it will run during the night
Hi Maarten,
I have good news: I tried the atac-seq workflow as well yesterday and it worked just fine. I'll try it today with my own samples.
Concerning the chip-seq workflow, I tried the modifications you suggested: the genome is now hg38 and the aligner is bwa-mem. This time it produced the bwa index, so that's at least one thing we now know. I'll ask for more memory next time I try with bwa-mem2. But the jobs are still blocked at rule complement_blacklist; the error there is:
Error: The genome file /exports/humgen/jihed/seq2science_rna/genomes/hg38/hg38.fa.sizes has no valid entries (are you sure it's a 2-column bedtools genome file). Exiting.
The file hg38.fa.sizes is empty.
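The bedtools message refers to a tab-separated file with a sequence name in the first column and its length in the second. A minimal sketch of a check that fails early on an empty or malformed file could look like this; `read_genome_sizes` is a hypothetical helper, not something seq2science or bedtools provides.

```python
from pathlib import Path
from typing import List, Tuple


def read_genome_sizes(path: str) -> List[Tuple[str, int]]:
    """Parse a 2-column (name<TAB>length) genome file; raise if it is unusable."""
    entries = []
    lines = Path(path).read_text().splitlines()
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) < 2 or not fields[1].isdigit():
            raise ValueError(
                f"{path}:{line_no}: expected 'name<TAB>length', got {line!r}"
            )
        entries.append((fields[0], int(fields[1])))
    if not entries:
        raise ValueError(f"{path} has no valid entries (the file is empty)")
    return entries
```

Running such a check right after the genome support files are written would point at the empty file directly, instead of at a later rule such as complement_blacklist.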
Maybe the problem comes from this:
[Tue Apr 13 18:48:33 2021]
localrule get_genome_support_files:
input: /exports/humgen/jihed/seq2science_rna/genomes/hg38/hg38.fa
output: /exports/humgen/jihed/seq2science_rna/genomes/hg38/hg38.fa.fai, /exports/humgen/jihed/seq2science_rna/genomes/hg38/hg38.fa.sizes, /exports/humgen/jihed/seq2science_rna/genomes/hg38/hg38.gaps.bed
jobid: 45
wildcards: assembly=hg38
Warning: the following output files of rule get_genome_support_files were not present when the DAG was created:
{'/exports/humgen/jihed/seq2science_rna/genomes/hg38/hg38.fa.sizes'}
I tried to dig into the rules to find how hg38.fa.sizes is created from hg38.fa, but I can't find it in the Python script.
Do you have any idea what the problem could be? For now I will try to use this hg.fa.sizes, assuming that the file contains the chromosome sizes.
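For reference, a sizes file can be derived from the FASTA index: a .fai file written by samtools faidx has the sequence name and length in its first two tab-separated columns, which is exactly the 2-column layout bedtools expects. The sketch below shows that conversion under this assumption; it is not claimed to be how genome_support.py does it, and `fai_to_sizes` is a made-up name.

```python
def fai_to_sizes(fai_path: str, sizes_path: str) -> int:
    """Copy the name and length columns of a .fai index into a .sizes file.

    Returns the number of sequences written.
    """
    written = 0
    with open(fai_path) as fai, open(sizes_path, "w") as sizes:
        for line in fai:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            sizes.write(f"{fields[0]}\t{fields[1]}\n")
            written += 1
    return written
```

If hg38.fa.fai is present and non-empty, `fai_to_sizes("hg38.fa.fai", "hg38.fa.sizes")` would be one way to regenerate the missing content by hand.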
Here are attached:
the log file seq2science.2021-04-13T183204.821263.log
the terminal output slurm-2324390.out.log
Good news, I am happy that at least something is working for you!
We just got a "freshly" installed server this morning, and I, unfortunately, can not reproduce this error there :disappointed: ...
The warning is indeed suspicious, however it also happened on my successful run. I made an issue for this #682, but I don't think it's causing the problem.
One thing I noticed in the terminal output is the line:
Chromosome "chr1" undefined in /exports/humgen/jihed/seq2science_rna/genomes/hg38/hg38.fa.sizes
as stdout/stderr that is not captured by our rule. However that also just seems to indicate that the .fa.sizes file is empty..
The ATAC-seq workflow is practically a copy of the chip-seq workflow, except that some defaults are set differently, so this is quite surprising to me. :thinking: @siebrenf I remember we had some file-latency-ish error in the past with genomepy. The rule was registered as finished successfully, but it was still running in the background somehow. Are we sure this was "solved"?
Perhaps @JihedC you could try adding a long sleep (e.g. 1 minute) at the end of this script? https://github.com/vanheeringen-lab/seq2science/blob/master/seq2science/scripts/genome_support.py. Maybe the cluster somehow needs some time to sync updates to files?
p.s. depending on whether or not you are used to conda/python packaging, adding the sleep might be extremely trivial, or quite complicated. Let me know if you don't know how to do it, I can type it out for you :smile:
Hi Maarten,
I don't know how to do it, could you help me? I am using conda. I can't find where the scripts are saved in the environment.
I could find, I think, genome.py in /exports/humgen/jihed/miniconda3/envs/seq2science/bin, but it doesn't look like the one you mentioned:
#!/bin/sh
'''exec' /exports/humgen/jihed/miniconda3/envs/seq2science/bin/python "$0" "$@"
' '''
# -*- coding: utf-8 -*-
import re
import sys
from genomepy.cli import cli
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(cli())
It should be in /exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/scripts/genome_support.py
add
import time
time.sleep(60)
at the bottom
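A fixed sleep is the quickest test of the latency theory. If it turns out to help, a slightly more targeted variant would be to poll until the output file is actually non-empty, with a timeout. This is only a sketch; `wait_for_nonempty` is a hypothetical helper and not part of seq2science.

```python
import os
import time


def wait_for_nonempty(path: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until path exists and has a size above zero, or until timeout.

    Returns True if the file became non-empty in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if os.path.getsize(path) > 0:
                return True
        except OSError:
            pass  # file not visible (yet) on this node
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Called at the end of the script for the .fa.sizes output, this returns as soon as the file has content and reports a clear failure when it never does.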
Ok great thanks for the quick reply!
@siebrenf I remember we had some file-latency-ish error in the past with genomepy. The rule was registered as finished successfully, but it was still running in the background somehow. Are we sure this was "solved"?
it was a latency/communication issue with scripts in general, and it sounds like a plausible cause for this error!
So I tried two things:
And I have got the same issue with the .fa.sizes file; it's also empty with another genome:
Error: The genome file /exports/humgen/jihed/seq2science_rna_Seq/genomes/mm10/mm10.fa.sizes has no valid entries (are you sure it's a 2-column bedtools genome file). Exiting.
I have also got a similar error with the zebrafish genome. You said it worked fine on your computer? Maybe there is something wrong with our cluster when it downloads this file?
Yeah I honestly don't know what is going on, and it would be best if this can be fixed somehow...
One thing to try is to download the genome directly through genomepy, and see if you can use that .fa.sizes. Genomepy comes with seq2science, so you do not have to install anything
genomepy install [genome name] -g [location]
Let's hope you can just copy the freshly downloaded .fa.sizes over the corrupt seq2science one, so that you can at least run the workflows from there...
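The copy step can of course be done with a plain cp. As a sketch of the same idea with a safety check, the helper below only overwrites the target when the target is missing or empty and the fresh file has content. The name `replace_if_empty` and the example paths are assumptions for illustration.

```python
import os
import shutil


def replace_if_empty(fresh: str, target: str) -> bool:
    """Copy fresh over target, but only if target is missing or empty.

    Returns True if a copy was made, False if target already had content.
    """
    if os.path.getsize(fresh) == 0:
        raise ValueError(f"{fresh} is empty too; nothing useful to copy")
    if os.path.exists(target) and os.path.getsize(target) > 0:
        return False  # leave a non-empty target untouched
    # copyfile writes a new file, so the target gets a current timestamp
    shutil.copyfile(fresh, target)
    return True
```

For example, `replace_if_empty("fresh/mm10/mm10.fa.sizes", "genomes/mm10/mm10.fa.sizes")`, where both paths are placeholders for the genomepy download location and the seq2science genome_dir.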
Let me know if you get it working (or not)
Yes I will update you as soon as I can. I had a little issue with memory space which slowed me a bit. Now it should be okay.
Hi Maarten,
Here is what I did to try to make the chip-seq workflow run.
My plan was to try to align ChIP-seq SE data from mouse to mm10 using bowtie2. Here are the samples.tsv and the config.yaml:
# for help with filling out the samples.tsv:
# https://vanheeringen-lab.github.io/seq2science/content/workflows/chip_seq.html#filling-out-the-samples-tsv
# also make sure that you use tab as a delimiter
sample assembly descriptive_name
GSM1555120 mm10 Kap1_a
# tab-separated file of the samples
samples: samples.tsv
# pipeline file locations
result_dir: ./results # where to store results
genome_dir: ./genomes # where to look for or download the genomes
# fastq_dir: ./results/fastq # where to look for or download the fastqs
# contact info for multiqc report and trackhub
email: j.chouaref@lumc.nl
# produce a UCSC trackhub?
create_trackhub: true
# how to handle replicates
biological_replicates: fisher # change to "keep" to not combine them
technical_replicates: merge # change to "keep" to not combine them
# which trimmer to use
trimmer: fastp
# which aligner to use
aligner: bowtie2
# filtering after alignment
remove_blacklist: true
min_mapping_quality: 30
only_primary_align: true
# peak caller
peak_caller:
  macs2:
    --keep-dup 1 --buffer-size 10000
## differential gene expression analysis
#contrasts:
# - 'descriptive_name_all_HEL'
Since I got an empty mm10.fa.sizes file, I downloaded mm10 with genomepy (a great discovery btw 😊 ). I think I got everything I need to run the pipeline with this:
drwx--S--- 2 jchouaref 5-A-SHARK_hg_bioinf 0 Apr 19 14:40 tmp883uf8d8
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf 2730872818 Apr 19 14:41 mm10.fa
drwxr-sr-x 3 jchouaref 5-A-SHARK_hg_bioinf 25 Apr 19 15:00 index
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf 3082 Apr 19 15:02 mm10.fa.fai
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf 0 Apr 19 15:02 mm10.gaps.bed
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf 435 Apr 19 15:02 README.txt
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf 18410093 Apr 19 15:02 mm10.annotation.gtf.gz
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf 5658076 Apr 19 15:02 mm10.annotation.bed.gz
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf 1405 Apr 19 15:20 mm10.fa.sizes
Note that mm10.gaps.bed is empty.
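Zero-byte files like this are easy to miss in a long listing. A small sketch that lists every empty regular file in a genome folder is shown below; `find_empty_files` is a made-up helper, and whether an empty file is actually an error depends on the file in question.

```python
from pathlib import Path
from typing import List


def find_empty_files(directory: str) -> List[str]:
    """Return the sorted names of zero-byte regular files in directory."""
    return sorted(
        entry.name
        for entry in Path(directory).iterdir()
        if entry.is_file() and entry.stat().st_size == 0
    )
```

On the listing above this would report mm10.gaps.bed only; the directories with size 0 are skipped because they are not regular files.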
I ran the workflow on Slurm and got the following problem:
____ ____ __
/ ___)( __) / \
\___ \ ) _) ( O )
(____/(____) \__\)
____
(___ \
/ __/
(____)
____ ___ __ ____ __ _ ___ ____
/ ___) / __)( )( __)( ( \ / __)( __)
\___ \( (__ )( ) _) / /( (__ ) _)
(____/ \___)(__)(____)\_)__) \___)(____)
version: 0.5.1
docs: https://vanheeringen-lab.github.io/seq2science
Checking if seq2science was run already, if something in the configuration was changed, and if so, if seq2science needs to re-run any jobs.
Checking if samples are available online...
This can take some time.
Done!
CONFIGURATION VARIABLES:
samples : /exports/humgen/jihed/seq2science/samples.tsv
bigwig_dir : /exports/humgen/jihed/seq2science/results/bigwigs
counts_dir : /exports/humgen/jihed/seq2science/results/counts
fastq_dir : /exports/humgen/jihed/seq2science/results/fastq
final_bam_dir : /exports/humgen/jihed/seq2science/results/final_bam
genome_dir : /exports/humgen/jihed/seq2science/genomes
log_dir : /exports/humgen/jihed/seq2science/results/log
qc_dir : /exports/humgen/jihed/seq2science/results/qc
result_dir : /exports/humgen/jihed/seq2science/results
sra_dir : /exports/humgen/jihed/seq2science/results/sra
trimmed_dir : /exports/humgen/jihed/seq2science/results/fastq_trimmed
aligner : bowtie2
cli_call : ['/exports/humgen/jihed/miniconda3/envs/seq2science/bin/seq2science', 'run', 'chip-seq', '--cores', '20']
cores : 20
create_qc_report : True
create_trackhub : True
deeptools_flags : --normalizeUsing BPM
deeptools_multibamsummary: --distanceBetweenBins 9000 --binSize 1000
deeptools_plotcorrelation: --colorMap RdYlBu_r --plotNumbers
deeptools_qc : True
email : j.chouaref@lumc.nl
fqext : ['R1', 'R2']
fqsuffix : fastq
logbase : 2
markduplicates : REMOVE_DUPLICATES=true -Xms4G -Xmx6G MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=999
min_mapping_quality : 30
only_primary_align : True
peak_caller : {'macs2': '--keep-dup 1 --buffer-size 10000'}
peak_windowsize : 100
remove_blacklist : True
slop : 100
trimmer : fastp
layout: : {'GSM1555120': 'SINGLE'}
Building DAG of jobs...
Done. Now starting the real run.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Provided resources: parallel_downloads=3, deeptools_limit=16, R_scripts=1, mem_gb=94
Job counts:
count jobs
1 bedgraph_bigwig
1 bedtools_slop
1 bowtie2_align
1 bowtie2_index
1 chipseeker
1 combine_peaks
1 combine_qc_files
1 complement_blacklist
1 computeMatrix
1 coverage_table
3 edgeR_normalization
1 fastp_SE
1 featureCounts
1 get_genome_annotation
1 get_genome_support_files
4 log_normalization
1 macs2_callpeak
1 mark_duplicates
4 mean_center
1 mt_nuc_ratio_calculator
1 multiqc
1 multiqc_explain
1 multiqc_header_info
1 multiqc_rename_buttons
1 multiqc_samplesconfig
1 multiqc_schema
1 onehot_peaks
1 peak_bigpeak
1 plotFingerprint
1 plotProfile
1 quantile_normalization
1 run2sra
1 runs2sample
1 samtools_index
1 samtools_presort
2 samtools_stats
1 seq2science
1 setup_blacklist
1 sieve_bam
1 sra2fastq_SE
1 trackhub
1 unzip_annotation
51
[Mon Apr 19 15:00:38 2021]
localrule multiqc_rename_buttons:
output: /exports/humgen/jihed/seq2science/results/qc/sample_names_mm10.tsv
jobid: 41
wildcards: assembly=mm10
[Mon Apr 19 15:00:38 2021]
localrule get_genome_support_files:
input: /exports/humgen/jihed/seq2science/genomes/mm10/mm10.fa
output: /exports/humgen/jihed/seq2science/genomes/mm10/mm10.fa.fai, /exports/humgen/jihed/seq2science/genomes/mm10/mm10.fa.sizes, /exports/humgen/jihed/seq2science/genomes/mm10/mm10.gaps.bed
jobid: 39
wildcards: assembly=mm10
[Mon Apr 19 15:00:39 2021]
localrule multiqc_schema:
output: /exports/humgen/jihed/seq2science/results/qc/schema.yaml
jobid: 42
[Mon Apr 19 15:00:39 2021]
localrule multiqc_header_info:
output: /exports/humgen/jihed/seq2science/results/qc/header_info.yaml
jobid: 40
[Mon Apr 19 15:00:39 2021]
localrule multiqc_samplesconfig:
output: /exports/humgen/jihed/seq2science/results/qc/samplesconfig_mqc.html
jobid: 43
[Mon Apr 19 15:00:39 2021]
localrule setup_blacklist:
input: /exports/humgen/jihed/seq2science/genomes/mm10/mm10.fa
output: /exports/humgen/jihed/seq2science/genomes/mm10/mm10.customblacklist.bed
jobid: 31
wildcards: assembly=mm10
[Mon Apr 19 15:00:39 2021]
rule multiqc_explain:
output: /exports/humgen/jihed/seq2science/results/log/workflow_explanation_mqc.html
jobid: 45
[Mon Apr 19 15:00:39 2021]
rule get_genome_annotation:
input: /exports/humgen/jihed/seq2science/genomes/mm10/mm10.fa
output: /exports/humgen/jihed/seq2science/genomes/mm10/mm10.annotation.gtf.gz, /exports/humgen/jihed/seq2science/genomes/mm10/mm10.annotation.bed.gz
log: /exports/humgen/jihed/seq2science/results/log/get_annotation/mm10.genome.log
jobid: 49
benchmark: /exports/humgen/jihed/seq2science/results/benchmark/get_annotation/mm10.genome.benchmark.txt
wildcards: raw_assembly=mm10
priority: 1
resources: parallel_downloads=1
[Mon Apr 19 15:00:39 2021]
rule run2sra:
output: /exports/humgen/jihed/seq2science/results/sra/SRR2014796/SRR2014796/SRR2014796.sra
log: /exports/humgen/jihed/seq2science/results/log/run2sra/SRR2014796.log
jobid: 51
benchmark: /exports/humgen/jihed/seq2science/results/benchmark/run2sra/SRR2014796.benchmark.txt
wildcards: run=SRR2014796
resources: parallel_downloads=1
[Mon Apr 19 15:00:39 2021]
rule bowtie2_index:
input: /exports/humgen/jihed/seq2science/genomes/mm10/mm10.fa
output: /exports/humgen/jihed/seq2science/genomes/mm10/index/bowtie2/
log: /exports/humgen/jihed/seq2science/results/log/bowtie2_index/mm10.log
jobid: 14
benchmark: /exports/humgen/jihed/seq2science/results/benchmark/bowtie2_index/mm10.benchmark.txt
wildcards: assembly=mm10
priority: 1
threads: 4
[Mon Apr 19 15:01:04 2021]
Finished job 40.
1 of 51 steps (2%) done
[Mon Apr 19 15:01:04 2021]
Finished job 41.
2 of 51 steps (4%) done
[Mon Apr 19 15:01:04 2021]
Finished job 42.
3 of 51 steps (6%) done
[Mon Apr 19 15:01:29 2021]
Finished job 45.
4 of 51 steps (8%) done
Activating conda environment: /exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/2969b8b6
Activating conda environment: /exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/323808ca
Activating conda environment: /exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/b8363b14
Activating conda environment: /exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/b8363b14
[Mon Apr 19 15:02:10 2021]
Finished job 43.
5 of 51 steps (10%) done
[Mon Apr 19 15:02:49 2021]
Finished job 39.
6 of 51 steps (12%) done
[Mon Apr 19 15:02:54 2021]
Finished job 49.
7 of 51 steps (14%) done
[Mon Apr 19 15:03:47 2021]
Finished job 51.
8 of 51 steps (16%) done
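When a run hangs, it helps to know which jobs were started but never reported as finished. The sketch below pairs the "rule ...:" / "jobid: N" blocks with the "Finished job N." lines of output like the above. It assumes the log layout shown in this thread; `unfinished_jobs` is a hypothetical helper, not a Snakemake feature.

```python
import re
from typing import Dict, Optional


def unfinished_jobs(log_text: str) -> Dict[int, str]:
    """Map job id -> rule name for jobs that started but never finished."""
    started: Dict[int, str] = {}
    current_rule: Optional[str] = None
    for raw in log_text.splitlines():
        line = raw.strip()
        rule = re.match(r"(?:local)?rule (\S+):$", line)
        if rule:
            current_rule = rule.group(1)
            continue
        jobid = re.match(r"jobid: (\d+)$", line)
        if jobid and current_rule is not None:
            started[int(jobid.group(1))] = current_rule
            current_rule = None
            continue
        finished = re.match(r"Finished job (\d+)\.$", line)
        if finished:
            started.pop(int(finished.group(1)), None)
    return started


# Shortened, made-up excerpt in the same layout as the output above
sample = """\
localrule setup_blacklist:
    output: mm10.customblacklist.bed
    jobid: 31
rule bowtie2_index:
    output: index/bowtie2/
    jobid: 14
rule run2sra:
    jobid: 51
Finished job 51.
"""
print(unfinished_jobs(sample))
```

Applied to the full output above, the started jobs that have no matching "Finished job" line are 31 (setup_blacklist) and 14 (bowtie2_index).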
It was stuck after step 8 for two days and then stopped when it hit the time limit I set for the Slurm job. The problem was that the bowtie2 index files were incomplete, and for some reason this was not reported to me:
____ ____ __
/ ___)( __) / \
\___ \ ) _) ( O )
(____/(____) \__\)
____
(___ \
/ __/
(____)
____ ___ __ ____ __ _ ___ ____
/ ___) / __)( )( __)( ( \ / __)( __)
\___ \( (__ )( ) _) / /( (__ ) _)
(____/ \___)(__)(____)\_)__) \___)(____)
version: 0.5.1
docs: https://vanheeringen-lab.github.io/seq2science
Checking if seq2science was run already, if something in the configuration was changed, and if so, if seq2science needs to re-run any jobs.
Checking if samples are available online...
This can take some time.
Done!
CONFIGURATION VARIABLES:
samples : /exports/humgen/jihed/seq2science/samples.tsv
bigwig_dir : /exports/humgen/jihed/seq2science/results/bigwigs
counts_dir : /exports/humgen/jihed/seq2science/results/counts
fastq_dir : /exports/humgen/jihed/seq2science/results/fastq
final_bam_dir : /exports/humgen/jihed/seq2science/results/final_bam
genome_dir : /exports/humgen/jihed/seq2science/genomes
log_dir : /exports/humgen/jihed/seq2science/results/log
qc_dir : /exports/humgen/jihed/seq2science/results/qc
result_dir : /exports/humgen/jihed/seq2science/results
sra_dir : /exports/humgen/jihed/seq2science/results/sra
trimmed_dir : /exports/humgen/jihed/seq2science/results/fastq_trimmed
aligner : bowtie2
cli_call : ['/exports/humgen/jihed/miniconda3/envs/seq2science/bin/seq2science', 'run', 'chip-seq', '--cores', '20']
cores : 20
create_qc_report : True
create_trackhub : True
deeptools_flags : --normalizeUsing BPM
deeptools_multibamsummary: --distanceBetweenBins 9000 --binSize 1000
deeptools_plotcorrelation: --colorMap RdYlBu_r --plotNumbers
deeptools_qc : True
email : j.chouaref@lumc.nl
fqext : ['R1', 'R2']
fqsuffix : fastq
logbase : 2
markduplicates : REMOVE_DUPLICATES=true -Xms4G -Xmx6G MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=999
min_mapping_quality : 30
only_primary_align : True
peak_caller : {'macs2': '--keep-dup 1 --buffer-size 10000'}
peak_windowsize : 100
remove_blacklist : True
slop : 100
trimmer : fastp
layout: : {'GSM1555120': 'SINGLE'}
Building DAG of jobs...
IncompleteFilesException:
The files below seem to be incomplete. If you are sure that certain files are not incomplete, mark them as complete with
snakemake --cleanup-metadata <filenames>
To re-generate the files rerun your command with the --rerun-incomplete flag.
Incomplete files:
/exports/humgen/jihed/seq2science/genomes/mm10/index/bowtie2/
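A bowtie2 index consists of six files per basename (.1.bt2, .2.bt2, .3.bt2, .4.bt2, .rev.1.bt2, .rev.2.bt2, or the same names ending in .bt2l for a large index). A sketch of a completeness check is given below. The basename used by seq2science inside index/bowtie2/ is an assumption here (the assembly name), and `missing_bowtie2_files` is a made-up helper.

```python
import os
from typing import List

# Suffixes of a "small" bowtie2 index; a "large" index uses the same with .bt2l
SMALL_SUFFIXES = [".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2"]


def missing_bowtie2_files(index_dir: str, basename: str) -> List[str]:
    """Return the index files that are absent or empty (empty list = complete)."""

    def missing(suffixes: List[str]) -> List[str]:
        absent = []
        for suffix in suffixes:
            path = os.path.join(index_dir, basename + suffix)
            if not os.path.isfile(path) or os.path.getsize(path) == 0:
                absent.append(basename + suffix)
        return absent

    small = missing(SMALL_SUFFIXES)
    large = missing([suffix + "l" for suffix in SMALL_SUFFIXES])
    # Report against whichever index flavour is closer to complete
    return small if len(small) <= len(large) else large
```

This only checks that the files exist and are non-empty; it cannot tell whether an index build was interrupted halfway through writing a file.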
I am going to try with bwa again.
The issue is still the same: mm10.customblacklist.bed
Error in rule setup_blacklist:
jobid: 0
output: /exports/humgen/jihed/seq2science/genomes/mm10/mm10.customblacklist.bed
RuleException:
FileNotFoundError in line 38 of /exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/rules/bam_cleaning.smk:
[Errno 2] No such file or directory: '/exports/humgen/jihed/seq2science/genomes/mm10/mm10.blacklist.bed'
File "/exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2168, in run_wrapper
File "/exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/rules/bam_cleaning.smk", line 38, in __rule_setup_blacklist
File "/exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 529, in _callback
File "/exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/concurrent/futures/thread.py", line 57, in run
File "/exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 515, in cached_or_run
File "/exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2199, in run_wrapper
Exiting because a job execution failed. Look above for error message
Do you maybe have this file, or an example of its format?
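On the format question: a blacklist is normally a plain BED file, that is, tab-separated lines with at least chromosome, 0-based start and end. The sketch below parses such text; the coordinates in the example are invented for illustration and are not real blacklist regions, and `parse_bed3` is a hypothetical helper.

```python
from typing import List, Tuple


def parse_bed3(text: str) -> List[Tuple[str, int, int]]:
    """Parse the first three BED columns (chrom, start, end) of each record."""
    regions = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blanks, comments and header lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {line_no}: expected at least 3 tab-separated columns")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if start < 0 or end < start:
            raise ValueError(f"line {line_no}: invalid interval {start}-{end}")
        regions.append((chrom, start, end))
    return regions


# Made-up example records, only to show the layout
example = "chr1\t1000\t2000\nchr2\t500\t900\tsome_label\n"
print(parse_bed3(example))
```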
I made a mm10 folder for you.
http://ocimum.science.ru.nl/mm10/
When running seq2science the first time with these files I think you need to use something like:
seq2science run chip-seq --skip-rerun --cores 24 --snakemakeOptions touch=True
This is necessary because the timestamps will be messed up from downloading the file, and otherwise snakemake/seq2science will try to re-create these files
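The timestamp problem can be pictured with a simplified model: an output looks outdated when any of its inputs has a newer modification time. The sketch below is only that simplified model, not Snakemake's actual implementation, and both function names are made up.

```python
import os
from typing import Iterable


def looks_outdated(output: str, inputs: Iterable[str]) -> bool:
    """True if any input was modified more recently than the output."""
    output_mtime = os.path.getmtime(output)
    return any(os.path.getmtime(path) > output_mtime for path in inputs)


def touch(path: str) -> None:
    """Set the access and modification time of an existing file to now."""
    os.utime(path, None)
```

Downloaded files get the time of the download as their timestamp, so a derived file such as mm10.fa.sizes can end up older than mm10.fa. Touching the outputs, which is what the touch option is for, makes them look up to date again.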
Thank you so much for these files! I have added them to my genomes/mm10 folder.
Unfortunately it still does not work. Here are the log and the Slurm output:
seq2science.2021-04-21T100027.917233.log slurm-2426180.txt
The run goes so fast that I doubt it is doing anything. Here is the content of the bwa index:
jchouaref@res-hpc-lo01:/exports/humgen/jihed/seq2science/genomes/mm10/index/bwa-mem$ ls -ltr
total 5616152
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf 2730871864 Apr 20 12:41 mm10.bwt
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf 682717945 Apr 20 12:41 mm10.pac
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf 2857 Apr 20 12:41 mm10.ann
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf 11032 Apr 20 12:41 mm10.amb
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf 1365435936 Apr 20 12:56 mm10.sa
So it created it but the results folder is almost empty:
jchouaref@res-hpc-lo01:/exports/humgen/jihed/seq2science/results$ ls -l
total 76
drwxr-sr-x 9 jchouaref 5-A-SHARK_hg_bioinf 203 Apr 20 13:03 benchmark
drwxr-sr-x 3 jchouaref 5-A-SHARK_hg_bioinf 59 Apr 20 13:01 fastq
drwxr-sr-x 2 jchouaref 5-A-SHARK_hg_bioinf 0 Apr 20 17:10 fastq_trimmed
drwxr-sr-x 28 jchouaref 5-A-SHARK_hg_bioinf 921 Apr 20 17:13 log
drwxr-sr-x 3 jchouaref 5-A-SHARK_hg_bioinf 168 Apr 20 13:03 qc
drwxr-sr-x 3 jchouaref 5-A-SHARK_hg_bioinf 28 Apr 19 15:00 sra
Do you think it is because I am using an sbatch command to distribute the job on the cluster? Then snakemake doesn't actually know which jobs are done or not?
I have honestly no clue what is going on here.. Sorry, I don't think I can help you :sob:
If the ATAC-seq run did work, but you get timeouts and unexplained errors, then maybe the issue is the server occupancy/load?
This is a longshot, but you could try to run the workflow with few cores (so it runs only 1 job at a time), and run at a quiet moment.
If Maarten's genome folder works it shouldn't re-index, so the RAM is spared.
seq2science run chip-seq --skip-rerun --cores 5
No worries @Maarten-vd-Sande I will try that @siebrenf
Hi,
I have been trying to run the chip-seq workflow of seq2science. It starts but stops when 7% of the jobs are done.
To Reproduce Please include your config.yaml, your samples.tsv, and the complete/relevant output.
Both config.yaml and samples.tsv were generated from seq2science init chip-seq
# pipeline file locations
result_dir: ./results # where to store results
genome_dir: ./genomes # where to look for or download the genomes
# fastq_dir: ./results/fastq # where to look for or download the fastqs
# contact info for multiqc report and trackhub
email: yourmail@here.com
# produce a UCSC trackhub?
create_trackhub: true
# how to handle replicates
biological_replicates: fisher # change to "keep" to not combine them
technical_replicates: merge # change to "keep" to not combine them
# which trimmer to use
trimmer: fastp
# which aligner to use
aligner: bwa-mem2
# filtering after alignment
remove_blacklist: true
min_mapping_quality: 30
only_primary_align: true
# peak caller
peak_caller:
  macs2:
    --keep-dup 1 --buffer-size 10000
## differential gene expression analysis
#contrasts:
# - 'descriptive_name_all_HEL'
# for help with filling out the samples.tsv:
# https://vanheeringen-lab.github.io/seq2science/content/workflows/chip_seq.html#filling-out-the-samples-tsv
# also make sure that you use tab as a delimiter
sample assembly descriptive_name
GSM4404624 hg38 HEL
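Since the template insists on tabs, a quick sanity check of samples.tsv can rule out a delimiter problem. The sketch below skips comment lines and rejects a header without tabs or rows with the wrong number of fields. It is an illustration only (seq2science does its own parsing), it assumes the file has at least two columns, and `read_samples` is a made-up name.

```python
import csv
import io
from typing import Dict, List


def read_samples(text: str) -> List[Dict[str, str]]:
    """Parse tab-delimited samples text, ignoring '#' comment lines."""
    lines = [
        line for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    if not lines:
        raise ValueError("no header line found")
    if "\t" not in lines[0]:
        raise ValueError("header is not tab-delimited (spaces instead of tabs?)")
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    rows = list(reader)
    for row in rows:
        # DictReader stores surplus fields under None and fills gaps with None
        if None in row or any(value is None for value in row.values()):
            raise ValueError(f"row has the wrong number of fields: {row}")
    return rows
```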
Looking to launch executable "/exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/7fa92a1c/bin/bwa-mem2.avx", simd = .avx
Launching executable "/exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/7fa92a1c/bin/bwa-mem2.avx"
[bwa_index] Pack FASTA... 18.78 sec
Those are the files I got in the genome folder:
Do you think the problem comes from there?