maxplanck-ie / snakepipes

Customizable workflows based on snakemake and python for the analysis of NGS data
http://snakepipes.readthedocs.io
MIT License

createIndices error on cluster (sorry for spamming!) #932

Closed sunta3iouxos closed 9 months ago

sunta3iouxos commented 11 months ago

The command I used is:


nohup createIndices -o assemblies/mm10_gencodeM19_spikesCUTnRUN --tools bowtie2 --genomeURL assemblies/GRCm38_gencode_release19/genome_fasta/genome.fa --gtfURL assemblies/GRCm38_gencode_release19/annotation/genes.gtf --spikeinGenomeURL assemblies/EB1/Sequence/WholeGenomeFasta/genome.fa --spikeinGtfURL assemblies/EB1/Annotation/Archives/archive-2015-07-17-14-31-12/Genes/genes.gtf  --blacklist assemblies/blacklist/mm10_CandR_blacklist.bed --ignoreForNormalization assemblies/blacklist/ignoreForNormalization.csv mm10_gencodeM19_spikesCUTnRUN > createIndices.txt &

I have already successfully run the DNA-mapping and ChIP-seq pipelines, and now I want to create a new indexed reference genome that includes spike-ins and "proper" exclusion chromosomes for normalisation-type analyses (also relevant: #930).

The command fails with the following error:


---- This analysis has been done using snakePipes version 2.7.3 ----
Building DAG of jobs...
Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
Using shell: /usr/bin/bash
Provided cluster nodes: 8
Job stats:
job                           count    min threads    max threads
--------------------------  -------  -------------  -------------
all                               1              1              1
bowtie2Index                      1             10             10
computeEffectiveGenomeSize        1              1              1
createGenomeFasta                 1              1              1
createHostGenomeFasta             1              1              1
extendGenicRegions                1              1              1
fastaDict                         1              1              1
fastaIndex                        1              1              1
make2bit                          1              1              1
total                             9              1             10

Select jobs to execute...

[Thu Aug 24 16:12:07 2023]
rule createHostGenomeFasta:
    output: assemblies/mm10_gencodeM19_spikesCUTnRUN/genome_fasta/host.genome.fa
    jobid: 2
    reason: Missing output files: assemblies/mm10_gencodeM19_spikesCUTnRUN/genome_fasta/host.genome.fa
    resources: mem_mb=1000, disk_mb=1000, tmpdir=<TBD>

Submitted job 2 with external jobid 'Submitted batch job 18124070'.
[Thu Aug 24 16:13:37 2023]
Error in rule createHostGenomeFasta:
    jobid: 2
    output: assemblies/mm10_gencodeM19_spikesCUTnRUN/genome_fasta/host.genome.fa
    cluster_jobid: Submitted batch job 18124070

Error executing rule createHostGenomeFasta on cluster (jobid: 2, external: Submitted batch job 18124070, jobscript: assemblies/mm10_gencodeM19_spikesCUTnRUN/.snakemake/tmp.s3jj8h0u/snakejob.createHostGenomeFasta.2.sh). For error details see the cluster log and the log files of the involved rule(s).
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-08-24T161135.342113.snakemake.log

 !!! ERROR in index creation workflow! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Error: snakemake returned an error code of 1, so processing is incomplete!

This is the error in 19_spikesCUTnRUN/cluster_logs/createHostGenomeFasta.18124033.err:


---- This analysis has been done using snakePipes version 2.7.3 ----
Building DAG of jobs...
Falling back to greedy scheduler because no default solver is found for pulp (you have to install either coincbc or glpk).
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Provided resources: mem_mb=1000, disk_mb=1000
Select jobs to execute...

[Thu Aug 24 15:02:22 2023]
rule createHostGenomeFasta:
    output: assemblies/mm10_gencodeM19_spikesCUTnRUN/genome_fasta/host.genome.fa
    jobid: 0
    reason: Missing output files: assemblies/mm10_gencodeM19_spikesCUTnRUN/genome_fasta/host.genome.fa
    resources: mem_mb=1000, disk_mb=1000, tmpdir=/scratch/temp

[Thu Aug 24 15:02:26 2023]
Error in rule createHostGenomeFasta:
    jobid: 0
    output: assemblies/mm10_gencodeM19_spikesCUTnRUN/genome_fasta/host.genome.fa

RuleException:
OSError in line 11 of mamba/snakePipes/lib/python3.11/site-packages/snakePipes/shared/rules/createIndices.snakefile:
[Errno 14] Bad address
  File "mamba/snakePipes/lib/python3.11/site-packages/snakePipes/shared/rules/createIndices.snakefile", line 42, in __rule_createHostGenomeFasta
  File "mamba/snakePipes/lib/python3.11/site-packages/snakePipes/shared/rules/createIndices.snakefile", line 11, in downloadFile
  File "mamba/snakePipes/lib/python3.11/tempfile.py", line 483, in func_wrapper
  File "mamba/snakePipes/lib/python3.11/concurrent/futures/thread.py", line 58, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

And this is the error in 19_spikesCUTnRUN/cluster_logs/createHostGenomeFasta.18124070.err:


---- This analysis has been done using snakePipes version 2.7.3 ----
Building DAG of jobs...
Falling back to greedy scheduler because no default solver is found for pulp (you have to install either coincbc or glpk).
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Provided resources: mem_mb=1000, disk_mb=1000
Select jobs to execute...

[Thu Aug 24 16:13:22 2023]
rule createHostGenomeFasta:
    output: assemblies/mm10_gencodeM19_spikesCUTnRUN/genome_fasta/host.genome.fa
    jobid: 0
    reason: Missing output files: assemblies/mm10_gencodeM19_spikesCUTnRUN/genome_fasta/host.genome.fa
    resources: mem_mb=1000, disk_mb=1000, tmpdir=/scratch/temp

/var/log/slurm/spool_slurmd//job18124070/slurm_script: line 3: 16257 Bus error               (core dumped) mamba/snakePipes/bin/python3.11 -m snakemake --snakefile 'mamba/snakePipes/lib/python3.11/site-packages/snakePipes/workflows/createIndices/Snakefile' --target-jobs 'createHostGenomeFasta:' --allowed-rules 'createHostGenomeFasta' --cores 'all' --attempt 1 --force-use-threads --resources 'mem_mb=1000' 'disk_mb=1000' --wait-for-files 'assemblies/mm10_gencodeM19_spikesCUTnRUN/.snakemake/tmp.s3jj8h0u' --force --keep-target-files --keep-remote --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --rerun-triggers 'software-env' 'mtime' 'params' 'input' 'code' --skip-script-cleanup --use-conda --conda-frontend 'mamba' --conda-prefix 'mamba/snakePipes/envs' --conda-base-path 'mambaforge' --wrapper-prefix 'https://github.com/snakemake/snakemake-wrappers/raw/' --configfiles 'assemblies/mm10_gencodeM19_spikesCUTnRUN/createIndices.config.yaml' --latency-wait 300 --scheduler 'ilp' --scheduler-solver-path 'mamba/snakePipes/bin' --default-resources 'mem_mb=max(2*input.size_mb, 1000)' 'disk_mb=max(2*input.size_mb, 1000)' 'tmpdir=system_tmpdir' --directory 'assemblies/mm10_gencodeM19_spikesCUTnRUN' --mode 2
katsikora commented 10 months ago

Hi,

It looks like the command that reads the file whose URL (or path) you provided for the host genome fasta is erroring out. Can you try providing the absolute path to that fasta file rather than a relative path?
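A minimal sketch of that suggestion: resolve every relative input from the original command to an absolute path before calling createIndices. This assumes GNU coreutils `realpath` (`-m` builds the absolute path even if the file does not exist yet, so check each input exists before the real run); the script only prints the resulting command rather than executing it.

```shell
#!/usr/bin/env bash
# Sketch: turn the relative paths from the original createIndices call
# into absolute ones. Paths below are the ones from this issue; adjust
# them to your own layout.
set -euo pipefail

GENOME=$(realpath -m assemblies/GRCm38_gencode_release19/genome_fasta/genome.fa)
GTF=$(realpath -m assemblies/GRCm38_gencode_release19/annotation/genes.gtf)
SPIKE_GENOME=$(realpath -m assemblies/EB1/Sequence/WholeGenomeFasta/genome.fa)
SPIKE_GTF=$(realpath -m assemblies/EB1/Annotation/Archives/archive-2015-07-17-14-31-12/Genes/genes.gtf)
BLACKLIST=$(realpath -m assemblies/blacklist/mm10_CandR_blacklist.bed)
IGNORE=$(realpath -m assemblies/blacklist/ignoreForNormalization.csv)
OUTDIR=$(realpath -m assemblies/mm10_gencodeM19_spikesCUTnRUN)

# Print the rewritten command instead of running it, so it can be reviewed.
echo "createIndices -o $OUTDIR --tools bowtie2 \\"
echo "  --genomeURL $GENOME --gtfURL $GTF \\"
echo "  --spikeinGenomeURL $SPIKE_GENOME --spikeinGtfURL $SPIKE_GTF \\"
echo "  --blacklist $BLACKLIST --ignoreForNormalization $IGNORE \\"
echo "  mm10_gencodeM19_spikesCUTnRUN"
```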

Best wishes,

Katarzyna

sunta3iouxos commented 10 months ago

Sorry for the delayed answer, I was on vacation. I will try your suggestion as soon as possible.

katsikora commented 9 months ago

Closing for now, feel free to reopen if needed.