Closed eesiribloom closed 8 months ago
Dear @eesiribloom,
Decoil has not currently been tested with singularity; this is something I am working on. Do you get any error? Are you sure you are using absolute paths for $ANNO and $GENOME?
Best, Madalina
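As a quick sanity check (a generic shell sketch, not part of Decoil; $GENOME and $ANNO are the variable names from this thread), you can verify that both variables expand to absolute paths before launching the container:

```shell
# Generic check: an absolute path starts with "/".
# GENOME and ANNO are the variables discussed in this thread.
for var in GENOME ANNO; do
  eval "val=\$$var"
  case "$val" in
    /*) echo "$var is absolute: $val" ;;
    *)  echo "$var is NOT absolute: $val" ;;
  esac
done
```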
Actually yes, when I look at the snakemake log file for a run with my data I get this:

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job          count
all              1
coverage         1
decoil           1
survivor         1
svcalling        1
total            5
Select jobs to execute...
[Tue Feb 20 13:28:38 2024]
rule coverage:
    input: /data/input.bam, /data/input.bam.bai
    output: /output/coverage.bw
    log: /output/logs/logs_cov
    jobid: 3
    reason: Forced execution
    wildcards: dirname=/output
    resources: tmpdir=/tmp
bamCoverage --bam /data/input.bam -o /output/coverage.bw --binSize 50 -p 4 &> /output/logs/logs_cov
[Tue Feb 20 17:45:45 2024]
Finished job 3.
1 of 5 steps (20%) done
Select jobs to execute...

[Tue Feb 20 17:45:45 2024]
rule svcalling:
    input: /data/input.bam
    output: /output/sv.sniffles.vcf
    log: /output/logs/logs_sniffles
    jobid: 1
    reason: Forced execution
    wildcards: dirname=/output
    resources: tmpdir=/tmp
sniffles -t 4 -m /data/input.bam -v /output/sv.sniffles.vcf --min_homo_af 0.7 --min_het_af 0.1 --min_length 50 --cluster --genotype --min_support 4 --report-seq &> /output/logs/logs_sniffles
[Wed Feb 21 05:59:51 2024]
Finished job 1.
2 of 5 steps (40%) done
Select jobs to execute...

[Wed Feb 21 05:59:51 2024]
rule decoil:
    input: /output/sv.sniffles.vcf, /output/coverage.bw, /data/input.bam.bai
    output: /output/summary.txt, /output/reconstruct.ecDNA.filtered.bed, /output/reconstruct.ecDNA.filtered.fasta
    log: /output/logs/logs_decoil
    jobid: 4
    reason: Missing output files: /output/reconstruct.ecDNA.filtered.bed, /output/reconstruct.ecDNA.filtered.fasta, /output/summary.txt; Input files updated by another job: /output/sv.sniffles.vcf, /output/coverage.bw
    wildcards: dirname=/output
    resources: tmpdir=/tmp

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-02-20T132836.395335.snakemake.log
I don't understand what exactly the error is, or why it only occurs with my sample data and not the test data. Could it be to do with using singularity? I did use absolute paths for all the variables. The incomplete set of files output from my own data is:

clean.vcf  config.json  coverage.bw  fragments_clean_1.bed
fragments_clean_2.bed  fragments_clean_3.bed  fragments_initial.bed  graph.gpickle
logs  metrics_frag_cov_distribution.png  metrics_frag_len_cov_correlation.png  metrics_frag_len_distribution.png
reconstruct.bed  reconstruct.bed_debug  reconstruct.ecDNA.bed  reconstruct.json
reconstruct.links.txt_debug  sv.sniffles.bedpe  sv.sniffles.vcf
Dear @eesiribloom,
I have been looking into the issue with singularity. The mount point /examples is not writable in singularity, which is different from docker, where this folder has write permissions. For singularity, only specialised folders have write permission, one of them being /mnt. Try:
# for singularity, the output directory which will be mounted into the container must be created upfront
mkdir -p $PWD/test3
# run decoil-pipeline using singularity in `sv-reconstruct` mode
singularity run \
--bind $PWD/test3:/mnt \
--bind $PWD/$GTFANNO:/annotation/anno.gtf \
--bind $PWD/$REFGENOME:/annotation/reference.fa \
decoil.sif decoil -f sv-reconstruct \
--bam /examples/ecdna1/map.bam \
--reference-genome /annotation/reference.fa \
--annotation-gtf /annotation/anno.gtf \
--outputdir /mnt \
--name ecdna1
Apologies, I have tried again with singularity using this command on my own data:
export BAM_INPUT="/absolute/path/to/${SAMPLE}_mod_aligned.bam"
export OUTPUT_FOLDER="/absolute/path/to/SMG-${SAMPLE}T"
export NAME="SMG-${SAMPLE}T"
export GENOME="/absolute/path/to/reference.fa"
export ANNO="/absolute/path/to/anno.gtf"
mkdir -p /absolute/path/to/SMG-${SAMPLE}T
mkdir -p /absolute/path/to/SMG-${SAMPLE}T/logs
mkdir -p /absolute/path/to/SMG-${SAMPLE}T/tmp
singularity run \
-B $PWD:$HOME \
-B ${BAM_INPUT}:/data/input.bam \
-B ${BAM_INPUT}.bai:/data/input.bam.bai \
-B ${GENOME}:/annotation/reference.fa \
-B ${ANNO}:/annotation/anno.gtf \
-B ${OUTPUT_FOLDER}:/mnt \
-B ${OUTPUT_FOLDER}/logs:/mnt/logs \
-B ${OUTPUT_FOLDER}/tmp:/tmp \
decoil.sif decoil sv-reconstruct \
-b /data/input.bam \
-r /annotation/reference.fa \
-g /annotation/anno.gtf \
-o /mnt \
--name ${NAME}
the files produced are the same as above and my log file is:

INFO: Convert SIF file to sandbox...
During startup - Warning messages:
1: Setting LC_COLLATE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
5: Setting LC_PAPER failed, using "C"
6: Setting LC_MEASUREMENT failed, using "C"
Found workflowfile at /opt/conda/envs/envdecoil/lib/python3.8/site-packages/decoil-1.1.1-py3.8.egg/decoil/cli/sv-reconstruct.json
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job          count
all              1
coverage         1
decoil           1
survivor         1
svcalling        1
total            5
Select jobs to execute...
[Mon Mar 25 14:33:22 2024]
rule svcalling:
    input: /data/input.bam
    output: /mnt/sv.sniffles.vcf
    log: /mnt/logs/logs_sniffles
    jobid: 1
    reason: Missing output files: /mnt/sv.sniffles.vcf
    wildcards: dirname=/mnt
    resources: tmpdir=/tmp
sniffles -t 4 -m /data/input.bam -v /mnt/sv.sniffles.vcf --min_homo_af 0.7 --min_het_af 0.1 --min_length 50 --cluster --genotype --min_support 4 --report-seq &> /mnt/logs/logs_sniffles
[Mon Mar 25 16:45:33 2024]
Finished job 1.
1 of 5 steps (20%) done
Select jobs to execute...

[Mon Mar 25 16:45:33 2024]
rule survivor:
    input: /mnt/sv.sniffles.vcf
    output: /mnt/sv.sniffles.bedpe
    log: /mnt/logs/logs_survivor
    jobid: 2
    reason: Missing output files: /mnt/sv.sniffles.bedpe; Input files updated by another job: /mnt/sv.sniffles.vcf
    wildcards: dirname=/mnt
    resources: tmpdir=/tmp
SURVIVOR vcftobed /mnt/sv.sniffles.vcf -1 -1 /mnt/sv.sniffles.bedpe &> /mnt/logs/logs_survivor
[Mon Mar 25 16:45:34 2024]
Finished job 2.
2 of 5 steps (40%) done
Select jobs to execute...

[Mon Mar 25 16:45:34 2024]
rule coverage:
    input: /data/input.bam, /data/input.bam.bai
    output: /mnt/coverage.bw
    log: /mnt/logs/logs_cov
    jobid: 3
    reason: Missing output files: /mnt/coverage.bw
    wildcards: dirname=/mnt
    resources: tmpdir=/tmp
bamCoverage --bam /data/input.bam -o /mnt/coverage.bw --binSize 50 -p 4 &> /mnt/logs/logs_cov
[Mon Mar 25 17:39:23 2024]
Finished job 3.
3 of 5 steps (60%) done
Select jobs to execute...

[Mon Mar 25 17:39:23 2024]
rule decoil:
    input: /mnt/sv.sniffles.vcf, /mnt/coverage.bw, /data/input.bam.bai
    output: /mnt/summary.txt, /mnt/reconstruct.ecDNA.filtered.bed, /mnt/reconstruct.ecDNA.filtered.fasta
    log: /mnt/logs/logs_decoil
    jobid: 4
    reason: Missing output files: /mnt/reconstruct.ecDNA.filtered.fasta, /mnt/summary.txt, /mnt/reconstruct.ecDNA.filtered.bed; Input files updated by another job: /mnt/sv.sniffles.vcf, /mnt/coverage.bw
    wildcards: dirname=/mnt
    resources: tmpdir=/tmp
During startup - Warning messages:
1: Setting LC_COLLATE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
5: Setting LC_PAPER failed, using "C"
6: Setting LC_MEASUREMENT failed, using "C"
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Select jobs to execute...

[Mon Mar 25 17:49:31 2024]
Error in rule decoil:
    jobid: 0
    input: /mnt/sv.sniffles.vcf, /mnt/coverage.bw, /data/input.bam.bai
    output: /mnt/summary.txt, /mnt/reconstruct.ecDNA.filtered.bed, /mnt/reconstruct.ecDNA.filtered.fasta
    log: /mnt/logs/logs_decoil (check log file(s) for error details)

RuleException:
KeyError in file /opt/conda/envs/envdecoil/lib/python3.8/site-packages/decoil-1.1.1-py3.8.egg/decoil/cli/Snakefile, line 63:
'10'
  File "/opt/conda/envs/envdecoil/lib/python3.8/site-packages/decoil-1.1.1-py3.8.egg/decoil/cli/Snakefile", line 63, in __rule_decoil
  File "/opt/conda/envs/envdecoil/lib/python3.8/site-packages/decoil-1.1.1-py3.8.egg/decoil/main.py", line 350, in run_reconstruction
  File "/opt/conda/envs/envdecoil/lib/python3.8/site-packages/decoil-1.1.1-py3.8.egg/decoil/main.py", line 204, in run_save_v2_new
  File "/opt/conda/envs/envdecoil/lib/python3.8/site-packages/decoil-1.1.1-py3.8.egg/decoil/output/parse.py", line 220, in convert_bed2fasta
  File "/opt/conda/envs/envdecoil/lib/python3.8/concurrent/futures/thread.py", line 57, in run
Removing output files of failed job decoil since they might be corrupted:
/mnt/reconstruct.ecDNA.filtered.bed
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-03-25T143320.373124.snakemake.log
Traceback (most recent call last):
  File "/opt/conda/envs/envdecoil/lib/python3.8/site-packages/decoil-1.1.1-py3.8.egg/decoil/command.py", line 203, in main
    raise Exception("Snakemake failed")
Exception: Snakemake failed
INFO: Cleaning up image...
Can you please check whether $GENOME matches the reference genome that was used for the alignment? The .bam file might use "chr10" while $GENOME uses "10" as the name for chromosome 10 (or the other way around).
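One way to spot such a mismatch is to compare the contig names in the two files. This is only a sketch: the file names (reference.fa, input.bam) are placeholders, and a toy FASTA is created here just so the commands run; point them at your real $GENOME and BAM instead.

```shell
# Toy FASTA for illustration only; one contig named "chr10", one named "11".
printf '>chr10 description\nACGT\n>11\nACGT\n' > reference.fa

# Contig names in the reference FASTA (text after ">" up to the first space):
grep '^>' reference.fa | cut -d' ' -f1 | tr -d '>' | sort > fasta_contigs.txt
cat fasta_contigs.txt

# Contig names in the BAM header (run this where samtools is available):
#   samtools view -H input.bam | awk -F'\t' '$1=="@SQ" {sub(/^SN:/,"",$2); print $2}' | sort > bam_contigs.txt
# Names appearing in only one of the two lists reveal a "chr10" vs "10" mismatch:
#   comm -3 fasta_contigs.txt bam_contigs.txt
```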
Thanks for your advice, it has worked completely now that I made sure to use the same reference genome.
@eesiribloom, I will close the issue but if you have other questions, please open another issue.
I've managed to get decoil to run with my data and produce all the right files, but I am missing the summary.txt file. This file was produced with the test data but not with my own data. (I used singularity to work with the docker image as a .sif file.)
The command I used was:
export BAM_INPUT="/path/to/${SAMPLE}_mod_aligned.bam"
export OUTPUT_FOLDER="/path/to/${SAMPLE}T"
export NAME="${SAMPLE}T"
export GENOME="path/to/reference.fa"
export ANNO="path/to/anno.gtf"
singularity run \
  -B $PWD:$HOME \
  -B ${BAM_INPUT}:/data/input.bam \
  -B ${BAM_INPUT}.bai:/data/input.bam.bai \
  -B ${GENOME}:/annotation/reference.fa \
  -B ${ANNO}:/annotation/anno.gtf \
  -B ${OUTPUT_FOLDER}:/output \
  decoil.sif decoil -f sv-reconstruct \
  -b /data/input.bam \
  -r /annotation/reference.fa \
  -g /annotation/anno.gtf \
  -o /output -n ${NAME}
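The missing-output symptom can be checked directly against the final targets named in the pipeline log (summary.txt, reconstruct.ecDNA.filtered.bed, reconstruct.ecDNA.filtered.fasta). A minimal sketch, assuming OUTPUT_FOLDER points at the directory you mounted as the container's output:

```shell
# Sketch: report which of the final Decoil targets exist in the output
# directory. OUTPUT_FOLDER defaults to a local folder for illustration.
OUTPUT_FOLDER=${OUTPUT_FOLDER:-./decoil_out}
mkdir -p "$OUTPUT_FOLDER"
for f in summary.txt reconstruct.ecDNA.filtered.bed reconstruct.ecDNA.filtered.fasta; do
  if [ -e "$OUTPUT_FOLDER/$f" ]; then
    echo "OK      $f"
  else
    echo "MISSING $f"
  fi
done
```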