madagiurgiu25 / decoil-pre

Reconstruct ecDNA from long-read data using Decoil tool
BSD 3-Clause "New" or "Revised" License

No summary.txt file produced #8

Closed: eesiribloom closed this issue 8 months ago

eesiribloom commented 8 months ago

I've managed to get Decoil to run with my data and produce all the right files, but I am missing the summary.txt file. This file was produced with the test data but not with my own data. (I used Singularity to work with the Docker image as a .sif file.)
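For reference, a .sif can be produced from a Docker image along these lines (the image reference below is a placeholder, not the actual Decoil image name):

singularity pull decoil.sif docker://<user>/<decoil-image>:<tag>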

The command I used was:

export BAM_INPUT="/path/to/${SAMPLE}_mod_aligned.bam"
export OUTPUT_FOLDER="/path/to/${SAMPLE}T"
export NAME="${SAMPLE}T"
export GENOME="path/to/reference.fa"
export ANNO="path/to/anno.gtf"

singularity run \
    -B $PWD:$HOME \
    -B ${BAM_INPUT}:/data/input.bam \
    -B ${BAM_INPUT}.bai:/data/input.bam.bai \
    -B ${GENOME}:/annotation/reference.fa \
    -B ${ANNO}:/annotation/anno.gtf \
    -B ${OUTPUT_FOLDER}:/output \
    decoil.sif decoil -f sv-reconstruct \
    -b /data/input.bam \
    -r /annotation/reference.fa \
    -g /annotation/anno.gtf \
    -o /output -n ${NAME}

madagiurgiu25 commented 8 months ago

Dear @eesiribloom,

Decoil has currently not been tested with Singularity; this is something I am working on. Do you get any error? Are you sure you are using absolute paths for $ANNO and $GENOME?
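For example, you can quickly verify that all inputs resolve to existing absolute paths (a minimal check, assuming realpath from GNU coreutils is available):

# fails loudly if any of the paths does not exist
realpath -e "$GENOME" "$ANNO" "$BAM_INPUT" "${BAM_INPUT}.bai"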

Best, Madalina

eesiribloom commented 8 months ago

Actually yes, when I look at the Snakemake log file for a run with my data I get this:

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job        count
all        1
coverage   1
decoil     1
survivor   1
svcalling  1
total      5

Select jobs to execute...

[Tue Feb 20 13:28:38 2024]
rule coverage:
    input: /data/input.bam, /data/input.bam.bai
    output: /output/coverage.bw
    log: /output/logs/logs_cov
    jobid: 3
    reason: Forced execution
    wildcards: dirname=/output
    resources: tmpdir=/tmp

        bamCoverage --bam /data/input.bam -o /output/coverage.bw --binSize 50 -p 4 &> /output/logs/logs_cov

[Tue Feb 20 17:45:45 2024]
Finished job 3.
1 of 5 steps (20%) done
Select jobs to execute...

[Tue Feb 20 17:45:45 2024]
rule svcalling:
    input: /data/input.bam
    output: /output/sv.sniffles.vcf
    log: /output/logs/logs_sniffles
    jobid: 1
    reason: Forced execution
    wildcards: dirname=/output
    resources: tmpdir=/tmp

        sniffles -t 4 -m /data/input.bam -v /output/sv.sniffles.vcf --min_homo_af 0.7 --min_het_af 0.1 --min_length 50 --cluster --genotype --min_support 4 --report-seq &> /output/logs/logs_sniffles

[Wed Feb 21 05:59:51 2024]
Finished job 1.
2 of 5 steps (40%) done
Select jobs to execute...

[Wed Feb 21 05:59:51 2024]
rule decoil:
    input: /output/sv.sniffles.vcf, /output/coverage.bw, /data/input.bam.bai
    output: /output/summary.txt, /output/reconstruct.ecDNA.filtered.bed, /output/reconstruct.ecDNA.filtered.fasta
    log: /output/logs/logs_decoil
    jobid: 4
    reason: Missing output files: /output/reconstruct.ecDNA.filtered.bed, /output/reconstruct.ecDNA.filtered.fasta, /output/summary.txt; Input files updated by another job: /output/sv.sniffles.vcf, /output/coverage.bw
    wildcards: dirname=/output
    resources: tmpdir=/tmp

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-02-20T132836.395335.snakemake.log

eesiribloom commented 8 months ago

I don't understand what exactly the error is and why it only occurs with my sample data and not the test data. Could it be to do with using Singularity? I did use absolute paths for all the variables. The incomplete set of files I get as output from my own data is:

clean.vcf
config.json
coverage.bw
fragments_clean_1.bed
fragments_clean_2.bed
fragments_clean_3.bed
fragments_initial.bed
graph.gpickle
logs
metrics_frag_cov_distribution.png
metrics_frag_len_cov_correlation.png
metrics_frag_len_distribution.png
reconstruct.bed
reconstruct.bed_debug
reconstruct.ecDNA.bed
reconstruct.json
reconstruct.links.txt_debug
sv.sniffles.bedpe
sv.sniffles.vcf

madagiurgiu25 commented 8 months ago

Dear @eesiribloom,

I have been looking into the issue with Singularity. The mount point /examples is not writable in Singularity, which is different from Docker, where this folder has write permissions. For Singularity, only specialised folders have write permission, one of them being /mnt. Try:

# singularity needs to create upfront the output directory which will be mounted into the container
mkdir -p $PWD/test3

# run decoil-pipeline using singularity in `sv-reconstruct` mode
singularity run \
    --bind $PWD/test3:/mnt \
    --bind $PWD/$GTFANNO:/annotation/anno.gtf \
    --bind $PWD/$REFGENOME:/annotation/reference.fa \
    decoil.sif decoil -f sv-reconstruct \
    --bam /examples/ecdna1/map.bam \
    --reference-genome /annotation/reference.fa \
    --annotation-gtf /annotation/anno.gtf \
    --outputdir /mnt \
    --name ecdna1

eesiribloom commented 8 months ago

Apologies, I have tried again with singularity using this command on my own data:

export BAM_INPUT="/absolute/path/to/${SAMPLE}_mod_aligned.bam"
export OUTPUT_FOLDER="/absolute/path/to/SMG-${SAMPLE}T"
export NAME="SMG-${SAMPLE}T"
export GENOME="/absolute/path/to/reference.fa"
export ANNO="/absolute/path/to/anno.gtf"

mkdir -p /absolute/path/to/SMG-${SAMPLE}T
mkdir -p /absolute/path/to/SMG-${SAMPLE}T/logs
mkdir -p /absolute/path/to/SMG-${SAMPLE}T/tmp
singularity run \
-B $PWD:$HOME \
-B ${BAM_INPUT}:/data/input.bam \
-B ${BAM_INPUT}.bai:/data/input.bam.bai \
-B ${GENOME}:/annotation/reference.fa \
-B ${ANNO}:/annotation/anno.gtf \
-B ${OUTPUT_FOLDER}:/mnt \
-B ${OUTPUT_FOLDER}/logs:/mnt/logs \
-B ${OUTPUT_FOLDER}/tmp:/tmp \
decoil.sif decoil sv-reconstruct \
-b /data/input.bam \
-r /annotation/reference.fa \
-g /annotation/anno.gtf \
-o /mnt \
--name ${NAME}

The files produced are the same as above and my log file is:

INFO: Convert SIF file to sandbox...
During startup - Warning messages:
1: Setting LC_COLLATE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
5: Setting LC_PAPER failed, using "C"
6: Setting LC_MEASUREMENT failed, using "C"
Found workflowfile at /opt/conda/envs/envdecoil/lib/python3.8/site-packages/decoil-1.1.1-py3.8.egg/decoil/cli/sv-reconstruct.json
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job        count
all        1
coverage   1
decoil     1
survivor   1
svcalling  1
total      5

Select jobs to execute...

[Mon Mar 25 14:33:22 2024]
rule svcalling:
    input: /data/input.bam
    output: /mnt/sv.sniffles.vcf
    log: /mnt/logs/logs_sniffles
    jobid: 1
    reason: Missing output files: /mnt/sv.sniffles.vcf
    wildcards: dirname=/mnt
    resources: tmpdir=/tmp

        sniffles -t 4 -m /data/input.bam -v /mnt/sv.sniffles.vcf --min_homo_af 0.7 --min_het_af 0.1 --min_length 50 --cluster --genotype --min_support 4 --report-seq &> /mnt/logs/logs_sniffles

[Mon Mar 25 16:45:33 2024]
Finished job 1.
1 of 5 steps (20%) done
Select jobs to execute...

[Mon Mar 25 16:45:33 2024]
rule survivor:
    input: /mnt/sv.sniffles.vcf
    output: /mnt/sv.sniffles.bedpe
    log: /mnt/logs/logs_survivor
    jobid: 2
    reason: Missing output files: /mnt/sv.sniffles.bedpe; Input files updated by another job: /mnt/sv.sniffles.vcf
    wildcards: dirname=/mnt
    resources: tmpdir=/tmp

        SURVIVOR vcftobed /mnt/sv.sniffles.vcf -1 -1 /mnt/sv.sniffles.bedpe &> /mnt/logs/logs_survivor

[Mon Mar 25 16:45:34 2024]
Finished job 2.
2 of 5 steps (40%) done
Select jobs to execute...

[Mon Mar 25 16:45:34 2024]
rule coverage:
    input: /data/input.bam, /data/input.bam.bai
    output: /mnt/coverage.bw
    log: /mnt/logs/logs_cov
    jobid: 3
    reason: Missing output files: /mnt/coverage.bw
    wildcards: dirname=/mnt
    resources: tmpdir=/tmp

        bamCoverage --bam /data/input.bam -o /mnt/coverage.bw --binSize 50 -p 4 &> /mnt/logs/logs_cov

[Mon Mar 25 17:39:23 2024]
Finished job 3.
3 of 5 steps (60%) done
Select jobs to execute...

[Mon Mar 25 17:39:23 2024]
rule decoil:
    input: /mnt/sv.sniffles.vcf, /mnt/coverage.bw, /data/input.bam.bai
    output: /mnt/summary.txt, /mnt/reconstruct.ecDNA.filtered.bed, /mnt/reconstruct.ecDNA.filtered.fasta
    log: /mnt/logs/logs_decoil
    jobid: 4
    reason: Missing output files: /mnt/reconstruct.ecDNA.filtered.fasta, /mnt/summary.txt, /mnt/reconstruct.ecDNA.filtered.bed; Input files updated by another job: /mnt/sv.sniffles.vcf, /mnt/coverage.bw
    wildcards: dirname=/mnt
    resources: tmpdir=/tmp

During startup - Warning messages:
1: Setting LC_COLLATE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
5: Setting LC_PAPER failed, using "C"
6: Setting LC_MEASUREMENT failed, using "C"
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Select jobs to execute...

[Mon Mar 25 17:49:31 2024]
Error in rule decoil:
    jobid: 0
    input: /mnt/sv.sniffles.vcf, /mnt/coverage.bw, /data/input.bam.bai
    output: /mnt/summary.txt, /mnt/reconstruct.ecDNA.filtered.bed, /mnt/reconstruct.ecDNA.filtered.fasta
    log: /mnt/logs/logs_decoil (check log file(s) for error details)

RuleException:
KeyError in file /opt/conda/envs/envdecoil/lib/python3.8/site-packages/decoil-1.1.1-py3.8.egg/decoil/cli/Snakefile, line 63:
'10'
  File "/opt/conda/envs/envdecoil/lib/python3.8/site-packages/decoil-1.1.1-py3.8.egg/decoil/cli/Snakefile", line 63, in __rule_decoil
  File "/opt/conda/envs/envdecoil/lib/python3.8/site-packages/decoil-1.1.1-py3.8.egg/decoil/main.py", line 350, in run_reconstruction
  File "/opt/conda/envs/envdecoil/lib/python3.8/site-packages/decoil-1.1.1-py3.8.egg/decoil/main.py", line 204, in run_save_v2_new
  File "/opt/conda/envs/envdecoil/lib/python3.8/site-packages/decoil-1.1.1-py3.8.egg/decoil/output/parse.py", line 220, in convert_bed2fasta
  File "/opt/conda/envs/envdecoil/lib/python3.8/concurrent/futures/thread.py", line 57, in run
Removing output files of failed job decoil since they might be corrupted:
/mnt/reconstruct.ecDNA.filtered.bed
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-03-25T143320.373124.snakemake.log
Traceback (most recent call last):
  File "/opt/conda/envs/envdecoil/lib/python3.8/site-packages/decoil-1.1.1-py3.8.egg/decoil/command.py", line 203, in main
    raise Exception("Snakemake failed")
Exception: Snakemake failed
INFO: Cleaning up image...

madagiurgiu25 commented 8 months ago

Can you please check whether $GENOME matches the reference genome that was used for the alignment? Your .bam file might use "chr10" while your GENOME uses "10" as the name for chromosome 10 (or the other way around).
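For example, you could compare the sequence names on both sides (a quick check, assuming samtools is available; the paths are placeholders):

# chromosome names as recorded in the BAM header
samtools view -H /path/to/input.bam | grep '^@SQ' | head

# chromosome names in the reference fasta
grep '^>' /path/to/reference.fa | head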

eesiribloom commented 8 months ago

Thanks for your advice, it has worked completely now that I made sure to use the same reference genome.

madagiurgiu25 commented 8 months ago

@eesiribloom, I will close this issue, but if you have other questions, please open another one.