Hi,

I am interested in using your pipeline with ONT long-read UMI data generated by colleagues in my lab. As a first step, I tried to run the pipeline on the example data provided in the repository, following the instructions in the README:
1. Clone the repository:

   ```
   git clone git@github.com:camcl/pipeline-umi-amplicon.git
   ```
2. Move into the cloned repository and finish the configuration and installation (I used the latest Miniconda3):

   ```
   cd pipeline-umi-amplicon
   conda env create -f environment.yml
   conda activate pipeline-umi-amplicon
   cd lib && pip install . && cd ..
   ```
This ran without errors, and the conda environment now contains the following components:
3. Testing the installation with `snakemake -j 1 -pr --configfile config.yml` produces no errors:

   ```
   Targets: EGFR_917
   Building DAG of jobs...
   Using shell: /usr/bin/bash
   Provided cores: 1 (use --cores to define parallelism)
   Rules claiming more threads will be scaled down.
   Job stats:
   job                   count
   ------------------  -------
   copy_bed                  1
   reads                     1
   seqkit_bam_acc_tsv        1
   total                     3
   ```
4. Without editing anything in `config.yml`, I ran `snakemake -j 30 reads --configfile config.yml`. All steps up to the rule `polish_clusters` complete, but the run terminates during polishing with the following output:
   ```
   Shutting down, this might take some time.
   Exiting because a job execution failed. Look above for error message
   Complete log: .snakemake/log/2024-09-17T164240.824646.snakemake.log
   ```
The contents of the file `example_egfr_single_read_run/fasta/EGFR_917_consensus.bam_smolecule.log` provide more information about the error:
```
Traceback (most recent call last):
  File "~/miniconda3/envs/pipeline-umi-amplicon/bin/medaka", line 11, in <module>
    sys.exit(main())
  File "~/miniconda3/envs/pipeline-umi-amplicon/lib/python3.8/site-packages/medaka/medaka.py", line 814, in main
    args.func(args)
  File "~/miniconda3/envs/pipeline-umi-amplicon/lib/python3.8/site-packages/medaka/smolecule.py", line 429, in main
    medaka.common.mkdir_p(args.output, info='Results will be overwritten.')
  File "~/miniconda3/envs/pipeline-umi-amplicon/lib/python3.8/site-packages/medaka/common.py", line 763, in mkdir_p
    os.makedirs(path)
  File "~/miniconda3/envs/pipeline-umi-amplicon/lib/python3.8/os.py", line 223, in makedirs
    mkdir(name, mode)
FileExistsError: [Errno 17] File exists: 'example_egfr_single_read_run/clustering/EGFR_917/smolecule_clusters.fa'
```
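In case it helps with diagnosis: the traceback shows that `medaka smolecule` hands its `output` argument to `os.makedirs`, which raises `FileExistsError` whenever that path already exists as a regular file rather than a directory. Here is a minimal sketch of that failure mode; the file name is only illustrative, and `mkdir_p_like` is my stand-in for what the traceback suggests `medaka.common.mkdir_p` does, not medaka's actual code:

```python
import errno
import os
import tempfile

def mkdir_p_like(path):
    """Create a directory, tolerating the case where it already exists as a
    directory -- but not as a regular file (a stand-in modelled on what the
    traceback above suggests medaka.common.mkdir_p does)."""
    try:
        os.makedirs(path)
    except OSError as exc:
        if exc.errno == errno.EEXIST and os.path.isdir(path):
            return  # already a directory: fine
        raise  # path exists as a regular file: re-raised, as in the log

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "smolecule_clusters.fa")  # illustrative name
    open(target, "w").close()  # the path already exists as a regular file
    try:
        mkdir_p_like(target)
    except FileExistsError as exc:
        print(f"[Errno {exc.errno}] File exists: '{target}'")
```

So the polishing step appears to receive a path that already exists as a FASTA file where it expects to be able to create an output directory.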
For completeness, here is the full log of the successful test run:

```
Select jobs to execute...

[Tue Sep 17 16:20:28 2024]
rule copy_bed:
    input: data/example_egfr_amplicon.bed
    output: example_egfr_single_read_run/targets.bed
    jobid: 1
    reason: Missing output files: example_egfr_single_read_run/targets.bed
    wildcards: name=example_egfr_single_read_run
    resources: tmpdir=/tmp

cp data/example_egfr_amplicon.bed example_egfr_single_read_run/targets.bed
[Tue Sep 17 16:20:28 2024]
Finished job 1.
1 of 3 steps (33%) done
Select jobs to execute...

[Tue Sep 17 16:20:28 2024]
rule seqkit_bam_acc_tsv:
    input: example_egfr_single_read_run/align/EGFR_917_consensus.bam
    output: example_egfr_single_read_run/stats/EGFR_917_consensus_size_vs_acc.tsv
    jobid: 13
    reason: Missing output files: example_egfr_single_read_run/stats/EGFR_917_consensus_size_vs_acc.tsv
    wildcards: name=example_egfr_single_read_run, target=EGFR_917, stage=consensus
    resources: tmpdir=/tmp

[Tue Sep 17 16:20:29 2024]
Finished job 13.
2 of 3 steps (67%) done
Select jobs to execute...

[Tue Sep 17 16:20:29 2024]
localrule reads:
    input: example_egfr_single_read_run/targets.bed, example_egfr_single_read_run/align/EGFR_917_final.bam.bai, example_egfr_single_read_run/stats/EGFR_917_vsearch_cluster_stats.tsv, example_egfr_single_read_run/stats/EGFR_917_consensus_size_vs_acc.tsv
    jobid: 0
    reason: Input files updated by another job: example_egfr_single_read_run/stats/EGFR_917_consensus_size_vs_acc.tsv, example_egfr_single_read_run/targets.bed
    resources: tmpdir=/tmp

[Tue Sep 17 16:20:29 2024]
Finished job 0.
3 of 3 steps (100%) done
Complete log: .snakemake/log/2024-09-17T162028.348925.snakemake.log
```
And here is the corresponding excerpt from the failing run:

```
[Tue Sep 17 16:42:44 2024]
Error in rule polish_clusters:
    jobid: 6
    input: example_egfr_single_read_run/clustering/EGFR_917/clusters_fa, example_egfr_single_read_run/clustering/EGFR_917/smolecule_clusters.fa
    output: example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp, example_egfr_single_read_run/fasta/EGFR_917_consensus.bam, example_egfr_single_read_run/fasta/EGFR_917_consensus.fasta
    shell:
```

Any idea what might be going wrong here? Thanks in advance!