yanhui09 / laca

A reproducible and scalable workflow for Long Amplicon Consensus Analysis (LACA)
GNU General Public License v3.0

Test data #4

Closed · AlessioMilanese closed this issue 8 months ago

AlessioMilanese commented 9 months ago

Hello,

Thanks for providing this tool.

I was wondering if you could provide a test dataset with a few fastq files and a database file. I'm having some problems running the tool with the Docker container, and I wanted to know whether it's an issue with my input or something I'm doing wrong.

In case you don't have an FTP server, you could use https://zenodo.org/ to store the files.

yanhui09 commented 9 months ago

Hi

You can find test data in the repository: https://github.com/yanhui09/laca/blob/master/laca/workflow/resources/data/raw.fastq.gz.

If you set basecalled_dir to path/to/laca/workflow/resources/data in the config.yaml, you can do a demo run.

BTW, you probably need to set n (under the seqkit parameters in the config) to e.g. 1000 if you do subsampling on the test data.
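For reference, a minimal excerpt of the generated config.yaml might look like the snippet below; the exact key nesting (in particular whether n sits under a seqkit section) is an assumption on my part, so verify it against the file that laca init actually writes.

# hypothetical config.yaml excerpt -- check the generated file for the real layout
basecalled_dir: /path/to/laca/workflow/resources/data
seqkit:
  n: 1000   # subsample size for the demo run, as suggested above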

AlessioMilanese commented 9 months ago

Ok, thanks for clarifying.

So I have to run:

# unzip 
gunzip raw.fastq.gz
# init config file and check
laca init -b raw.fastq -d /path/to/database
# start analysis
laca run all

Could you provide me with a "database" for the -d option? Is it a fasta file or a directory?

yanhui09 commented 9 months ago

You don't need to gunzip the fastq files. laca init will create a config.yaml in your working directory (. by default), and a generated config.yaml can be re-used in other runs. With the generated config.yaml, you can control the parameters used to run laca.

laca uses conda or docker (when a special environment is required) to control the software versions. On first use it downloads the software and databases by itself, so it takes some time. -d in laca init just defines where you want to store the software and databases; you can set it anywhere you have write permission. -b corresponds to the directory holding the basecalled fastq or fastq.gz files (fastq files are generated in batches by guppy), so it needs to point to a directory.

In short, it should look like this:

laca init -b /path/to/basecalled_dir -d /path/to/database
# check the parameters in the generated config file
laca run all

AlessioMilanese commented 9 months ago

Not sure if I'm doing something wrong. Here's what I'm running.

I have the file in the tmp_fastq directory:

alessiomilanese:tmp$ docker run -v `pwd`:/home --privileged yanhui09/laca ls tmp_fastq
raw.fastq.gz

I run the init:

alessiomilanese:tmp$ docker run -v `pwd`:/home --privileged yanhui09/laca laca init -b /home/tmp_fastq -d tmp_db
2024-02-01 10:33:28,381 - root - INFO - LACA version: 0+untagged.1.g3131263 (laca.py:436)
2024-02-01 10:33:28,410 - root - INFO - Config file [config.yaml] created in /home. (config.py:196)

and run all:

alessiomilanese:tmp$ docker run -v `pwd`:/home --privileged yanhui09/laca laca run all
2024-02-01 10:33:38,513 - root - INFO - LACA version: 0+untagged.1.g3131263 (laca.py:61)
2024-02-01 10:33:38,538 - root - DEBUG - Executing: snakemake all --directory '/home' --snakefile '/tmp/repo/laca/workflow/Snakefile' --configfile '/home/config.yaml' --use-conda --conda-prefix '/home/tmp_db/conda_envs' --use-singularity --singularity-prefix '/home/tmp_db/singularity_envs' --singularity-args '--bind /tmp/repo/laca/workflow/resources/guppy_barcoding/:/opt/ont/guppy/data/barcoding/,/home/tmp_fastq'  --rerun-triggers mtime --rerun-incomplete --scheduler greedy --jobs 6 --nolock   --resources mem=957 mem_mb=980158 java_mem=813       (laca.py:105)
Config file config.yaml is extended by additional config specified via the command line.
Building DAG of jobs...
Pulling singularity image docker://genomicpariscentre/guppy:3.3.3.
Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
Creating conda environment ../tmp/repo/laca/workflow/envs/mmseqs2.yaml...
Downloading and installing remote packages.
Environment for /tmp/repo/laca/workflow/rules/../envs/mmseqs2.yaml created (location: tmp_db/conda_envs/9f7ce0c287dd42c65e27d60a5610c12a_)
Creating conda environment ../tmp/repo/laca/workflow/envs/cutadapt.yaml...
Downloading and installing remote packages.
Environment for /tmp/repo/laca/workflow/rules/../envs/cutadapt.yaml created (location: tmp_db/conda_envs/6ff289ee8fcb2a9d8152199a50d587b9_)
Creating conda environment ../tmp/repo/laca/workflow/envs/isONcorCon.yaml...
Downloading and installing remote packages.
Environment for /tmp/repo/laca/workflow/rules/../envs/isONcorCon.yaml created (location: tmp_db/conda_envs/7724e5612be64a299e2f41b61b024020_)
Creating conda environment ../tmp/repo/laca/workflow/envs/q2plugs.yaml...
Downloading and installing remote packages.
Environment for /tmp/repo/laca/workflow/rules/../envs/q2plugs.yaml created (location: tmp_db/conda_envs/8adeba28bea1ec0ded02293db943f5e2_)
Using shell: /usr/bin/bash
Provided cores: 6
Rules claiming more threads will be scaled down.
Provided resources: mem=957, mem_mb=980158, java_mem=813
Job stats:
job                      count    min threads    max threads
---------------------  -------  -------------  -------------
all                          1              1              1
check_primers_repseqs        1              2              2
cls_isONclust                1              1              1
cls_kmerCon                  1              1              1
cls_meshclust                1              1              1
col_q2blast_batch            1              1              1
collect_consensus            1              1              1
combine_cls                  1              1              1
combine_fastq                1              1              1
count_matrix                 1              1              1
demux_check                  1              1              1
drep_consensus               1              6              6
exclude_empty_fqs            1              1              1
get_taxonomy                 1              1              1
get_tree                     1              1              1
guppy                        1              6              6
isONclust                    1              6              6
matrix_seqid                 1              1              1
q2_fasttree                  1              6              6
q2_repseqs                   1              1              1
q2export_tree                1              1              1
rename_drep_seqs             1              1              1
repseqs_split                1              1              1
total                       23              1              6

Select jobs to execute...

[Thu Feb  1 10:35:19 2024]
localrule guppy:
    output: demux_guppy
    log: logs/demultiplex/guppy.log
    jobid: 9
    benchmark: benchmarks/demultiplex/guppy.txt
    reason: Missing output files: demux_guppy
    threads: 6
    resources: tmpdir=/tmp, mem=50

Activating singularity image /home/tmp_db/singularity_envs/be79a9f6f5e87678ce46ad686c92cb19.simg
ONT Guppy barcoding software version 3.3.3+fa743a6
input path:         /home/tmp_fastq
save path:          demux_guppy
arrangement files:  barcode_arrs_16S-GXO192.cfg
min. score front:   60
min. score rear:    60

Found 1 fastq files.

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Done in 157921 ms.
[Thu Feb  1 10:37:57 2024]
Finished job 9.
1 of 23 steps (4%) done
Select jobs to execute...

[Thu Feb  1 10:37:57 2024]
localcheckpoint demux_check:
    input: demux_guppy
    output: demultiplexed
    log: logs/demultiplex/check.log
    jobid: 8
    benchmark: benchmarks/demultiplex/check.txt
    reason: Missing output files: demultiplexed; Input files updated by another job: demux_guppy
    resources: tmpdir=/tmp
DAG of jobs will be updated after completion.

[Thu Feb  1 10:37:57 2024]
Finished job 8.
2 of 23 steps (9%) done
Creating conda environment ../tmp/repo/laca/workflow/envs/yacrd.yaml...
Downloading and installing remote packages.
Environment for /tmp/repo/laca/workflow/rules/../envs/yacrd.yaml created (location: tmp_db/conda_envs/2076c028367dc304ff8332a2dc20dc22_)
Select jobs to execute...

[Thu Feb  1 10:37:59 2024]
localrule collect_fastq:
    input: demultiplexed/BRK13
    output: qc/BRK13.fastq
    log: logs/demultiplex/collect_fastq/BRK13.log
    jobid: 40
    benchmark: benchmarks/demultiplex/collect_fastq/BRK13.txt
    reason: Missing output files: qc/BRK13.fastq
    wildcards: barcode=BRK13
    resources: tmpdir=/tmp

[Thu Feb  1 10:37:59 2024]
Finished job 40.
3 of 30 steps (10%) done
Select jobs to execute...

[Thu Feb  1 10:37:59 2024]
rule check_primers:
    input: qc/BRK13.fastq
    output: qc/primers_passed/BRK13F.fastq, qc/primers_unpassed/BRK13F.fastq
    log: logs/qc/check_primersF/BRK13.log
    jobid: 39
    benchmark: benchmarks/qc/check_primersF/BRK13.txt
    reason: Missing output files: qc/primers_unpassed/BRK13F.fastq, qc/primers_passed/BRK13F.fastq; Input files updated by another job: qc/BRK13.fastq
    wildcards: barcode=BRK13
    threads: 6
    resources: tmpdir=/tmp, mem=10, time=1

Activating conda environment: tmp_db/conda_envs/6ff289ee8fcb2a9d8152199a50d587b9_
[Thu Feb  1 10:38:00 2024]
Finished job 39.
4 of 30 steps (13%) done
Removing temporary output qc/BRK13.fastq.
Select jobs to execute...

[Thu Feb  1 10:38:00 2024]
rule check_primersR:
    input: qc/primers_unpassed/BRK13F.fastq
    output: qc/primers_passed/BRK13R.fastq, qc/primers_unpassed/BRK13.fastq
    log: logs/qc/check_primersR/BRK13.log
    jobid: 41
    benchmark: benchmarks/qc/check_primersR/BRK13.txt
    reason: Missing output files: qc/primers_passed/BRK13R.fastq; Input files updated by another job: qc/primers_unpassed/BRK13F.fastq
    wildcards: barcode=BRK13
    threads: 6
    resources: tmpdir=/tmp, mem=10, time=1

Activating conda environment: tmp_db/conda_envs/6ff289ee8fcb2a9d8152199a50d587b9_
[Thu Feb  1 10:38:01 2024]
Finished job 41.
5 of 30 steps (17%) done
Removing temporary output qc/primers_unpassed/BRK13F.fastq.
Removing temporary output qc/primers_unpassed/BRK13.fastq.
Select jobs to execute...

[Thu Feb  1 10:38:01 2024]
rule revcomp_fq_combine:
    input: qc/primers_passed/BRK13F.fastq, qc/primers_passed/BRK13R.fastq
    output: qc/primers_passed/BRK13R_revcomp.fastq, qc/primers_passed/BRK13.fastq
    jobid: 38
    reason: Missing output files: qc/primers_passed/BRK13.fastq; Input files updated by another job: qc/primers_passed/BRK13R.fastq, qc/primers_passed/BRK13F.fastq
    wildcards: barcode=BRK13
    threads: 2
    resources: tmpdir=/tmp, mem=10, time=1

[Thu Feb  1 10:38:01 2024]
Finished job 38.
6 of 30 steps (20%) done
Removing temporary output qc/primers_passed/BRK13F.fastq.
Removing temporary output qc/primers_passed/BRK13R.fastq.
Removing temporary output qc/primers_passed/BRK13R_revcomp.fastq.
Select jobs to execute...

[Thu Feb  1 10:38:01 2024]
rule minimap2ava_yacrd:
    input: qc/primers_passed/BRK13.fastq
    output: qc/yacrd/BRK13.paf
    log: logs/qc/yacrd/BRK13_ava.log
    jobid: 42
    benchmark: benchmarks/qc/yacrd/BRK13_ava.txt
    reason: Missing output files: qc/yacrd/BRK13.paf; Input files updated by another job: qc/primers_passed/BRK13.fastq
    wildcards: barcode=BRK13
    threads: 6
    resources: tmpdir=/tmp, mem=50, time=1

Activating conda environment: tmp_db/conda_envs/2076c028367dc304ff8332a2dc20dc22_
[Thu Feb  1 10:38:05 2024]
Finished job 42.
7 of 30 steps (23%) done
Select jobs to execute...

[Thu Feb  1 10:38:05 2024]
rule yacrd:
    input: qc/primers_passed/BRK13.fastq, qc/yacrd/BRK13.paf
    output: qc/yacrd/BRK13.fastq
    log: logs/qc/yacrd/BRK13_filter.log
    jobid: 37
    benchmark: benchmarks/qc/yacrd/BRK13_filter.txt
    reason: Missing output files: qc/yacrd/BRK13.fastq; Input files updated by another job: qc/yacrd/BRK13.paf, qc/primers_passed/BRK13.fastq
    wildcards: barcode=BRK13
    threads: 6
    resources: tmpdir=/tmp, mem=50, time=1

Activating conda environment: tmp_db/conda_envs/2076c028367dc304ff8332a2dc20dc22_
[Thu Feb  1 10:38:06 2024]
Finished job 37.
8 of 30 steps (27%) done
Removing temporary output qc/primers_passed/BRK13.fastq.
Removing temporary output qc/yacrd/BRK13.paf.
Select jobs to execute...

[Thu Feb  1 10:38:06 2024]
rule q_filter:
    input: qc/yacrd/BRK13.fastq
    output: qc/qfilt/BRK13.fastq
    jobid: 36
    reason: Missing output files: qc/qfilt/BRK13.fastq; Input files updated by another job: qc/yacrd/BRK13.fastq
    wildcards: barcode=BRK13
    threads: 2
    resources: tmpdir=/tmp, mem=10, time=1

[Thu Feb  1 10:38:06 2024]
Finished job 36.
9 of 30 steps (30%) done
Removing temporary output qc/yacrd/BRK13.fastq.
Select jobs to execute...

[Thu Feb  1 10:38:06 2024]
localcheckpoint exclude_empty_fqs:
    input: qc/qfilt/BRK13.fastq
    output: .qc_DONE
    jobid: 7
    reason: Missing output files: .qc_DONE; Input files updated by another job: qc/qfilt/BRK13.fastq
    resources: tmpdir=/tmp
DAG of jobs will be updated after completion.

Touching output file .qc_DONE.
[Thu Feb  1 10:38:06 2024]
Finished job 7.
10 of 30 steps (33%) done
Select jobs to execute...

[Thu Feb  1 10:38:06 2024]
localrule combine_fastq:
    input: qc/qfilt/BRK13.fastq
    output: qc/qfilt/pooled.fastq
    jobid: 12
    reason: Missing output files: qc/qfilt/pooled.fastq
    resources: tmpdir=/tmp

[Thu Feb  1 10:38:06 2024]
Finished job 12.
11 of 30 steps (37%) done
Select jobs to execute...

[Thu Feb  1 10:38:06 2024]
rule isONclust:
    input: qc/qfilt/pooled.fastq
    output: clust/isONclust/pooled, clust/isONclust/pooled.tsv
    log: logs/clust/isONclust/pooled.log
    jobid: 11
    benchmark: benchmarks/clust/isONclust/pooled.txt
    reason: Missing output files: clust/isONclust/pooled.tsv; Input files updated by another job: qc/qfilt/pooled.fastq
    wildcards: barcode=pooled
    threads: 6
    resources: tmpdir=/tmp, mem=50, time=30

Activating conda environment: tmp_db/conda_envs/7724e5612be64a299e2f41b61b024020_
[Thu Feb  1 10:38:08 2024]
Finished job 11.
12 of 30 steps (40%) done
Removing temporary output clust/isONclust/pooled.
Select jobs to execute...

[Thu Feb  1 10:38:08 2024]
localcheckpoint cls_isONclust:
    input: qc/qfilt/BRK13.fastq, clust/isONclust/pooled.tsv
    output: clust/isONclust/read2cluster
    jobid: 10
    reason: Missing output files: clust/isONclust/read2cluster; Input files updated by another job: clust/isONclust/pooled.tsv, qc/qfilt/BRK13.fastq
    resources: tmpdir=/tmp
DAG of jobs will be updated after completion.

Config file config.yaml is extended by additional config specified via the command line.
Building DAG of jobs...
Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
Using shell: /usr/bin/bash
Provided cores: 6
Rules claiming more threads will be scaled down.
Select jobs to execute...
[Thu Feb  1 10:38:09 2024]
Finished job 10.
13 of 30 steps (43%) done
Pulling singularity image docker://yanhui09/identity:latest.
Select jobs to execute...

[Thu Feb  1 10:38:52 2024]
localrule fqs_split_isONclust:
    input: clust/isONclust/read2cluster/pooled_1.csv, qc/qfilt/pooled.fastq
    output: clust/isONclust/pooled_1.split.fastq
    jobid: 65
    reason: Missing output files: clust/isONclust/pooled_1.split.fastq
    wildcards: barcode=pooled, c1=1
    resources: tmpdir=/tmp

[Thu Feb  1 10:38:52 2024]
localrule fqs_split_isONclust:
    input: clust/isONclust/read2cluster/pooled_2.csv, qc/qfilt/pooled.fastq
    output: clust/isONclust/pooled_2.split.fastq
    jobid: 69
    reason: Missing output files: clust/isONclust/pooled_2.split.fastq
    wildcards: barcode=pooled, c1=2
    resources: tmpdir=/tmp

[Thu Feb  1 10:38:52 2024]
localrule fqs_split_isONclust:
    input: clust/isONclust/read2cluster/pooled_0.csv, qc/qfilt/pooled.fastq
    output: clust/isONclust/pooled_0.split.fastq
    jobid: 61
    reason: Missing output files: clust/isONclust/pooled_0.split.fastq
    wildcards: barcode=pooled, c1=0
    resources: tmpdir=/tmp

[Thu Feb  1 10:38:52 2024]
Finished job 69.
14 of 42 steps (33%) done
Select jobs to execute...

[Thu Feb  1 10:38:52 2024]
localrule fqs_split_isONclust2:
    input: clust/isONclust/pooled_2.split.fastq
    output: clust/isONclust/split/pooled_2_0.fastq
    jobid: 68
    reason: Missing output files: clust/isONclust/split/pooled_2_0.fastq; Input files updated by another job: clust/isONclust/pooled_2.split.fastq
    wildcards: barcode=pooled, c1=2
    resources: tmpdir=/tmp

[Thu Feb  1 10:38:52 2024]
Finished job 65.
15 of 42 steps (36%) done
Select jobs to execute...

[Thu Feb  1 10:38:52 2024]
localrule fqs_split_isONclust2:
    input: clust/isONclust/pooled_1.split.fastq
    output: clust/isONclust/split/pooled_1_0.fastq
    jobid: 64
    reason: Missing output files: clust/isONclust/split/pooled_1_0.fastq; Input files updated by another job: clust/isONclust/pooled_1.split.fastq
    wildcards: barcode=pooled, c1=1
    resources: tmpdir=/tmp

[Thu Feb  1 10:38:52 2024]
Finished job 68.
16 of 42 steps (38%) done
Removing temporary output clust/isONclust/pooled_2.split.fastq.
Select jobs to execute...

[Thu Feb  1 10:38:52 2024]
localrule fq2fa4meshclust:
    input: clust/isONclust/split/pooled_2_0.fastq
    output: clust/meshclust/pooled_2_0.fasta
    jobid: 67
    reason: Missing output files: clust/meshclust/pooled_2_0.fasta; Input files updated by another job: clust/isONclust/split/pooled_2_0.fastq
    wildcards: barcode=pooled, c1=2, c2=0
    resources: tmpdir=/tmp

[Thu Feb  1 10:38:52 2024]
Finished job 64.
17 of 42 steps (40%) done
Removing temporary output clust/isONclust/pooled_1.split.fastq.
Select jobs to execute...

[Thu Feb  1 10:38:52 2024]
localrule fq2fa4meshclust:
    input: clust/isONclust/split/pooled_1_0.fastq
    output: clust/meshclust/pooled_1_0.fasta
    jobid: 63
    reason: Missing output files: clust/meshclust/pooled_1_0.fasta; Input files updated by another job: clust/isONclust/split/pooled_1_0.fastq
    wildcards: barcode=pooled, c1=1, c2=0
    resources: tmpdir=/tmp

[Thu Feb  1 10:38:52 2024]
Finished job 61.
18 of 42 steps (43%) done
Select jobs to execute...

[Thu Feb  1 10:38:52 2024]
localrule fqs_split_isONclust2:
    input: clust/isONclust/pooled_0.split.fastq
    output: clust/isONclust/split/pooled_0_0.fastq
    jobid: 60
    reason: Missing output files: clust/isONclust/split/pooled_0_0.fastq; Input files updated by another job: clust/isONclust/pooled_0.split.fastq
    wildcards: barcode=pooled, c1=0
    resources: tmpdir=/tmp

[Thu Feb  1 10:38:52 2024]
Finished job 67.
19 of 42 steps (45%) done
Removing temporary output clust/isONclust/split/pooled_2_0.fastq.
Select jobs to execute...
[Thu Feb  1 10:38:52 2024]
Finished job 60.
20 of 42 steps (48%) done
Removing temporary output clust/isONclust/pooled_0.split.fastq.

[Thu Feb  1 10:38:52 2024]
localrule fq2fa4meshclust:
    input: clust/isONclust/split/pooled_0_0.fastq
    output: clust/meshclust/pooled_0_0.fasta
    jobid: 59
    reason: Missing output files: clust/meshclust/pooled_0_0.fasta; Input files updated by another job: clust/isONclust/split/pooled_0_0.fastq
    wildcards: barcode=pooled, c1=0, c2=0
    resources: tmpdir=/tmp

[Thu Feb  1 10:38:52 2024]
Finished job 63.
21 of 42 steps (50%) done
Removing temporary output clust/isONclust/split/pooled_1_0.fastq.
Select jobs to execute...
[Thu Feb  1 10:38:52 2024]
Finished job 59.
22 of 42 steps (52%) done
Removing temporary output clust/isONclust/split/pooled_0_0.fastq.

[Thu Feb  1 10:38:52 2024]
rule meshclust:
    input: clust/meshclust/pooled_0_0.fasta
    output: clust/meshclust/pooled_0_0.tsv
    log: logs/clust/meshclust/pooled_0_0.log
    jobid: 58
    benchmark: benchmarks/mechclust/pooled_0_0.txt
    reason: Missing output files: clust/meshclust/pooled_0_0.tsv; Input files updated by another job: clust/meshclust/pooled_0_0.fasta
    wildcards: barcode=pooled, c1=0, c2=0
    threads: 6
    resources: tmpdir=/tmp, mem=50, time=30

Activating singularity image /home/tmp_db/singularity_envs/1f35c0780be107844efa2707b35541ca.simg
[Thu Feb  1 10:39:07 2024]
Finished job 58.
23 of 42 steps (55%) done
Removing temporary output clust/meshclust/pooled_0_0.fasta.
Select jobs to execute...

[Thu Feb  1 10:39:07 2024]
rule meshclust:
    input: clust/meshclust/pooled_1_0.fasta
    output: clust/meshclust/pooled_1_0.tsv
    log: logs/clust/meshclust/pooled_1_0.log
    jobid: 62
    benchmark: benchmarks/mechclust/pooled_1_0.txt
    reason: Missing output files: clust/meshclust/pooled_1_0.tsv; Input files updated by another job: clust/meshclust/pooled_1_0.fasta
    wildcards: barcode=pooled, c1=1, c2=0
    threads: 6
    resources: tmpdir=/tmp, mem=50, time=30

Activating singularity image /home/tmp_db/singularity_envs/1f35c0780be107844efa2707b35541ca.simg
[Thu Feb  1 10:39:16 2024]
Finished job 62.
24 of 42 steps (57%) done
Removing temporary output clust/meshclust/pooled_1_0.fasta.
Select jobs to execute...

[Thu Feb  1 10:39:16 2024]
rule meshclust:
    input: clust/meshclust/pooled_2_0.fasta
    output: clust/meshclust/pooled_2_0.tsv
    log: logs/clust/meshclust/pooled_2_0.log
    jobid: 66
    benchmark: benchmarks/mechclust/pooled_2_0.txt
    reason: Missing output files: clust/meshclust/pooled_2_0.tsv; Input files updated by another job: clust/meshclust/pooled_2_0.fasta
    wildcards: barcode=pooled, c1=2, c2=0
    threads: 6
    resources: tmpdir=/tmp, mem=50, time=30

Activating singularity image /home/tmp_db/singularity_envs/1f35c0780be107844efa2707b35541ca.simg
[Thu Feb  1 10:39:21 2024]
Finished job 66.
25 of 42 steps (60%) done
Removing temporary output clust/meshclust/pooled_2_0.fasta.
Select jobs to execute...

[Thu Feb  1 10:39:21 2024]
localcheckpoint cls_meshclust:
    input: .qc_DONE, qc/qfilt/BRK13.fastq, clust/meshclust/pooled_0_0.tsv, clust/meshclust/pooled_1_0.tsv, clust/meshclust/pooled_2_0.tsv
    output: clust/clusters
    jobid: 6
    reason: Missing output files: clust/clusters; Input files updated by another job: clust/meshclust/pooled_0_0.tsv, clust/meshclust/pooled_2_0.tsv, clust/meshclust/pooled_1_0.tsv
    resources: tmpdir=/tmp
DAG of jobs will be updated after completion.

Config file config.yaml is extended by additional config specified via the command line.
Building DAG of jobs...
Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
Using shell: /usr/bin/bash
Provided cores: 6
Rules claiming more threads will be scaled down.
Select jobs to execute...
[Thu Feb  1 10:39:23 2024]
Finished job 6.
26 of 42 steps (62%) done
Select jobs to execute...

[Thu Feb  1 10:39:23 2024]
localrule fqs_split_meshclust:
    input: clust/clusters/pooled_0_0_1.csv, clust/clusters/pooled_0_0_1.centroid, qc/qfilt/pooled.fastq
    output: clust/members/pooled_0_0_1.fastq, clust/centroids/pooled_0_0_1.fasta
    jobid: 77
    reason: Missing output files: clust/members/pooled_0_0_1.fastq, clust/centroids/pooled_0_0_1.fasta
    wildcards: barcode=pooled, c1=0, c2=0, c3=1
    resources: tmpdir=/tmp

[Thu Feb  1 10:39:23 2024]
localrule fqs_split_meshclust:
    input: clust/clusters/pooled_1_0_1.csv, clust/clusters/pooled_1_0_1.centroid, qc/qfilt/pooled.fastq
    output: clust/members/pooled_1_0_1.fastq, clust/centroids/pooled_1_0_1.fasta
    jobid: 76
    reason: Missing output files: clust/members/pooled_1_0_1.fastq, clust/centroids/pooled_1_0_1.fasta
    wildcards: barcode=pooled, c1=1, c2=0, c3=1
    resources: tmpdir=/tmp

[Thu Feb  1 10:39:23 2024]
localrule fqs_split_meshclust:
    input: clust/clusters/pooled_1_0_2.csv, clust/clusters/pooled_1_0_2.centroid, qc/qfilt/pooled.fastq
    output: clust/members/pooled_1_0_2.fastq, clust/centroids/pooled_1_0_2.fasta
    jobid: 74
    reason: Missing output files: clust/members/pooled_1_0_2.fastq, clust/centroids/pooled_1_0_2.fasta
    wildcards: barcode=pooled, c1=1, c2=0, c3=2
    resources: tmpdir=/tmp

[Thu Feb  1 10:39:23 2024]
localrule fqs_split_meshclust:
    input: clust/clusters/pooled_2_0_1.csv, clust/clusters/pooled_2_0_1.centroid, qc/qfilt/pooled.fastq
    output: clust/members/pooled_2_0_1.fastq, clust/centroids/pooled_2_0_1.fasta
    jobid: 75
    reason: Missing output files: clust/centroids/pooled_2_0_1.fasta, clust/members/pooled_2_0_1.fastq
    wildcards: barcode=pooled, c1=2, c2=0, c3=1
    resources: tmpdir=/tmp

[Thu Feb  1 10:39:23 2024]
Finished job 76.
27 of 50 steps (54%) done
Select jobs to execute...

[Thu Feb  1 10:39:23 2024]
localrule fqs_split_kmerCon:
    input: clust/members/pooled_1_0_1.fastq, clust/centroids/pooled_1_0_1.fasta
    output: kmerCon/split/pooled_1_0_1cand1.fastq, kmerCon/polish/pooled_1_0_1cand1/minimap2/raw.fna
    jobid: 80
    reason: Missing output files: kmerCon/polish/pooled_1_0_1cand1/minimap2/raw.fna, kmerCon/split/pooled_1_0_1cand1.fastq; Input files updated by another job: clust/members/pooled_1_0_1.fastq, clust/centroids/pooled_1_0_1.fasta
    wildcards: barcode=pooled, c1=1, c2=0, c3=1
    resources: tmpdir=/tmp

[Thu Feb  1 10:39:23 2024]
Finished job 80.
28 of 50 steps (56%) done
Removing temporary output clust/centroids/pooled_1_0_1.fasta.
[Thu Feb  1 10:39:23 2024]
Finished job 77.
29 of 50 steps (58%) done
Select jobs to execute...

[Thu Feb  1 10:39:23 2024]
localrule fqs_split_kmerCon:
    input: clust/members/pooled_0_0_1.fastq, clust/centroids/pooled_0_0_1.fasta
    output: kmerCon/split/pooled_0_0_1cand1.fastq, kmerCon/polish/pooled_0_0_1cand1/minimap2/raw.fna
    jobid: 81
    reason: Missing output files: kmerCon/polish/pooled_0_0_1cand1/minimap2/raw.fna, kmerCon/split/pooled_0_0_1cand1.fastq; Input files updated by another job: clust/members/pooled_0_0_1.fastq, clust/centroids/pooled_0_0_1.fasta
    wildcards: barcode=pooled, c1=0, c2=0, c3=1
    resources: tmpdir=/tmp

[Thu Feb  1 10:39:23 2024]
Finished job 81.
30 of 50 steps (60%) done
Removing temporary output clust/centroids/pooled_0_0_1.fasta.
[Thu Feb  1 10:39:23 2024]
Finished job 75.
31 of 50 steps (62%) done
Select jobs to execute...

[Thu Feb  1 10:39:23 2024]
localrule fqs_split_kmerCon:
    input: clust/members/pooled_2_0_1.fastq, clust/centroids/pooled_2_0_1.fasta
    output: kmerCon/split/pooled_2_0_1cand1.fastq, kmerCon/polish/pooled_2_0_1cand1/minimap2/raw.fna
    jobid: 79
    reason: Missing output files: kmerCon/polish/pooled_2_0_1cand1/minimap2/raw.fna, kmerCon/split/pooled_2_0_1cand1.fastq; Input files updated by another job: clust/centroids/pooled_2_0_1.fasta, clust/members/pooled_2_0_1.fastq
    wildcards: barcode=pooled, c1=2, c2=0, c3=1
    resources: tmpdir=/tmp

[Thu Feb  1 10:39:23 2024]
Finished job 74.
32 of 50 steps (64%) done
Select jobs to execute...

[Thu Feb  1 10:39:23 2024]
localrule fqs_split_kmerCon:
    input: clust/members/pooled_1_0_2.fastq, clust/centroids/pooled_1_0_2.fasta
    output: kmerCon/split/pooled_1_0_2cand1.fastq, kmerCon/polish/pooled_1_0_2cand1/minimap2/raw.fna
    jobid: 78
    reason: Missing output files: kmerCon/split/pooled_1_0_2cand1.fastq, kmerCon/polish/pooled_1_0_2cand1/minimap2/raw.fna; Input files updated by another job: clust/members/pooled_1_0_2.fastq, clust/centroids/pooled_1_0_2.fasta
    wildcards: barcode=pooled, c1=1, c2=0, c3=2
    resources: tmpdir=/tmp

[Thu Feb  1 10:39:23 2024]
Finished job 79.
33 of 50 steps (66%) done
Removing temporary output clust/centroids/pooled_2_0_1.fasta.
[Thu Feb  1 10:39:23 2024]
Finished job 78.
34 of 50 steps (68%) done
Removing temporary output clust/centroids/pooled_1_0_2.fasta.
Select jobs to execute...

[Thu Feb  1 10:39:23 2024]
checkpoint cls_kmerCon:
    input: clust/clusters, .qc_DONE, qc/qfilt/BRK13.fastq, clust/members/pooled_1_0_2.fastq, clust/members/pooled_2_0_1.fastq, clust/members/pooled_1_0_1.fastq, clust/members/pooled_0_0_1.fastq, kmerCon/split/pooled_1_0_2cand1.fastq, kmerCon/split/pooled_2_0_1cand1.fastq, kmerCon/split/pooled_1_0_1cand1.fastq, kmerCon/split/pooled_0_0_1cand1.fastq, kmerCon/polish/pooled_1_0_2cand1/minimap2/raw.fna, kmerCon/polish/pooled_2_0_1cand1/minimap2/raw.fna, kmerCon/polish/pooled_1_0_1cand1/minimap2/raw.fna, kmerCon/polish/pooled_0_0_1cand1/minimap2/raw.fna, clust/clusters/pooled_1_0_2.csv, clust/clusters/pooled_2_0_1.csv, clust/clusters/pooled_1_0_1.csv, clust/clusters/pooled_0_0_1.csv
    output: kmerCon/clusters
    jobid: 5
    reason: Missing output files: kmerCon/clusters; Input files updated by another job: kmerCon/split/pooled_1_0_2cand1.fastq, clust/members/pooled_1_0_1.fastq, clust/members/pooled_0_0_1.fastq, kmerCon/polish/pooled_0_0_1cand1/minimap2/raw.fna, kmerCon/split/pooled_1_0_1cand1.fastq, clust/members/pooled_2_0_1.fastq, kmerCon/polish/pooled_1_0_2cand1/minimap2/raw.fna, kmerCon/polish/pooled_2_0_1cand1/minimap2/raw.fna, kmerCon/polish/pooled_1_0_1cand1/minimap2/raw.fna, clust/members/pooled_1_0_2.fastq, kmerCon/split/pooled_0_0_1cand1.fastq, kmerCon/split/pooled_2_0_1cand1.fastq
    resources: tmpdir=/tmp
DAG of jobs will be updated after completion.

Config file config.yaml is extended by additional config specified via the command line.
Building DAG of jobs...
Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
Using shell: /usr/bin/bash
Provided cores: 6
Rules claiming more threads will be scaled down.
Select jobs to execute...
[Thu Feb  1 10:39:24 2024]
Finished job 5.
35 of 50 steps (70%) done
Creating conda environment ../tmp/repo/laca/workflow/envs/racon.yaml...
Downloading and installing remote packages.
Environment for /tmp/repo/laca/workflow/rules/../envs/racon.yaml created (location: tmp_db/conda_envs/d9e9e3d72b467b116a163d017358d209_)
Creating conda environment ../tmp/repo/laca/workflow/envs/medaka.yaml...
Downloading and installing remote packages.
Environment for /tmp/repo/laca/workflow/rules/../envs/medaka.yaml created (location: tmp_db/conda_envs/bde525cfaffd6da24da878a8a66aa8f3_)
Creating conda environment ../tmp/repo/laca/workflow/envs/minimap2.yaml...
Downloading and installing remote packages.
Environment for /tmp/repo/laca/workflow/rules/../envs/minimap2.yaml created (location: tmp_db/conda_envs/51f154299f902025c37a9ddea36e9595_)
Removing temporary output clust/members/pooled_1_0_2.fastq.
Removing temporary output clust/members/pooled_2_0_1.fastq.
Removing temporary output clust/members/pooled_1_0_1.fastq.
Removing temporary output clust/members/pooled_0_0_1.fastq.
Select jobs to execute...

[Thu Feb  1 10:54:03 2024]
rule minimap2polish:
    input: kmerCon/polish/pooled_0_0_1cand1/minimap2/raw.fna, kmerCon/split/pooled_0_0_1cand1.fastq
    output: kmerCon/polish/pooled_0_0_1cand1/minimap2/raw.paf
    log: logs/kmerCon/pooled_0_0_1cand1/minimap2_raw.log
    jobid: 95
    benchmark: benchmarks/kmerCon/pooled_0_0_1cand1/minimap2_raw.txt
    reason: Missing output files: kmerCon/polish/pooled_0_0_1cand1/minimap2/raw.paf
    wildcards: consensus=kmerCon, bc_cls_cand=pooled_0_0_1cand1, assembly=raw
    threads: 2
    resources: tmpdir=/tmp, mem=10, time=1

[Thu Feb  1 10:54:03 2024]
rule minimap2polish:
    input: kmerCon/polish/pooled_2_0_1cand1/minimap2/raw.fna, kmerCon/split/pooled_2_0_1cand1.fastq
    output: kmerCon/polish/pooled_2_0_1cand1/minimap2/raw.paf
    log: logs/kmerCon/pooled_2_0_1cand1/minimap2_raw.log
    jobid: 105
    benchmark: benchmarks/kmerCon/pooled_2_0_1cand1/minimap2_raw.txt
    reason: Missing output files: kmerCon/polish/pooled_2_0_1cand1/minimap2/raw.paf
    wildcards: consensus=kmerCon, bc_cls_cand=pooled_2_0_1cand1, assembly=raw
    threads: 2
    resources: tmpdir=/tmp, mem=10, time=1

Activating conda environment: tmp_db/conda_envs/51f154299f902025c37a9ddea36e9595_

Activating conda environment: tmp_db/conda_envs/51f154299f902025c37a9ddea36e9595_
[Thu Feb  1 10:54:03 2024]
rule minimap2polish:
    input: kmerCon/polish/pooled_1_0_1cand1/minimap2/raw.fna, kmerCon/split/pooled_1_0_1cand1.fastq
    output: kmerCon/polish/pooled_1_0_1cand1/minimap2/raw.paf
    log: logs/kmerCon/pooled_1_0_1cand1/minimap2_raw.log
    jobid: 100
    benchmark: benchmarks/kmerCon/pooled_1_0_1cand1/minimap2_raw.txt
    reason: Missing output files: kmerCon/polish/pooled_1_0_1cand1/minimap2/raw.paf
    wildcards: consensus=kmerCon, bc_cls_cand=pooled_1_0_1cand1, assembly=raw
    threads: 2
    resources: tmpdir=/tmp, mem=10, time=1

Activating conda environment: tmp_db/conda_envs/51f154299f902025c37a9ddea36e9595_
[Thu Feb  1 10:54:04 2024]
Finished job 105.
36 of 70 steps (51%) done
Select jobs to execute...

[Thu Feb  1 10:54:04 2024]
rule racon:
    input: kmerCon/split/pooled_2_0_1cand1.fastq, kmerCon/polish/pooled_2_0_1cand1/minimap2/raw.paf, kmerCon/polish/pooled_2_0_1cand1/minimap2/raw.fna
    output: kmerCon/polish/pooled_2_0_1cand1/minimap2/racon_1.fna
    log: logs/kmerCon/pooled_2_0_1cand1/racon_1.log
    jobid: 104
    benchmark: benchmarks/kmerCon/pooled_2_0_1cand1/racon_1.txt
    reason: Missing output files: kmerCon/polish/pooled_2_0_1cand1/minimap2/racon_1.fna; Input files updated by another job: kmerCon/polish/pooled_2_0_1cand1/minimap2/raw.paf
    wildcards: consensus=kmerCon, bc_cls_cand=pooled_2_0_1cand1, iter=1
    threads: 2
    resources: tmpdir=/tmp, mem=10, time=1

Activating conda environment: tmp_db/conda_envs/d9e9e3d72b467b116a163d017358d209_
[Thu Feb  1 10:54:04 2024]
Finished job 100.
37 of 70 steps (53%) done
Select jobs to execute...

[Thu Feb  1 10:54:04 2024]
rule racon:
    input: kmerCon/split/pooled_1_0_1cand1.fastq, kmerCon/polish/pooled_1_0_1cand1/minimap2/raw.paf, kmerCon/polish/pooled_1_0_1cand1/minimap2/raw.fna
    output: kmerCon/polish/pooled_1_0_1cand1/minimap2/racon_1.fna
    log: logs/kmerCon/pooled_1_0_1cand1/racon_1.log
    jobid: 99
    benchmark: benchmarks/kmerCon/pooled_1_0_1cand1/racon_1.txt
    reason: Missing output files: kmerCon/polish/pooled_1_0_1cand1/minimap2/racon_1.fna; Input files updated by another job: kmerCon/polish/pooled_1_0_1cand1/minimap2/raw.paf
    wildcards: consensus=kmerCon, bc_cls_cand=pooled_1_0_1cand1, iter=1
    threads: 2
    resources: tmpdir=/tmp, mem=10, time=1

Activating conda environment: tmp_db/conda_envs/d9e9e3d72b467b116a163d017358d209_
[Thu Feb  1 10:54:04 2024]
Finished job 95.
38 of 70 steps (54%) done
Select jobs to execute...

[Thu Feb  1 10:54:04 2024]
rule racon:
    input: kmerCon/split/pooled_0_0_1cand1.fastq, kmerCon/polish/pooled_0_0_1cand1/minimap2/raw.paf, kmerCon/polish/pooled_0_0_1cand1/minimap2/raw.fna
    output: kmerCon/polish/pooled_0_0_1cand1/minimap2/racon_1.fna
    log: logs/kmerCon/pooled_0_0_1cand1/racon_1.log
    jobid: 94
    benchmark: benchmarks/kmerCon/pooled_0_0_1cand1/racon_1.txt
    reason: Missing output files: kmerCon/polish/pooled_0_0_1cand1/minimap2/racon_1.fna; Input files updated by another job: kmerCon/polish/pooled_0_0_1cand1/minimap2/raw.paf
    wildcards: consensus=kmerCon, bc_cls_cand=pooled_0_0_1cand1, iter=1
    threads: 2
    resources: tmpdir=/tmp, mem=10, time=1

Activating conda environment: tmp_db/conda_envs/d9e9e3d72b467b116a163d017358d209_
[Thu Feb  1 10:54:04 2024]
Finished job 104.
39 of 70 steps (56%) done
Removing temporary output kmerCon/polish/pooled_2_0_1cand1/minimap2/raw.fna.
Removing temporary output kmerCon/polish/pooled_2_0_1cand1/minimap2/raw.paf.
Select jobs to execute...

[Thu Feb  1 10:54:04 2024]
rule minimap2polish:
    input: kmerCon/polish/pooled_2_0_1cand1/minimap2/racon_1.fna, kmerCon/split/pooled_2_0_1cand1.fastq
    output: kmerCon/polish/pooled_2_0_1cand1/minimap2/racon_1.paf
    log: logs/kmerCon/pooled_2_0_1cand1/minimap2_racon_1.log
    jobid: 103
    benchmark: benchmarks/kmerCon/pooled_2_0_1cand1/minimap2_racon_1.txt
    reason: Missing output files: kmerCon/polish/pooled_2_0_1cand1/minimap2/racon_1.paf; Input files updated by another job: kmerCon/polish/pooled_2_0_1cand1/minimap2/racon_1.fna
    wildcards: consensus=kmerCon, bc_cls_cand=pooled_2_0_1cand1, assembly=racon_1
    threads: 2
    resources: tmpdir=/tmp, mem=10, time=1

Activating conda environment: tmp_db/conda_envs/51f154299f902025c37a9ddea36e9595_
[Thu Feb  1 10:54:04 2024]
Finished job 103.
40 of 70 steps (57%) done
Select jobs to execute...

[Thu Feb  1 10:54:04 2024]
rule racon:
    input: kmerCon/split/pooled_2_0_1cand1.fastq, kmerCon/polish/pooled_2_0_1cand1/minimap2/racon_1.paf, kmerCon/polish/pooled_2_0_1cand1/minimap2/racon_1.fna
    output: kmerCon/polish/pooled_2_0_1cand1/minimap2/racon_2.fna
    log: logs/kmerCon/pooled_2_0_1cand1/racon_2.log
    jobid: 102
    benchmark: benchmarks/kmerCon/pooled_2_0_1cand1/racon_2.txt
    reason: Missing output files: kmerCon/polish/pooled_2_0_1cand1/minimap2/racon_2.fna; Input files updated by another job: kmerCon/polish/pooled_2_0_1cand1/minimap2/racon_1.fna, kmerCon/polish/pooled_2_0_1cand1/minimap2/racon_1.paf
    wildcards: consensus=kmerCon, bc_cls_cand=pooled_2_0_1cand1, iter=2
    threads: 2
    resources: tmpdir=/tmp, mem=10, time=1

Activating conda environment: tmp_db/conda_envs/d9e9e3d72b467b116a163d017358d209_
[Thu Feb  1 10:54:04 2024]
Finished job 99.
41 of 70 steps (59%) done
Removing temporary output kmerCon/polish/pooled_1_0_1cand1/minimap2/raw.fna.
Removing temporary output kmerCon/polish/pooled_1_0_1cand1/minimap2/raw.paf.
Select jobs to execute...

[Thu Feb  1 10:54:04 2024]
rule minimap2polish:
    input: kmerCon/polish/pooled_1_0_1cand1/minimap2/racon_1.fna, kmerCon/split/pooled_1_0_1cand1.fastq
    output: kmerCon/polish/pooled_1_0_1cand1/minimap2/racon_1.paf
    log: logs/kmerCon/pooled_1_0_1cand1/minimap2_racon_1.log
    jobid: 98
    benchmark: benchmarks/kmerCon/pooled_1_0_1cand1/minimap2_racon_1.txt
    reason: Missing output files: kmerCon/polish/pooled_1_0_1cand1/minimap2/racon_1.paf; Input files updated by another job: kmerCon/polish/pooled_1_0_1cand1/minimap2/racon_1.fna
    wildcards: consensus=kmerCon, bc_cls_cand=pooled_1_0_1cand1, assembly=racon_1
    threads: 2
    resources: tmpdir=/tmp, mem=10, time=1

Activating conda environment: tmp_db/conda_envs/51f154299f902025c37a9ddea36e9595_
[Thu Feb  1 10:54:04 2024]
Finished job 102.
42 of 70 steps (60%) done
Removing temporary output kmerCon/polish/pooled_2_0_1cand1/minimap2/racon_1.paf.
Removing temporary output kmerCon/polish/pooled_2_0_1cand1/minimap2/racon_1.fna.
Select jobs to execute...

[Thu Feb  1 10:54:04 2024]
rule medaka_consensus:
    input: kmerCon/polish/pooled_2_0_1cand1/minimap2/racon_2.fna, kmerCon/split/pooled_2_0_1cand1.fastq
    output: kmerCon/polish/pooled_2_0_1cand1/medaka_1/consensus.fasta, kmerCon/polish/pooled_2_0_1cand1/medaka_1/consensus.fasta.gaps_in_draft_coords.bed, kmerCon/polish/pooled_2_0_1cand1/medaka_1/consensus_probs.hdf, kmerCon/polish/pooled_2_0_1cand1/medaka_1/calls_to_draft.bam, kmerCon/polish/pooled_2_0_1cand1/medaka_1/calls_to_draft.bam.bai
    log: logs/kmerCon/pooled_2_0_1cand1/medaka_1.log
    jobid: 101
    benchmark: benchmarks/kmerCon/pooled_2_0_1cand1/medaka_1.txt
    reason: Missing output files: kmerCon/polish/pooled_2_0_1cand1/medaka_1/consensus.fasta; Input files updated by another job: kmerCon/polish/pooled_2_0_1cand1/minimap2/racon_2.fna
    wildcards: consensus=kmerCon, bc_cls_cand=pooled_2_0_1cand1, iter2=1
    threads: 2
    resources: tmpdir=/tmp, mem=10, time=1

Activating conda environment: tmp_db/conda_envs/bde525cfaffd6da24da878a8a66aa8f3_
[Thu Feb  1 10:54:04 2024]
Finished job 98.
43 of 70 steps (61%) done
Select jobs to execute...

[Thu Feb  1 10:54:04 2024]
rule racon:
    input: kmerCon/split/pooled_1_0_1cand1.fastq, kmerCon/polish/pooled_1_0_1cand1/minimap2/racon_1.paf, kmerCon/polish/pooled_1_0_1cand1/minimap2/racon_1.fna
    output: kmerCon/polish/pooled_1_0_1cand1/minimap2/racon_2.fna
    log: logs/kmerCon/pooled_1_0_1cand1/racon_2.log
    jobid: 97
    benchmark: benchmarks/kmerCon/pooled_1_0_1cand1/racon_2.txt
    reason: Missing output files: kmerCon/polish/pooled_1_0_1cand1/minimap2/racon_2.fna; Input files updated by another job: kmerCon/polish/pooled_1_0_1cand1/minimap2/racon_1.fna, kmerCon/polish/pooled_1_0_1cand1/minimap2/racon_1.paf
    wildcards: consensus=kmerCon, bc_cls_cand=pooled_1_0_1cand1, iter=2
    threads: 2
    resources: tmpdir=/tmp, mem=10, time=1

Activating conda environment: tmp_db/conda_envs/d9e9e3d72b467b116a163d017358d209_
/usr/bin/bash: line 7: LD_LIBRARY_PATH: unbound variable
[Thu Feb  1 10:54:04 2024]
Error in rule medaka_consensus:
    jobid: 101
    input: kmerCon/polish/pooled_2_0_1cand1/minimap2/racon_2.fna, kmerCon/split/pooled_2_0_1cand1.fastq
    output: kmerCon/polish/pooled_2_0_1cand1/medaka_1/consensus.fasta, kmerCon/polish/pooled_2_0_1cand1/medaka_1/consensus.fasta.gaps_in_draft_coords.bed, kmerCon/polish/pooled_2_0_1cand1/medaka_1/consensus_probs.hdf, kmerCon/polish/pooled_2_0_1cand1/medaka_1/calls_to_draft.bam, kmerCon/polish/pooled_2_0_1cand1/medaka_1/calls_to_draft.bam.bai
    log: logs/kmerCon/pooled_2_0_1cand1/medaka_1.log (check log file(s) for error details)
    conda-env: /home/tmp_db/conda_envs/bde525cfaffd6da24da878a8a66aa8f3_
    shell:

        # if fna file is empty, make dummy output
        if [ ! -s kmerCon/polish/pooled_2_0_1cand1/minimap2/racon_2.fna ]; then
            mkdir -p kmerCon/polish/pooled_2_0_1cand1/medaka_1 2> logs/kmerCon/pooled_2_0_1cand1/medaka_1.log
            touch kmerCon/polish/pooled_2_0_1cand1/medaka_1/consensus.fasta kmerCon/polish/pooled_2_0_1cand1/medaka_1/consensus.fasta.gaps_in_draft_coords.bed kmerCon/polish/pooled_2_0_1cand1/medaka_1/consensus_probs.hdf kmerCon/polish/pooled_2_0_1cand1/medaka_1/calls_to_draft.bam kmerCon/polish/pooled_2_0_1cand1/medaka_1/calls_to_draft.bam.bai 2>> logs/kmerCon/pooled_2_0_1cand1/medaka_1.log
        else
            export OLD_LD_LIBRARY_PATH=${LD_LIBRARY_PATH}
            export LD_LIBRARY_PATH="$CONDA_PREFIX/lib":${LD_LIBRARY_PATH}
            export TF_CPP_MIN_LOG_LEVEL='2'

            export CUDA_VISIBLE_DEVICES=""
            medaka_consensus -i kmerCon/split/pooled_2_0_1cand1.fastq -d kmerCon/polish/pooled_2_0_1cand1/minimap2/racon_2.fna -o kmerCon/polish/pooled_2_0_1cand1/medaka_1 -t 2 -m r941_min_hac_g507 > logs/kmerCon/pooled_2_0_1cand1/medaka_1.log 2>&1
            rm -f kmerCon/polish/pooled_2_0_1cand1/minimap2/racon_2.fna.fai kmerCon/polish/pooled_2_0_1cand1/minimap2/racon_2.fna.map-ont.mmi

            export LD_LIBRARY_PATH=${OLD_LD_LIBRARY_PATH}
            unset OLD_LD_LIBRARY_PATH
        fi

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

[Thu Feb  1 10:54:05 2024]
Finished job 94.
44 of 70 steps (63%) done
Removing temporary output kmerCon/polish/pooled_0_0_1cand1/minimap2/raw.fna.
Removing temporary output kmerCon/polish/pooled_0_0_1cand1/minimap2/raw.paf.
[Thu Feb  1 10:54:05 2024]
Finished job 97.
45 of 70 steps (64%) done
Removing temporary output kmerCon/polish/pooled_1_0_1cand1/minimap2/racon_1.paf.
Removing temporary output kmerCon/polish/pooled_1_0_1cand1/minimap2/racon_1.fna.
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-02-01T103338.686815.snakemake.log
2024-02-01 10:54:05,378 - root - CRITICAL - Command 'snakemake all --directory '/home' --snakefile '/tmp/repo/laca/workflow/Snakefile' --configfile '/home/config.yaml' --use-conda --conda-prefix '/home/tmp_db/conda_envs' --use-singularity --singularity-prefix '/home/tmp_db/singularity_envs' --singularity-args '--bind /tmp/repo/laca/workflow/resources/guppy_barcoding/:/opt/ont/guppy/data/barcoding/,/home/tmp_fastq'  --rerun-triggers mtime --rerun-incomplete --scheduler greedy --jobs 6 --nolock   --resources mem=957 mem_mb=980158 java_mem=813      ' returned non-zero exit status 1. (laca.py:113)
yanhui09 commented 9 months ago

It's almost there. It's probably related to the version of medaka.

Could you please share the medaka log file (logs/kmerCon/pooled_2_0_1cand1/medaka_1.log)?

AlessioMilanese commented 9 months ago

The file logs/kmerCon/pooled_2_0_1cand1/medaka_1.log is empty.

When I log into the Docker container and check for the medaka version, it's not visible:

alessiomilanese:tmp$ docker run -it -v `pwd`:/home --privileged yanhui09/laca
(base) root@09c5be3d3b8e:/home# medaka
bash: medaka: command not found
(base) root@09c5be3d3b8e:/home# medaka_consensus
bash: medaka_consensus: command not found

But maybe it is in an environment?

yanhui09 commented 9 months ago

Yes, it's installed in a conda environment, i.e., tmp_db/conda_envs/bde525cfaffd6da24da878a8a66aa8f3_ (the medaka environment created during your run). I also found an error in the log:

/usr/bin/bash: line 7: LD_LIBRARY_PATH: unbound variable

I just tried to re-install/re-load medaka for laca in the base environment, and it works fine. I will give the Docker setup another try.
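If you want to check the medaka installation inside the image yourself, a sketch of one way to do it (assuming conda is on the image's PATH, as the (base) prompt above suggests, and using the medaka environment prefix from the run log):

# hypothetical check -- the prefix is the medaka conda env created during the run
docker run -it -v `pwd`:/home --privileged yanhui09/laca \
    conda run -p /home/tmp_db/conda_envs/bde525cfaffd6da24da878a8a66aa8f3_ medaka --version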

yanhui09 commented 9 months ago

Hi

I think it's related to the latest version of medaka in the Docker image. Fortunately, LD_LIBRARY_PATH is only needed for CUDA use in medaka, and medaka on CPU is usually fast enough to generate a consensus from a read cluster, so I have disabled medaka's CUDA use in the latest version of laca. It works on my server.

You can pull the latest Docker image and give it a try.
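Assuming the image is published under the same name used in the commands above, pulling the update would look like:

docker pull yanhui09/laca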

AlessioMilanese commented 9 months ago

Thanks for updating the Docker image and for the prompt response and support. I re-ran it, and now it completes with no errors!

Here's what I have in the working directory:

drwxr-xr-x 10 root            root   4.0K Feb  2 10:01 benchmarks
drwxr-xr-x  5 root            root   4.0K Feb  2 10:19 clust
-rw-r--r--  1 root            root   8.0K Feb  2 09:51 config.yaml
-rw-r--r--  1 root            root     52 Feb  2 10:02 count_matrix.tsv
drwxr-xr-x  5 root            root   4.0K Feb  2 09:57 demultiplexed
drwxr-xr-x  3 root            root   4.0K Feb  2 10:19 kmerCon
drwxr-xr-x  9 root            root   4.0K Feb  2 10:01 logs
drwxr-xr-x  3 root            root   4.0K Feb  2 10:19 qc
-rw-r--r--  1 root            root      0 Feb  2 09:57 .qc_DONE
drwxr-xr-x  2 root            root   4.0K Feb  2 10:02 quant
-rw-r--r--  1 root            root   4.2K Feb  2 10:01 rep_seqs.fasta
drwxr-xr-x  9 root            root   4.0K Feb  1 11:33 .snakemake
drwxr-xr-x  3 root            root   4.0K Feb  2 10:01 taxonomy
-rw-r--r--  1 root            root    525 Feb  2 10:19 taxonomy.tsv
drwxr-xr-x  5 root            root   4.0K Feb  2 10:02 tmp_db
drwxr-xr-x  2 alessiomilanese docker 4.0K Feb  2 09:49 tmp_fastq
drwxr-xr-x  3 root            root   4.0K Feb  2 10:19 tree
-rw-r--r--  1 root            root    118 Feb  2 10:19 tree.nwk

Just a few more questions:

1) Is it correct to get 4 OTUs as a result from your test data?
2) I assume that if I have more fastq files in my input folder (-b), I will have more columns in the result file count_matrix.tsv?
3) In the count matrix the sample is named BRK13; where was this specified, and how can I specify the names when I have multiple fastq files as input?
4) Where are the main results?

Checking the files, I assume:

count_matrix.tsv Contains the read counts for each OTU for the only sample we had

#OTU ID BRK13
OTU_1   29
OTU_2   790
OTU_3   298
OTU_4   74

taxonomy.tsv Contains the taxonomy annotation for each OTU

OTU_1   k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Pseudomonas;s__Pseudomonas_aeruginosa
OTU_2   k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Listeriaceae;g__Listeria;s__Listeria_monocytogenes
OTU_3   k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Salmonella;s__Salmonella_enterica
OTU_4   k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Limosilactobacillus;s__Lactobacillus_fermentum

rep_seqs.fasta Contains the actual sequences of the OTUs

>OTU_1
ATATTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCGTGTGTGAAGAAGGTCTTCGGATTGTAAAGCACTTTAAGTTGGGAGGAAGGGCAGTAAGTTAATACCTTGCTGTTTTGACGTTACCAACAGAATAAGCACCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGTGGTTCAGCAAGTTGGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCCAAAACTACTGAGCTAGAGTACGGTAGAGGGTGGTGGAATTTCCTGTGTAGCGGTGAAATGCGTAGATATAGGAAGGAACACCAGTGGCGAAGGCGACCACCTGGACTGATACTGACACTGAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCGACTAGCCGTTGGGATCCTTGAGATCTTAGTGGCGCAGCTAACGCGATAAGTCGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCTGGCCTTGACATGCTGAGAACTTTCCAGAGATGGATTGGTGCCTTCGGGAACTCAGACACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGTAACGAGCGCAACCCTTGTCCTTAGTTACCAGCACCTCGGGTGGGCACTCTAAGGAGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGGCCAGGGCTACACACGTGCTACAATGGTCGGTACAAAGGGTTGCCAAGCCGCGAGGTGGAGCTAATCCCATAAAACCGATCGTAGTCCGGATCGCAGTCTGCAACTCGACTGCGTGAAGTCGGAATCGCTAGTAATCGTGAATCAGAATGTCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTC
>OTU_2
GAGGCAGCAGTAGGGAATCTTCCGCAATGGACGAAAGTCTGACGGAGCAACGCCGCGTGTAGTGAAGAAGGTTTTCGGATCGTAAAGCTCTGTTGTTAGAGAAGAACAAGGATAAGAGTAACTGCTTGTCCCTTGACGGTATCTAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGCGCGCAGGCGGTTTTTAAGTCTGATGTGAAAGCCCCCGGCTCAACCGGGGAGGGTCATTGGAAACTGGAAGACTTGAGTGCAGAAGAGGAGAGTGGAATTCCACGTGTAGCGGTGAAATGCGTAGATATGTGGAGGAACACCAGTGGCGAAGGCGACTCTCTGGTCTGTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAAGTGTTAGGGGGTTTCCGCCCCTTAGTGCTGCAGCTAACGCATTAAGCACTCCGCCTGGGGAGTACGACCGCAAGGTTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCAGGTCTTGACATCCTTTGACCACTCTAGAGATAGAGCTTTCCCTTCGGGGACAAAGTGACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGATTTTAGTTGCCAGCATTTAGTTGGGCACTCTAAGGTGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGACCTGGGCTACACACGTGCTACAATGGATAGTACAAAGGGTCGCGAAACCGCGAGGTGAAGCTAATCCCATAAAACTGTTCTCAGTTCGGATTGTAGGCTGCAACTCGCCTACATGAAGCCGGAATCGCTAGTAATCGTGGATCAGCATGCCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTC
>OTU_3
AGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCCGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAAGGTGTTGTGGTTAATAACCGCTGCTCATTGACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTCAAGTCGGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAGGCTTGAGTCTTGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACAAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCTACTTGGAGGTTGTGCCCTTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCACAGAAGTTTCCAGAGATGAGATTGGTGCCTTCGGGAACTGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCCTTTGTTGCCAGCGGTTCGGCCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGACCAGGGCTACACACGTGCTACAATGGCGCATACAAAGAGAAGCGACCTCGCGAGAGCAAGCGGACCTCATAAAGTGCGTCGTAGTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGTGGATCAGAATGCCACGGTGAATACG
>OTU_4
AATCTTCCACAATGGGCGCAAGCCTGATGGAGCAACACCGCGTGAGTGAAGAAGGGTTTCGGCTCGTAAAGCTCTGTTGTTAAAGAAGAACACGTATGAGAGTAACTGTTCATACGTTGACGGTATTTAACCAGAAAGTCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTATCCGGATTTATTGGGCGTAAAGAGAGTGCAGGCGGTTTTCTAAGTCTGATGTGAAAGCCTTCGGCTTAACCGGAGAAGTGCATCGGAAACTGGATAACTTGAGTGCAGAAGAGGGTAGTGGAACTCCATGTGTAGCGGTGGAATGCGTAGATATATGGAAGAACACCAGTGGCGAAGGCGGCTACCTGGTCTGCAACTGACGCTGAGACTCGAAAGCATGGGTAGCGAACAGGATTAGATACCCTGGTAGTCCATGCCGTAAACGATGAGTGCTAGGTGTTGGAGGGTTTCCGCCCTTCAGTGCCGGAGCTAACGCATTAAGCACTCCGCCTGGGGAGTACGACCGCAAGGTTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCTACGCGAAGAACCTTACCAGGTCTTGACATCTTGCGCCAACCCTAGAGATAGGGCGTTTCCTTCGGGAACGCAATGACAGGTGGTGCATGGTCGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGTTACTAGTTGCCAGCATTAAGTTGGGCACTCTAGTGAGACTGCCGGTGACAAACCGGAGGAAGGTGGGGACGACGTCAGATCATCATGCCCCTTATGACCTGGGCTACACACGTGCTACAATGGACGGTACAACGAGTCGCGAACTCGCGAGGGCAAGCAAATCTCTTAAAACCGTTCTCAGTTCGGACTGCAGGCTGCAACTCGCCTGCACGAAGTCGGAATCGCTAGTAATCGCGGATCAGCATGCCGCGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTC

tree.nwk Contains the phylogenetic tree

(('OTU_1':0.055790368,'OTU_3':0.09398297)1.000:0.04706135,('OTU_2':0.035200803,'OTU_4':0.066716778):0.074327542)root;

yanhui09 commented 9 months ago
  1. Yes. I found 4 OTUs as well with the kmerCon mode.
  2. Sure. You will have more demultiplexed samples if you have more fastq input, and the column names are inferred from the demultiplexed directory in the result directory (see the sketch after this list).
    (base) [yanhui@yan01 laca_docker]$ ls demultiplexed
    BRK13  suspected  unclassified  barcoding_summary.txt  read_processor_log-2024-02-06_07-40-43.log
  3. If you have custom barcodes, you need to follow the instructions to prepare the index file for guppy or minibar. If you use the ku barcodes, you can stick with the default barcode settings. The barcodes are inferred from the fastq input during demultiplexing.
  4. You are right. count_matrix.tsv, taxonomy.tsv, rep_seqs.fasta and tree.nwk are the most commonly used files for downstream analysis.
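As a purely hypothetical illustration of point 2 (barcode names and counts below are made up, not real results), a run with two demultiplexed barcodes would give a count matrix with one column per barcode:

#OTU ID barcode01  barcode02
OTU_1   12         3
OTU_2   45         0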
AlessioMilanese commented 8 months ago

Nice, thanks!