refresh-bio / vclust

Fast and accurate tool for calculating Average Nucleotide Identity (ANI) and clustering virus genomes and metagenomes
GNU General Public License v3.0

error when running a large dataset #9

Closed: lingyi-owl closed this issue 2 months ago

lingyi-owl commented 2 months ago

Hi,

When running the filter function on my dataset, I got an ambiguous error:

2024-07-11 15:43:50,355 [INFO] Running: vclust-1.0.1_x64-linux/bin/kmer-db build -multisample-fasta -k 25 -t 94 results/BU_D1_MACV/vclust/vclust-8572ac6e608742eb/whole.txt results/BU_D1_MACV/vclust/vclust-8572ac6e608742eb/whole.kdb
Kmer-db version 2.0.3 (28.06.2024)
S. Deorowicz, A. Gudys, M. Dlugosz, M. Kokot, and A. Danek (c) 2018

Analysis started at Thu Jul 11 15:43:50 2024

Building database (from fasta genomes)
Processing samples...
1087703/1087703

EXECUTION TIMES
Total: 183.351
Kmer sorting/unique time: 17.7102
Database update time:157.981
        Hashtable processing (parallel): 114.242
                Resize: 0.570049
                Find'n'add: 1.13306
        Sort time (parallel): 3.46597
        Pattern extension time (parallel): 39.1639

STATISTICS
Number of samples: 1,087,703
Number of patterns: 2,179,536 (109,067,584 B)
Number of k-mers: 661,884,760
K-mer length: 25
Minhash fraction: 1
Workers count: 94

Serializing database...
Storing k-mer hashtables (raw)...
262144/262144 hashtables stored in 17.1578 s
Storing patterns...
2179536/2179536 patterns stored in 0.0984113 s

Releasing memory...OK (0.186328 seconds)

Analysis finished at Thu Jul 11 15:47:12 2024

2024-07-11 15:47:14,288 [INFO] Done
2024-07-11 15:47:14,290 [INFO] Running: vclust-1.0.1_x64-linux/bin/kmer-db all2all -sparse -above 10 -t 94 results/BU_D1_MACV/vclust/vclust-8572ac6e608742eb/whole.kdb results/BU_D1_MACV/vclust/vclust-8572ac6e608742eb/all2all.txt
Kmer-db version 2.0.3 (28.06.2024)
S. Deorowicz, A. Gudys, M. Dlugosz, M. Kokot, and A. Danek (c) 2018

Analysis started at Thu Jul 11 15:47:14 2024

All versus all comparison
Loading k-mer database results/BU_D1_MACV/vclust/vclust-8572ac6e608742eb/whole.kdb...
Loading patterns...
2179536/2179536 patterns loaded in 0.213622 s
Calculating matrix of common k-mers...
2024-07-11 15:54:13,618 [ERROR] While running: /vclust-1.0.1_x64-linux/bin/kmer-db all2all -sparse -above 10 -t 94 results/BU_D1_MACV/vclust/vclust-8572ac6e608742eb/whole.kdb results/BU_D1_MACV/vclust/vclust-8572ac6e608742eb/all2all.txt
2024-07-11 15:54:13,620 [ERROR] Error message: None

aziele commented 2 months ago

Hi,

Thanks for reporting this ambiguous error. We've just fixed it in the latest Vclust release v1.0.3.

I'm sorry for the trouble,
Andrzej
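
Note: the thread does not show what changed in v1.0.3, so the following is only a minimal sketch of how a Python wrapper could avoid reporting "Error message: None" when a child process dies without writing to stderr. The function names `describe_failure` and `run` are hypothetical and are not part of vclust. The sketch relies on documented `subprocess` behaviour on POSIX: a negative return code -N means the child was terminated by signal N. A SIGKILL, for instance, is often sent by the Linux out-of-memory killer, although nothing in this thread confirms that this happened here.

```python
import signal
import subprocess


def describe_failure(returncode: int, stderr: str) -> str:
    """Build a readable message for a failed child process (hypothetical helper)."""
    if returncode < 0:
        # On POSIX, subprocess reports termination by signal N as return code -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        reason = f"terminated by {name}"
    else:
        reason = f"exited with status {returncode}"
    # Fall back to an explicit note instead of printing None or an empty string.
    detail = stderr.strip() or "no output on stderr"
    return f"{reason}: {detail}"


def run(cmd: list[str]) -> str:
    """Run cmd and return its stdout; raise RuntimeError with a descriptive message on failure."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(describe_failure(proc.returncode, proc.stderr))
    return proc.stdout
```

With this approach, a child killed by signal 9 and producing no stderr output would be reported as terminated by SIGKILL rather than with an empty error message, which at least points the user toward resource limits as something to check.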