shenwei356 / kmcp

Accurate metagenomic profiling && Fast large-scale sequence/genome searching
https://bioinf.shenwei.me/kmcp
MIT License

ETA missing when building KMCP index #36

Closed ericvdtoorn closed 1 year ago

ericvdtoorn commented 1 year ago

When building the KMCP index for Humgut (by following the instructions here), the ETA is stuck at 0s, even after several blocks have been completed.

kmcp index -j 32 -I humgut-k21-n10 -O humgut.kmcp -n 1 -f 0.3
13:31:35.908 [INFO] kmcp v0.9.3
13:31:35.909 [INFO]   https://github.com/shenwei356/kmcp
13:31:35.909 [INFO]
13:31:35.909 [INFO] loading .unik file infos from file: humgut-k21-n10/_info.txt
13:31:36.518 [INFO]   306910 cached file infos loaded
13:31:36.585 [INFO]
13:31:36.585 [INFO] -------------------- [main parameters] --------------------
13:31:36.585 [INFO]   number of hashes: 1
13:31:36.585 [INFO]   false positive rate: 0.300000
13:31:36.585 [INFO]   k-mer size(s): 21
13:31:36.585 [INFO]   split seqequence size: 0, overlap: 20
13:31:36.585 [INFO]   block-sizeX-kmers-t: 10.00 M
13:31:36.585 [INFO]   block-sizeX        : 256
13:31:36.585 [INFO]   block-size8-kmers-t: 20.00 M
13:31:36.585 [INFO]   block-size1-kmers-t: 200.00 M
13:31:36.585 [INFO] -------------------- [main parameters] --------------------
13:31:36.585 [INFO]
13:31:36.586 [INFO] building index ...
13:31:36.753 [INFO]
13:31:36.753 [INFO]   block size: 9592
13:31:36.753 [INFO]   number of index files: 32 (may be more)
13:31:36.753 [INFO]
13:31:36.753 [block #001] 1199 / 1199  100 %
13:31:36.753 [block #002] 1199 / 1199  100 %
13:31:36.754 [block #003] 1199 / 1199  100 %
13:32:30.922 [block #004] 1199 / 1199  100 %
13:32:34.941 [block #005] 1199 / 1199  100 %
13:33:33.902 [block #006] 1199 / 1199  100 %
13:33:40.757 [block #007] 1199 / 1199  100 %
13:34:45.743 [block #008] 1199 / 1199  100 %
13:34:54.006 [block #009] 1199 / 1199  100 %
13:35:59.125 [block #010] 1199 / 1199  100 %
13:36:08.695 [block #011] 1060 / 1199 [==========================>---]  88 %
13:37:15.240 [block #012]  847 / 1199 [====================>---------]  71 %
[saved index files]     10 / 32 [==========>-----------------------] ETA: 0s

ericvdtoorn commented 1 year ago

Of course, right after I posted this, the ETA suddenly appeared. I guess it only shows up after a sufficient number of files have been processed (12 in my case)?

shenwei356 commented 1 year ago

Oh, that's strange. It should be updated right after one index file has been saved.

BTW, k-mer file processing and index writing are asynchronous. That means that while blocks 11 and 12 are being processed, the index file of block 10 might not have finished writing. One possible reason is that the disk (NAS?) is too slow for writing big index files.

You can add --dry-run to check the size of each index file before actually running the index build.
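
To picture the asynchronous behaviour described above, here is a minimal Go sketch of a generic producer/consumer pipeline. It is not KMCP's actual code: the `block` type, the `pipeline` function, and the `queueSize` parameter are all hypothetical names chosen for illustration. It only shows how block processing can run ahead of index writing when the two stages are decoupled by a queue.

```go
package main

import (
	"fmt"
	"sync"
)

// block is a hypothetical stand-in for one processed index block.
type block struct{ id int }

// pipeline processes n blocks and hands each one to a separate writer
// goroutine through a buffered channel. Because the stages are decoupled,
// processing can run ahead of writing by roughly queueSize blocks.
// It returns the block IDs in the order they were processed and written.
func pipeline(n, queueSize int) (processed, written []int) {
	queue := make(chan block, queueSize)

	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // writer stage: a slow disk write would happen here
		defer wg.Done()
		for b := range queue {
			written = append(written, b.id)
		}
	}()

	for i := 1; i <= n; i++ { // processing stage
		processed = append(processed, i)
		queue <- block{id: i} // blocks only when the queue is full
	}
	close(queue)
	wg.Wait() // all writes are finished once this returns
	return processed, written
}

func main() {
	processed, written := pipeline(12, 4)
	fmt.Println("processed:", len(processed), "written:", len(written))
}
```

In a setup like this, a progress bar that counts saved files can lag behind the per-block progress lines, which would match the log in the first comment if the disk is the bottleneck.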

ericvdtoorn commented 1 year ago

Could it be that writing the blocks just took that long, so the first block had to finish writing before an ETA could be produced?

shenwei356 commented 1 year ago

Yes, that's what I mean. The speed depends on the size of an index file and the disk speed.