tlemane / kmtricks

modular k-mer count matrix and Bloom filter construction for large read collections
GNU Affero General Public License v3.0

terminate called recursively / after throwing an instance of 'std::runtime_error' #16

Closed: cumbof closed this issue 2 years ago

cumbof commented 2 years ago

Hello, I'm struggling with the following problem:

I'm running kmtricks (installed through conda) to index 1051 genomes. This is the command line that produced the error:

kmtricks pipeline --file ./genomes.fof --run-dir ./index --kmer-size 31 --mode hash:bft:bin --hard-min 2 --soft-min 3 --share-min 1 --bloom-size 10000 --bf-format howdesbt --cpr
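The --file option points to a file of files (fof) listing the samples. Below is a minimal sketch for generating one from a list of genome FASTA files. It assumes the "ID : path" line format shown in the kmtricks README, one file per sample, and the file name stem as the sample ID; the helper names fof_lines and write_fof and the genomes/ layout are hypothetical.

```python
from pathlib import Path

def fof_lines(genome_paths):
    """Hypothetical helper: one 'ID : path' line per genome (assumed kmtricks fof format).

    The sample ID is the file name without its last suffix, so 'G1.fa' -> 'G1'.
    Note that 'G1.fa.gz' would give 'G1.fa'; IDs must stay unique.
    """
    return [f"{Path(p).stem} : {p}" for p in genome_paths]

def write_fof(genome_paths, fof_path):
    """Hypothetical helper: write the fof, one sample per line."""
    Path(fof_path).write_text("\n".join(fof_lines(genome_paths)) + "\n")

print(fof_lines(["genomes/G1.fa", "genomes/G2.fa"]))
```

For a real run, something like write_fof(sorted(Path("genomes").glob("*.fa")), "genomes.fof") would produce one line per genome.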

And this is the error message (the backtrace log file is empty):

[2022-04-28 12:54:22.153] [info] Run with Kmer<32> - uint64_t implementation
[2022-04-28 12:54:22.295] [info] Compute configuration...
[2022-04-28 12:54:22.295] [info] 1051 samples found (1051 read files).
[2022-04-28 12:55:19.370] [info] Use 4 partitions.
[2022-04-28 12:55:19.459] [info] Compute minimizer repartition...
Compute SuperK   [==================================================] [01m:40s]
Count partitions [==================================================] [01m:41s]
Merge partitions [>                                                 ] [00:00s]
terminate called recursively
terminate called after throwing an instance of 'std::runtime_error'
terminate called recursively
terminate called recursively
  what():  Unable to open ./index/counts/partition_3/G268.hash.p4
[2022-04-28 13:06:08.228] [error] Killed after receive Aborted:SIGABRT(6) signal. Demangled backtrace dumped at ./kmtricks_backtrace.log. If the problem persists, please open an issue with the return of 'kmtricks infos' and the content of ./kmtricks_backtrace.log
[2022-04-28 13:06:08.228] [error] Killed after receive Aborted:SIGABRT(6) signal. Demangled backtrace dumped at ./kmtricks_backtrace.log. If the problem persists, please open an issue with the return of 'kmtricks infos' and the content of ./kmtricks_backtrace.log
[2022-04-28 13:06:08.229] [error] Killed after receive Aborted:SIGABRT(6) signal. Demangled backtrace dumped at ./kmtricks_backtrace.log. If the problem persists, please open an issue with the return of 'kmtricks infos' and the content of ./kmtricks_backtrace.log
terminate called recursively
[2022-04-28 13:06:08.229] [error] Killed after receive Aborted:SIGABRT(6) signal. Demangled backtrace dumped at ./kmtricks_backtrace.log. If the problem persists, please open an issue with the return of 'kmtricks infos' and the content of ./kmtricks_backtrace.log

I'm not sure whether this is the same problem reported in issue #15, which is why I'm opening a new issue. If it is the same kind of problem, is there anything else I could try to work around it without increasing the maximum number of open files (ulimit)?

In #15, @tlemane also suggested reducing the number of threads, but I didn't specify the -t argument, so that advice doesn't seem to apply in my case.

Thanks in advance for your help

tlemane commented 2 years ago

Hello,

I think you are facing the same issue. By default, kmtricks uses the maximum number of available threads, so you can try -t 1.

However, it seems that you are using the k-mer rescue (--soft-min 3 --share-min 1), which is designed to handle sequencing errors when indexing reads. Since your inputs are genomes, you probably do not need this feature. In index mode, the only purpose of the merge step is the k-mer rescue, so you can probably skip it:
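To make the role of the rescue concrete, here is a simplified model of it. It assumes that counts below --hard-min are dropped at counting time, that counts of at least --soft-min are "solid", and that counts in between are kept only when the k-mer is solid in at least --share-min other samples. This is an interpretation for illustration, not the kmtricks implementation, and the function rescue_presence is hypothetical.

```python
def rescue_presence(counts, hard_min, soft_min, share_min):
    """Per-sample presence of ONE k-mer under an ASSUMED rescue rule.

    counts[i] is the k-mer's abundance in sample i (hypothetical input layout).
    """
    solid = [c >= soft_min for c in counts]
    n_solid = sum(solid)
    presence = []
    for c, is_solid in zip(counts, solid):
        if is_solid:
            presence.append(True)  # solid in this sample: always kept
        elif c >= hard_min:
            # Weak count: this sample is not solid itself, so n_solid
            # counts only the *other* samples where the k-mer is solid.
            presence.append(n_solid >= share_min)
        else:
            presence.append(False)  # below hard_min: assumed discarded
    return presence

print(rescue_presence([0, 2, 5], hard_min=2, soft_min=3, share_min=1))
```

Under this model, the weak count of 2 in the second sample survives only because the k-mer is solid in the third sample. Assembled genomes do not carry sequencing errors to correct, which is why this cross-sample step (and therefore the merge) adds little here.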

kmtricks pipeline --file ./genomes.fof --run-dir ./index --kmer-size 31 --mode hash:bft:bin --hard-min 2 --bloom-size 10000 --bf-format howdesbt --cpr --skip-merge

Téo

tlemane commented 2 years ago

To be sure, I ran a test with about 1000 genomes using your command. I confirm that it fails with an open-file limit of 1024 (the default on most desktop systems). On my side, both -t 1 and --skip-merge work.
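To check which open-file limit applies on a given machine, the soft and hard limits reported by `ulimit -Sn` and `ulimit -Hn` can also be read from Python. This is a POSIX-only sketch using the standard resource module; the helper names are hypothetical. Raising the soft limit this way only affects the Python process and any kmtricks process it starts afterwards, and it can never exceed the hard limit.

```python
import resource

def nofile_limits():
    """Return the (soft, hard) limits on open file descriptors (RLIMIT_NOFILE)."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

def raise_nofile_soft_limit(wanted):
    """Hypothetical helper: raise the soft limit toward `wanted`, capped at the hard limit.

    Returns the soft limit in effect afterwards.
    """
    soft, hard = nofile_limits()
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited: never lower it
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    if wanted > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return nofile_limits()[0]

soft, hard = nofile_limits()
print(f"soft={soft} hard={hard}")
```

If the soft limit printed is 1024, that matches the setting under which the command above was reported to fail.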

Téo

cumbof commented 2 years ago

Awesome, thanks for your help @tlemane!

tlemane commented 2 years ago

I missed something else. Again, since your inputs are genomes, I think you should use --hard-min 1 so that all k-mers are indexed.
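The reason is that in a read set each genomic k-mer is sequenced many times, so a minimum abundance of 2 mainly removes sequencing errors, whereas in a single assembled genome most k-mers occur only once. The toy sketch below counts forward-strand k-mers of a short made-up sequence (kmtricks itself counts canonical k-mers) and shows how a threshold of 2 discards almost everything; the helper names are hypothetical.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every forward-strand k-mer of seq (simplification: no canonical form)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def kept_kmers(seq, k, hard_min):
    """Hypothetical helper: k-mers whose abundance reaches hard_min."""
    return {kmer for kmer, c in kmer_counts(seq, k).items() if c >= hard_min}

genome = "ACGTACGTTGCA"  # toy stand-in for one assembled genome
print(len(kept_kmers(genome, 4, hard_min=1)))  # 8 distinct 4-mers kept
print(kept_kmers(genome, 4, hard_min=2))       # {'ACGT'}: the only repeated 4-mer
```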

cumbof commented 2 years ago

That makes sense. Thanks!