Closed KirillKryukov closed 4 years ago
It's probably wrapping around a data type, as the `-s` param maps to val+7. At 9 this hits 16, which at 2 bits per base equates to 2^32 contexts. That was intended to be the maximum, but maybe a signed vs unsigned issue introduced a bug.
To be fair, it doesn't normally get much better beyond -s7 or so. It really needs a multi-length model (PPM style), and it should also switch from direct array indexing to hash tables to cope with larger k-mers.
I can fix the trivial issue though. Thanks for reporting it.
Now fixed. As expected it was a signed vs unsigned bug. Evidently I'd never checked -s9. My apologies.
@jkbonfield , thanks for the quick fix!
@jkbonfield , does -s9 work for you? I finally started testing it and found it to always crash ( #2 ).
I noticed that fqzcomp's `-s9` setting is inconsistent with the rest of the range.

README.md:

Command line help:

It works as expected from `-s1` to `-s8`. However, with `-s9` it behaves differently:

1. It consumes a tiny amount of RAM, smaller than with `-s1`.
2. It's very fast, again faster than `-s1`.
3. Its compression strength is weaker than `-s1`.

Essentially, `-s9` works as if it was `-s0`, and the strongest compression is instead achieved by `-s8` (for large enough inputs).

Data for a 2.76 GB bacterial dataset (source):
The relationship is similar with other test data. Compression strength and speed depend on the data, and are therefore less reliable indicators. However, allocated memory always shows a similar relationship between `-s` settings. E.g., even on the tiniest inputs fqzcomp will allocate 8 GB with `-s8`, but with `-s9` it consistently uses less RAM than with `-s1`.

Compression memory on a range of data sizes.

This is not a critical issue, just something I did not expect (and that the manual does not explain).
The OS is Ubuntu 18.04.1 LTS, with 128 GB of total RAM. Let me know if you need any details, or a more rigorous standalone repro script.