integer division might be one byte too small in some cases

frederic-mahe commented 1 year ago

In this call to bloomflex_init():

https://github.com/torognes/swarm/blob/e47c80ef5a68f93750a0eab0b48c0764ed0a189b/src/algod1.cc#L1335

The first parameter (m / sizeof(uint64_t)) is of type uint64_t and represents a number of bytes.

m is a number of bits, and then must by divided by 8 to get a number of bytes. However, m is:

uint64_t m = bits * microvariants * nucleotides_in_small_otus;

bits can take any value between 2 and 64; microvariants is a constant value of 7; nucleotides_in_small_otus can take any value starting from 1. Therefore, m can be any value, starting from 2 x 7 = 14.

Since m will be used by bloomflex_init() to allocate m / sizeof(uint64_t) bytes, then isn't that an issue to use an integer division?

For instance, If m is 31 bits, then a 3-byte bitmap (31 / 8) will be allocated, where in fact 4 bytes would be necessary to hold the 31bits.

A possible fix:

constexpr auto n_bits_in_a_byte = 8;
uint64_t n_bytes = m / n_bits_in_a_byte;
if ((n_bytes % n_bits_in_a_byte) != 0) {
    ++n_bytes;
}

or in C++14:

#include <cassert>
#include <limits>

constexpr auto divide_into_bytes(unsigned long n_bits) -> unsigned long {
    constexpr auto n_bits_in_a_byte = 8;
    assert(n_bits <= std::numeric_limits<unsigned long>::max() - (n_bits_in_a_byte - 1));
    return ((n_bits + n_bits_in_a_byte - 1) / n_bits_in_a_byte);
}

static_assert(divide_into_bytes(0) == 0);
static_assert(divide_into_bytes(1) == 1);
static_assert(divide_into_bytes(7) == 1);
static_assert(divide_into_bytes(8) == 1);
static_assert(divide_into_bytes(9) == 2);
static_assert(divide_into_bytes(16) == 2);
static_assert(divide_into_bytes(17) == 3);
static_assert(divide_into_bytes(8 * 10) == 10);
static_assert(divide_into_bytes(8 * 10 + 1) == 11);
static_assert(divide_into_bytes(std::numeric_limits<unsigned long>::max() - 7) == 2'305'843'009'213'693'951);
// 2'305'843'009'213'693'951 = 2^61 - 1

frederic-mahe commented 1 year ago

For the fun of it, here is another version that is valid for the whole range of uint64 values (for the price of a branch, but performance is not paramount here):

#include <limits>

constexpr auto divide_into_bytes(unsigned long n_bits) -> unsigned long {
    constexpr auto n_bits_in_a_byte = 8;
    if (n_bits == 0) { return 0; }
    return ((n_bits - 1) / n_bits_in_a_byte) + 1;
}

static_assert(divide_into_bytes(0) == 0, "Error");
static_assert(divide_into_bytes(1) == 1, "Error");
static_assert(divide_into_bytes(7) == 1, "Error");
static_assert(divide_into_bytes(8) == 1, "Error");
static_assert(divide_into_bytes(9) == 2, "Error");
static_assert(divide_into_bytes(16) == 2, "Error");
static_assert(divide_into_bytes(17) == 3, "Error");
static_assert(divide_into_bytes(8 * 10) == 10, "Error");
static_assert(divide_into_bytes(8 * 10 + 1) == 11, "Error");
static_assert(divide_into_bytes(std::numeric_limits<unsigned long>::max()) == 2'305'843'009'213'693'952, "Error");
// 2305843009213693952 = 2^64 / 2^3 = 2^61

torognes commented 1 year ago

Or even simpler:

constexpr auto divide_into_bytes(unsigned long n_bits) -> unsigned long {
    constexpr auto n_bits_in_a_byte = 8;
    return (n_bits + n_bits_in_a_byte - 1) / n_bits_in_a_byte;
}

frederic-mahe commented 1 year ago

yes, but it overflows when n_bits is greater than ULONGMAX - 7, hence the assert(n_bits <= std::numeric_limits<unsigned long>::max() - (n_bits_in_a_byte - 1)); in my first version.

torognes commented 1 year ago

Yes, you're right.

torognes commented 1 year ago

I'll try to fix it today.

frederic-mahe commented 1 year ago

I am still not sure we need a function, but if we do, we can turn it into a template and use it with different denominators:

#include <limits>

// correct for all values of numerator (0 to max()) 
// correct for all values of Denominator (1 to max()) 
// todo: use concepts to generalize for any unsigned int 

template<unsigned long Denominator = 8>
constexpr unsigned long divide_int_round_up(unsigned long numerator) {
    if (numerator == 0) { return 0; }
    return ((numerator - 1) / Denominator) + 1;
}

static_assert(divide_int_round_up(0) == 0, "Error");
static_assert(divide_int_round_up(16 - 1) == 2, "Error");
static_assert(divide_int_round_up(16) == 2, "Error");
static_assert(divide_int_round_up(16 + 1) == 3, "Error");
static_assert(divide_int_round_up<8>(0) == 0, "Error");
static_assert(divide_int_round_up<8>(0) == 0, "Error");
static_assert(divide_int_round_up<8>(0) == 0, "Error");
static_assert(divide_int_round_up<8>(8 * 10) == 10, "Error");
static_assert(divide_int_round_up<8>(std::numeric_limits<unsigned long>::max()) == 2'305'843'009'213'693'952, "Error");
static_assert(divide_int_round_up<1>(std::numeric_limits<unsigned long>::max()) == std::numeric_limits<unsigned long>::max(), "Error");
static_assert(divide_int_round_up<64>(0) == 0, "Error");
static_assert(divide_int_round_up<64>(64 * 10) == 10, "Error");
static_assert(divide_int_round_up<64>(64 * 10 + 1) == 11, "Error");
static_assert(divide_int_round_up<64>(64 * 10 - 1) == 10, "Error");

Here is a C++11 version:

template<unsigned long Denominator = 8>
unsigned long divide_int_round_up(unsigned long numerator) {
    static_assert(Denominator != 0, "Denominator can't be null");
    if (numerator == 0) { return 0; }
    return ((numerator - 1) / Denominator) + 1;
}

that can be tested on compiler explorer+%2B+1%3B%0A%7D%0A%0Aint+main+()+%7B%0A++++return+divide_int_round_up%3C64%3E(16)%3B%0A%7D'),l:'5',n:'0',o:'C%2B%2B+source+%231',t:'0')),k:50,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((g:!((h:compiler,i:(compiler:g122,deviceViewOpen:'1',filters:(b:'0',binary:'1',commentOnly:'0',demangle:'0',directives:'0',execute:'1',intel:'0',libraryCode:'0',trim:'1'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:1,lang:c%2B%2B,libs:!(),options:'-O3+-std%3Dc%2B%2B11+-Wall+-Wextra+-g+-Wshadow+-Wnon-virtual-dtor+-Wold-style-cast+-Wcast-align+-Wunused+-Woverloaded-virtual+-Wpedantic+-Wconversion+-Wsign-conversion+-Wmisleading-indentation+-Wduplicated-cond+-Wduplicated-branches+-Wlogical-op+-Wnull-dereference+-Wuseless-cast+-Wdouble-promotion+-Wformat%3D2',selection:(endColumn:12,endLineNumber:3,positionColumn:12,positionLineNumber:3,selectionStartColumn:12,selectionStartLineNumber:3,startColumn:12,startLineNumber:3),source:1),l:'5',n:'0',o:'+x86-64+gcc+12.2+(Editor+%231)',t:'0')),k:33.333333333333336,l:'4',m:36.809815950920246,n:'0',o:'',s:0,t:'0'),(g:!((h:output,i:(compilerName:'x86-64+gcc+12.2',editorid:1,fontScale:14,fontUsePx:'0',j:1,wrap:'1'),l:'5',n:'0',o:'Output+of+x86-64+gcc+12.2+(Compiler+%231)',t:'0')),header:(),l:'4',m:38.190184049079754,n:'0',o:'',s:0,t:'0'),(g:!((h:executor,i:(argsPanelShown:'1',compilationPanelShown:'0',compiler:g122,compilerOutShown:'0',execArgs:'',execStdin:'',fontScale:14,fontUsePx:'0',j:1,lang:c%2B%2B,libs:!(),options:'',source:1,stdinPanelShown:'1',tree:'1',wrap:'1'),l:'5',n:'0',o:'Executor+x86-64+gcc+12.2+(C%2B%2B,+Editor+%231)',t:'0')),header:(),l:'4',m:25,n:'0',o:'',s:0,t:'0')),k:50,l:'3',n:'0',o:'',t:'0')),l:'2',m:100,n:'0',o:'',t:'0')),version:4)

torognes commented 1 year ago

Fixed in dev now.

torognes commented 1 year ago

The exact size of the Bloom filter is not important as long as we do not access memory outside it. The Bloom filter is just to make the program a bit faster by using some extra memory. If the filter is a bit smaller than intended it has only a negligible effect on the speed. The logic and the results are not affected.

frederic-mahe commented 1 year ago

Thanks @torognes ! I'll launch the tests tonight. Once they are done, we can make a new release.

Here is a draft changelog for release v3.1.3:

fix: bug introduced in version 3.1.1, that caused swarm to allocate way too much memory when d > 1 (bug had no impact on clustering results),
fix: off-by-one error when allocating memory for a Bloom filter (bug had no impact on clustering results),
fix: compilation warning with GCC 12 (and more recent) when using link-time optimization,
fix: compilation warning with clang 13 (and more recent): unused set variable,
fix: five clang-tidy warnings (readability-braces-around-statements),
fix: minor code refactoring,
improve: more uniform vocabulary throughout swarm's documentation (code, help, manpage, README, companion scripts and wiki),
improve: code coverage of our test suite (swarm-tests) WIP

(please feel free to modify it as you see fit)

torognes commented 1 year ago

Sounds good. I hope the tests run fine!

frederic-mahe commented 1 year ago

hello @torognes !

Test results are good, memory usage is back to normal. I've added a series of tests to check that swarm's results stay strictly identical (same hash value) from one release to another, and when switching compiler. I was already doing something similar, but now it is automatic and systematic.

Please merge from dev and make a new release, at your convenience. Here is a draft changelog for release v3.1.3:

fix: bug introduced in version 3.1.1, that caused swarm to allocate way too much memory when d > 1 (bug had no impact on clustering results),
fix: off-by-one error when allocating memory for a Bloom filter (bug had no impact on clustering results),
fix: compilation warning with GCC 12 (and more recent) when using link-time optimization,
fix: compilation warning with clang 13 (and more recent): unused set variable,
fix: five clang-tidy warnings (readability-braces-around-statements),
fix: minor code refactoring,
improve: more uniform vocabulary throughout swarm's documentation (code, help, manpage, README, companion scripts and wiki),
improve: code coverage of our test suite (swarm-tests)

(please feel free to modify it as you see fit)

torognes commented 1 year ago

Ok, good! Version 3.1.3 has been released.

frederic-mahe commented 1 year ago

Thanks @torognes !

torognes commented 1 year ago

I have merged all changes into both master and dev.

torognes / swarm

integer division might be one byte too small in some cases #174