jk-jeon / dragonbox

Reference implementation of Dragonbox in C++
Apache License 2.0
588 stars 37 forks source link

Fastest way of trimming trailing zeros #62

Closed jk-jeon closed 4 months ago

jk-jeon commented 4 months ago

Currently we use the method proposed in the paper by Granlund-Montgomery: using modular inverse and std::rotr.

However, there are at least two competitors in this league. Assuming $q$-bit integers, and let $d$ be the divisor, then:

For future reference for myself, I write the code for the last method here:

int remove_trailing_zeros(std::uint32_t& n) noexcept {
    int s = 0;
    while (true) {
        auto const nm_mod = std::uint32_t(n * UINT32_C(42949673));
        if (nm_mod < UINT32_C(42949673)) {
            s += 2;
            n = std::uint32_t(nm_mod >> 2);
        }
        else {
            break;
        }
    }
    auto const nm_mod = std::uint32_t(n * UINT32_C(1288490189));
    if (nm_mod < UINT32_C(429496731)) {
        s += 1;
        n = std::uint32_t(nm_mod >> 1);
    }
    return s;
}

int remove_trailing_zeros(std::uint64_t& n) noexcept {
    int s = 0;
    while (true) {
        auto const nm_mod = std::uint64_t(n * UINT64_C(14941862699704736809));
        if (nm_mod < UINT64_C(184467440737095517)) {
            s += 2;
            n = std::uint64_t(nm_mod >> 2);
        }
        else {
            break;
        }
    }
    auto const nm_mod = std::uint64_t(n * UINT64_C(5534023222112865485));
    if (nm_mod < UINT64_C(1844674407370955163)) {
        s += 1;
        n = std::uint64_t(nm_mod >> 1);
    }
    return s;
}

According to a computation, the $32$-bit version is valid up to n = 1'073'741'899, and the $64$-bit version is valid up to n = 4'611'686'018'427'387'999. In Dragonbox, we need these routines for n's up to 16'777'215 for the $32$-bit version, and n's up to 9'007'199'254'740'991 for the $64$-bit version, so the input range is more than enough.

jk-jeon commented 4 months ago

Done as of ff9d375. See https://github.com/jk-jeon/rtz_benchmark.