lemire / FastPFor

The FastPFOR C++ library: Fast integer compression
Apache License 2.0
873 stars 123 forks source link

Add support for 64-bit integers for FastPFor #55

Closed xndai closed 5 years ago

xndai commented 5 years ago
  1. Add a new template parameter for FastPFor encoder to specify the integer type. Curently uint32_t and uint64_t are supported.
  2. Add corresponding bit packing and unpacking functions for 64-bit integers.
  3. Overload IntegerCODEC interfaces to provide 64bit version of encodeArray() and decodeArray(). For codes that don't support 64bit, those would just throw not implemented exception.
  4. Add 64-bit support for CompositeCodec and VariableByte, so the encoding and decoding can work end to end with arbitary number of integers.
  5. Add gtest to the CMake project and create unittests for the new 64 bit support.
  6. Replace some of the assertions into exceptions.
lemire commented 5 years ago

Looks good to me. Merging.

pps83 commented 5 years ago

Overall, the change completely breaks windows build. Probably better to hard-revert before others pull the change. It doesn't even pass the tests that were added. You cannot even complete them in reasonable amount of time because it prints extreme amount of errors:

..\unittest\test_fastpfor.cpp(86): error: Expected equality of these values:
  in64[i]
    Which is: -2147483648
  out64[i]
    Which is: 2147483648
..\unittest\test_fastpfor.cpp(86): error: Expected equality of these values:
  in64[i]
    Which is: -2147483648
  out64[i]
    Which is: 2147483648
..\unittest\test_fastpfor.cpp(86): error: Expected equality of these values:

and at the end result looks like this: image

@xndai Can you please review your change and possibly fix the commit? I added a few comments, just wanted to do a quick review and it looked like this change cannot possibly work on windows, and sure enough after trying to run the tests I get bad results. Also, tests shouldn't spew endless errors, and perhaps they even don't print proper values (all of them seem to be 32-bit long in output). Some other parts are also questionable, not sure if the change even works on linux: for example, in fastpfor.h in __encodeArray there is this code: const uint32_t maxval = 1U << bestb; which cannot possibly work if IntType is uint64_t unless the code assumes that the range cannot exceed 32 bits (in which case I strongly insist that there is STILL a bug, because it must be commented to avoid possible confusion for anybody reviewing/debugging the code. On top of that, lots of changes seem to be style related, which should have been done as a separate commit to be able to see actual changes.

lemire commented 5 years ago

Ok. I will revert.

pps83 commented 5 years ago

@xndai you'll need these overloads at least:

__attribute__((const)) inline uint32_t gccbits(const int v) {
  return gccbits(static_cast<const uint32_t>(v));
}

__attribute__((const)) inline uint32_t gccbits(const uint64_t v) {
#ifdef _MSC_VER
  return 64 - __lzcnt64(v);
#else
  return v == 0 ? 0 : 64 - __builtin_clzll(v);
#endif
}

and all other 1UL vs 1ULL fixes (use template to deduce proper one, or make a const at class level?)

lemire commented 5 years ago

@xndai I have reverted the PR given @pps83 concerns. Can you address them and re-issue a PR?

lemire commented 5 years ago

I have opened an issue regarding CI tests under Windows... we need help setting it up... https://github.com/lemire/FastPFor/issues/57