mgodwan opened 1 year ago
@nknize @reta Any thoughts on this? Based on the javadoc on these methods and their usage in the existing cuckoo filter, the serialization and deserialization methods used here should be compatible, but it seems they are not.
I do see that we write individual bytes but read a long back, which could cause issues if the endianness expected while writing differs from the endianness used while reading.
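For illustration only (this is a standalone sketch, not code from XPackedInts): writing a value as raw big-endian bytes and then reading the same bytes back as a little-endian long produces a different value, which is the kind of mismatch described above.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndiannessMismatchDemo {
    public static void main(String[] args) {
        long original = 0x0102030405060708L;
        // "Serialize" the value as raw bytes using big-endian order.
        byte[] bytes = ByteBuffer.allocate(Long.BYTES).order(ByteOrder.BIG_ENDIAN).putLong(original).array();
        // "Deserialize" the same bytes as a long, but using little-endian order.
        long readBack = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getLong();
        System.out.println(Long.toHexString(original)); // 102030405060708
        System.out.println(Long.toHexString(readBack)); // 807060504030201
    }
}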
@mgodwan this is weird, I haven't seen this class used anywhere, and I am not sure it even works. I rewrote the same test using Apache Lucene PackedInts (which XPackedInts seems to be cloned from?) and it seems to work as expected.
import java.io.IOException;

import org.apache.lucene.store.ByteArrayDataInput;
import org.apache.lucene.store.ByteArrayDataOutput;
import org.apache.lucene.util.packed.PackedInts;
import org.junit.Test;

@Test
public void test() throws IOException {
    PackedInts.Mutable data = PackedInts.getMutable(323, 32, PackedInts.COMPACT);
    byte[] b = new byte[100000];
    // Serialize the packed values without a header.
    PackedInts.Writer writer = PackedInts.getWriterNoHeader(
        new ByteArrayDataOutput(b), PackedInts.Format.PACKED, data.size(), data.getBitsPerValue(), 1024);
    for (int i = 0; i < 323; i++) {
        data.set(i, i);
        writer.add(data.get(i));
    }
    writer.finish();
    // Deserialize and verify that every value round-trips unchanged.
    PackedInts.ReaderIterator data2 = PackedInts.getReaderIteratorNoHeader(
        new ByteArrayDataInput(b), PackedInts.Format.PACKED, PackedInts.VERSION_CURRENT, data.size(), data.getBitsPerValue(), 1024);
    for (int i = 0; i < 323; i++) {
        long v = data2.next();
        assert data.get(i) == v;
    }
}
The only usage I see for this class is within the CuckooFilter, which in turn is used by Rare Terms Aggregation, so a bug here can impact those use cases.
I was able to pass the test mentioned in the issue description by wrapping the DataInput while reading with EndiannessReverserUtil.wrapDataInput(in). Looks like we may need something along these lines to fix this.
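A minimal sketch of what that wrapping could look like, assuming Lucene 9's lucene-backward-codecs module is on the classpath; the helper name wrapForReading and the byte[] parameter are just for illustration, and the actual XPackedInts read call is only indicated in a comment.

import org.apache.lucene.backward_codecs.store.EndiannessReverserUtil;
import org.apache.lucene.store.ByteArrayDataInput;
import org.apache.lucene.store.DataInput;

static DataInput wrapForReading(byte[] serialized) {
    // Bytes produced by the existing writer.
    DataInput raw = new ByteArrayDataInput(serialized);
    // Reverse the byte order of multi-byte reads (readLong, readInt, ...) so they
    // match the order in which the bytes were written; the wrapped input would then
    // be passed to the XPackedInts read path instead of the raw one.
    return EndiannessReverserUtil.wrapDataInput(raw);
}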
Why not use Lucene's ones?
Yes, that should help solve this as it addresses the concerns. I can see that the Lucene class uses readBytes while reading, which is why this issue is likely not present there.
Describe the bug
When using the XPackedInts class and populating it across nodes, the same structure yields different values on the two nodes.
To Reproduce
The following unit test can reproduce the issue:
Expected behavior
The collection should give the same values after applying serde operations. This does not seem to be happening with the snippet shared. Looking through the code, the issue appears to be related to an endianness difference between serialization and deserialization.
Host/Environment (please complete the following information):
Additional context
These classes are used in cuckoo filters and can create problems there.