Closed reiddraper closed 8 years ago
So, to dig a little deeper with that example:
37> sext:encode(TA).
379: encode_int(1000000000000, none)
432: Bl = [232,212,165,16,0]
434: Bb = <<232,212,165,16,0>>
<<11,244,117,52,177,8,0,8,0>>
38> sext:encode(TB).
379: encode_int(2000000000000, none)
432: Bl = [1,209,169,74,32,0]
434: Bb = <<1,209,169,74,32,0>>
<<11,128,244,117,52,169,4,0,8,0>>
TA is represented with 40 bytes, TB is represented with 41. This means that the first byte in the binary is smaller for TB, even though the binary is longer. Binaries in erlang seem to ignore length for the purposes of sorting, so as soon as it sees that 244 > 128, the comparison returns TA > TB.
One suggestion I had would be to store the number of bits needed to represent the number as the byte right after the tag byte, then your sort order would be a little better, although numbers with a bit size > 255 would be problematic, as would backwards compatability.
Try git pull https://github.com/uwiger/sext uw-bignum-encode
INCOMPATIBILITY: changed encoding of bignums
Bignums were encoded in a way that didn't preserve the sorting property.
To fix this, I have prepended a length indicator. In order to be
able to read legacy-encoded bignums, I made use of the fact that
the first byte returned in the return value from encode_big1/1
was never > 254. Thus, 255 is used to indicate the new format. I.e.
<<11, 255, ...length indicator..., ...encoded bignum...>>
The length indicator gives the number of bytes, and will usually
be one byte long. If the number of bytes is > 127, the length
indicator will be encoded as a sequence of 7-bit "septets", where
each except the last is tagged with 1 in the MSB.
The decode function removes the size indicator, and also recognizes
that if the first byte is =< 254, there is no size indicator,
and decodes a legacy bignum.
This means that a data set can be converted by simply decoding and
encoding each value.
For further backwards compatibility, the function
sext:legacy_encode_bignum/1 encodes a bignum in the old format.
Thanks, I can confirm this passes EQC + my new EQC changes at #6. Woot.
The change has been merged into master and labeled 1.0 (there are no more bugs now ;-)
2013/3/5 Reid Draper notifications@github.com
Thanks, I can confirm this passes EQC + my new EQC changes at #6https://github.com/uwiger/sext/issues/6. Woot.
— Reply to this email directly or view it on GitHubhttps://github.com/uwiger/sext/issues/5#issuecomment-14453672 .
Hi Ulf. Engel Sanchez from Basho here. I'm one of the people working on the effort to seamlessly fix sext encoded data on our next bug fix release. It seems we are going to need a way to encode an erlang term with big ints encoded using the legacy encoding. The function to do the legacy encoding on a standalone big int is not quite enough for what we need to do. I've read the sext code and understand enough to send you a PR, but in case you prefer to prepare one yourself, please do :). Otherwise, I'll be sending you a PR later today.
I'm going to sleep now, so feel free. It should be pretty straightforward.
Ulf Wiger http://ulf.wiger.net/
7 mar 2013 kl. 20:50 skrev Engel Sanchez notifications@github.com:
Hi Ulf. Engel Sanchez from Basho here. I'm one of the people working on the effort to seamlessly fix sext encoded data on our next bug fix release. It seems we are going to need a way to encode an erlang term with big ints encoded using the legacy encoding. The function to do the legacy encoding on a standalone big int is not quite enough for what we need to do. I've read the sext code and understand enough to send you a PR, but in case you prefer to prepare one yourself, please do :). Otherwise, I'll be sending you a PR later today.
— Reply to this email directly or view it on GitHub.
PR for changes #7
shouldn't it be closed?
Indeed. Thanks.
Riak uses
sext
for secondary indexes to create binary keys for leveldb. A user noticed that some integer 2i queries were not returning results when they were expected to. We seem to have narrowed the issue down tosext
. Here's an example, in an erlang shell:You can provoke this bug by using this patch to the EQC tests. h/t to @seancribbs and @vagabond for helping track this down.