Previously, strings were row-encoded and decoded with the generic variable-length row encoding. Now we rely on the fact that `0xFF` never appears as a byte in valid UTF-8. A string with bytes `b1, ..., bn` is encoded as `0x02, b1 + 1, ..., bn + 1, 0x00`. Since no byte is `0xFF`, `bi + 1` never wraps around to `0x00`, so the decoder can simply scan for the terminating `0x00` to find where the string ends. Empty strings are encoded as `0x01` and nulls as `0x00`. For descending order, every byte is bitwise inverted.
This is always a size improvement, and the savings are largest for short strings. For example, encoding "a" now takes 3 bytes instead of 33.
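To make the byte layout concrete, here is a minimal Python sketch of the scheme as described in this PR. It is an illustration only, not the actual implementation: the helper names `encode_str` and `decode_str` are hypothetical, and the sketch handles one value at a time rather than whole columns.

```python
from typing import Optional, Tuple


def encode_str(value: Optional[str], descending: bool = False) -> bytes:
    """Hypothetical helper: row-encode one optional string."""
    if value is None:
        out = b"\x00"  # null
    elif value == "":
        out = b"\x01"  # empty string
    else:
        raw = value.encode("utf-8")
        # 0xFF never occurs in valid UTF-8, so b + 1 fits in one byte
        # and is never 0x00, which keeps 0x00 free as the terminator.
        out = b"\x02" + bytes(b + 1 for b in raw) + b"\x00"
    if descending:
        # Descending order: invert every byte.
        out = bytes(b ^ 0xFF for b in out)
    return out


def decode_str(buf: bytes, descending: bool = False) -> Tuple[Optional[str], int]:
    """Hypothetical helper: decode the value at the start of buf.

    Returns the value and the number of bytes consumed.
    """
    mask = 0xFF if descending else 0x00
    first = buf[0] ^ mask
    if first == 0x00:
        return None, 1
    if first == 0x01:
        return "", 1
    # Scan for the terminator (0x00, or 0xFF when inverted).
    end = buf.index(bytes([mask]), 1)
    raw = bytes((b ^ mask) - 1 for b in buf[1:end])
    return raw.decode("utf-8"), end + 1


print(encode_str("a").hex())  # 026200
```

Under these assumptions, comparing the encoded bytes lexicographically gives null, then the empty string, then non-empty strings in byte order, and the inverted form reverses that order.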
This is a continuation of #19874.