Many fields in datasets have a combination of a fixed format with some non-varying characters (eg, the UUIDs have "-" characters) and a reduced range of characters (eg, hexadecimal). This patch detects single-length fields and uses modulo-encoding to pack them into fixed-with integers with all of the entropy. It reduces (for example) the storage required for hexadecimal IDs by a factor of two.
It's a first step, there is much more that can be done.
Many fields in datasets have a combination of a fixed format with some non-varying characters (eg, the UUIDs have "-" characters) and a reduced range of characters (eg, hexadecimal). This patch detects single-length fields and uses modulo-encoding to pack them into fixed-with integers with all of the entropy. It reduces (for example) the storage required for hexadecimal IDs by a factor of two.
It's a first step, there is much more that can be done.