zeromq / rfc

ZeroMQ RFC project
https://rfc.zeromq.org

Revise Z85 base encoding spec to handle non-multiples of 4 bytes #154

Open traversc opened 4 years ago

traversc commented 4 years ago

The spec doesn't allow for an arbitrary number of bytes, but it could easily be modified to handle that.

In fact, the size parameter alone carries enough information to encode and decode inputs whose length is not a multiple of 4, and you wouldn't need to change the encoding dictionary at all.
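To make the size argument concrete, here is a minimal sketch of the length mapping the proposal implies: a full 4-byte block still becomes 5 characters, and a tail of n bytes (1 to 3) becomes n + 1 characters. This works because 85^2 = 7225 >= 256, 85^3 = 614125 >= 65536 and 85^4 = 52200625 >= 16777216. The function names are hypothetical and are not part of RFC 32 or of any existing implementation.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: number of characters produced for `size` input bytes.
// Full blocks map 4 bytes -> 5 chars; a tail of n bytes maps to n + 1 chars.
std::size_t z85_encoded_size(std::size_t size) {
    const std::size_t tail = size % 4;
    return 5 * (size / 4) + (tail ? tail + 1 : 0);
}

// Hypothetical helper: number of bytes recovered from `size` characters.
// A single leftover character can never be produced by the scheme above,
// so it is rejected as malformed input.
std::size_t z85_decoded_size(std::size_t size) {
    const std::size_t tail = size % 5;
    if (tail == 1) {
        throw std::invalid_argument("z85_decoded_size: impossible encoded length");
    }
    return 4 * (size / 5) + (tail ? tail - 1 : 0);
}
```

Because the two mappings are inverses of each other, the decoder needs no padding marker or extra alphabet symbol: the encoded length alone tells it how many bytes the tail holds.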

In my implementation, I handle lengths that are not a multiple of 4 at the end of the loop as follows:

Encode side:

...
  size_t leftover_bytes = ... // 3, 2 or 1
  if(leftover_bytes == 1) {
    uint32_t value = data[dbyte];
    encoded[ebyte] = base85_encoder_ring[value / 85UL % 85];
    encoded[ebyte+1] = base85_encoder_ring[value % 85];
  } else if(leftover_bytes == 2) {
    uint32_t value = 256UL*data[dbyte] + data[dbyte+1];
    encoded[ebyte] = base85_encoder_ring[value / 7225UL];
    encoded[ebyte+1] = base85_encoder_ring[value / 85UL % 85];
    encoded[ebyte+2] = base85_encoder_ring[value % 85];
  } else if(leftover_bytes == 3) {
    uint32_t value = 65536UL*data[dbyte] + 256UL*data[dbyte+1] + data[dbyte+2];
    encoded[ebyte] = base85_encoder_ring[value / 614125UL % 85];
    encoded[ebyte+1] = base85_encoder_ring[value / 7225UL % 85];
    encoded[ebyte+2] = base85_encoder_ring[value / 85UL % 85];
    encoded[ebyte+3] = base85_encoder_ring[value % 85];
  }
  return encoded;
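The three encode branches above follow one pattern, so they can be folded into a single loop. The following is a self-contained sketch under that assumption, not the author's actual code: `z85_encode_tail` and `kZ85Alphabet` are hypothetical names, and the alphabet string is the one defined in RFC 32.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// The 85-character Z85 alphabet from RFC 32.
static const char kZ85Alphabet[] =
    "0123456789abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ.-:+=^!/*?&<>()[]{}@%$#";

// Hypothetical helper: encodes the final n bytes (n = 1, 2 or 3) of the
// input as n + 1 characters, treating the bytes as a big-endian integer.
std::string z85_encode_tail(const std::uint8_t* data, std::size_t n) {
    assert(n >= 1 && n <= 3);

    // Accumulate the bytes, most significant first.
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < n; ++i) {
        value = value * 256 + data[i];
    }

    // Emit base-85 digits, filling the output from the last position back.
    std::string out(n + 1, '0');
    for (std::size_t i = n + 1; i-- > 0;) {
        out[i] = kZ85Alphabet[value % 85];
        value /= 85;
    }
    return out;
}
```

For example, the single byte 0xFF (255 = 3 * 85 + 0) encodes to the two characters `30`.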

Decode side:

  size_t leftover_bytes = ... // 4, 3 or 2 (encoded characters remaining)
  if(leftover_bytes == 2) {
    base85_check_byte(data[ebyte]);
    base85_check_byte(data[ebyte+1]);
    uint32_t value = 85UL*base85_decoder_ring[data[ebyte]-32] + base85_decoder_ring[data[ebyte+1]-32];
    if(value >= 256) throw std::runtime_error("base85_decode: corrupted input data, decoded block overflow");
    decoded[dbyte] = value;
  } else if(leftover_bytes == 3) {
    base85_check_byte(data[ebyte]);
    base85_check_byte(data[ebyte+1]);
    base85_check_byte(data[ebyte+2]);
    uint32_t value = 7225UL*base85_decoder_ring[data[ebyte]-32] + 85UL*base85_decoder_ring[data[ebyte+1]-32];
    value         += base85_decoder_ring[data[ebyte+2]-32];
    if(value >= 65536) throw std::runtime_error("base85_decode: corrupted input data, decoded block overflow");
    decoded[dbyte] = value / 256UL;
    decoded[dbyte+1] = value % 256;
  } else if(leftover_bytes == 4) {
    base85_check_byte(data[ebyte]);
    base85_check_byte(data[ebyte+1]);
    base85_check_byte(data[ebyte+2]);
    base85_check_byte(data[ebyte+3]);
    uint32_t value = 614125UL*base85_decoder_ring[data[ebyte]-32] + 7225UL*base85_decoder_ring[data[ebyte+1]-32];
    value         += 85UL*base85_decoder_ring[data[ebyte+2]-32] + base85_decoder_ring[data[ebyte+3]-32];
    if(value >= 16777216) throw std::runtime_error("base85_decode: corrupted input data, decoded block overflow");
    decoded[dbyte] = value / 65536UL;
    decoded[dbyte+1] = value / 256UL % 256;
    decoded[dbyte+2] = value % 256;
  }
  return decoded;
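The decode branches can be generalized the same way. This is a self-contained sketch, not the author's implementation: `z85_decode_tail` and `kZ85Alphabet` are hypothetical names, character lookup uses `std::strchr` instead of a precomputed decoder table, and the overflow check rejects any value that does not fit in n bytes (that is, value >= 256^n).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// The 85-character Z85 alphabet from RFC 32.
static const char kZ85Alphabet[] =
    "0123456789abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ.-:+=^!/*?&<>()[]{}@%$#";

// Hypothetical helper: decodes a tail of 2, 3 or 4 characters into
// 1, 2 or 3 bytes respectively.
std::vector<std::uint8_t> z85_decode_tail(const std::string& chars) {
    const std::size_t m = chars.size();
    if (m < 2 || m > 4) {
        throw std::invalid_argument("z85_decode_tail: expected 2 to 4 characters");
    }

    // Accumulate base-85 digits, most significant first.
    std::uint32_t value = 0;
    for (char c : chars) {
        // strchr would match the terminating NUL, so exclude it explicitly.
        const char* p = (c != '\0') ? std::strchr(kZ85Alphabet, c) : nullptr;
        if (p == nullptr) {
            throw std::runtime_error("z85_decode_tail: invalid character");
        }
        value = value * 85 + static_cast<std::uint32_t>(p - kZ85Alphabet);
    }

    // The value must fit in n bytes, i.e. be strictly below 256^n.
    const std::size_t n = m - 1;
    const std::uint32_t limit = std::uint32_t(1) << (8 * n);
    if (value >= limit) {
        throw std::runtime_error("z85_decode_tail: decoded block overflow");
    }

    // Split the value into big-endian bytes.
    std::vector<std::uint8_t> out(n);
    for (std::size_t i = n; i-- > 0;) {
        out[i] = static_cast<std::uint8_t>(value % 256);
        value /= 256;
    }
    return out;
}
```

The boundary matters: the two characters `31` decode to 3 * 85 + 1 = 256, which does not fit in one byte and must be rejected rather than silently truncated to 0.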

bluca commented 4 years ago

Hi, RFC32 is stable so it cannot be changed. The process to update the spec is to fork the stable one under a new RFC number, mark it as draft, and make the changes there. And then update the implementations.

traversc commented 4 years ago

I could give it a shot and create a draft. What do you think of the suggested change itself?