ubjson / universal-binary-json

Community workspace for the Universal Binary JSON Specification.

Add a BYTE type to spec #22

Closed ghost closed 11 years ago

ghost commented 11 years ago

Initiated by the discussion surrounding #21

Adding a [B]yte type to the spec that is effectively an unsigned byte value (uint8) for values 0-255.

This is NOT our ultimate solution to binary-data-in-UBJSON; this is done intentionally so the [C]HAR type is not abused to store small, unsigned int16 values.

Also, in conjunction with STC (strongly-typed-arrays, if we ever add it) this would be a very clean container for adding binary data to the spec.

Question

Does adding this special 1-off type break the consistency we have formed with the other numeric types -- for example, someone reading the spec might say "if B is Byte, what is int8?" or do we perceive the value of this 1-off type to be worth it?

AnyCPU commented 11 years ago

The [C]har or [A]scii type is one byte, so it is also int8 or uint8 on some systems. For simplicity, in UBJSON it is int8. A byte (8 bits) itself does not have a sign. In general, a byte type together with STC is good for raw data and raw dumps.
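
To illustrate the point that a byte has no sign of its own, here is a minimal Python sketch (not taken from the spec) that reads the same 8 bits once as uint8 and once as int8. The sample value 0xC8 is an arbitrary choice for illustration.

```python
import struct

# One raw byte with the bit pattern 1100 1000 (0xC8); arbitrary sample value.
raw = bytes([0xC8])

# The same 8 bits, read under two different interpretations.
as_uint8 = struct.unpack("B", raw)[0]  # unsigned: 0..255
as_int8 = struct.unpack("b", raw)[0]   # signed two's complement: -128..127

print(as_uint8, as_int8)  # 200 -56
```

The bits are identical in both cases; only the declared type decides whether the reader sees 200 or -56, which is why the spec has to pick one interpretation per marker.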

ghost commented 11 years ago

+1 I think this 1-off type is worth it.

I would also suggest adding this [B]yte type (uint8) to the list of valid integral numeric types available for describing the length of a string.

Requiring the extra byte for string lengths between 127 and 256 seems a waste. (yes, it's less than 1% of payload but still a waste)
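
A rough sketch of the saving described above, in Python. It assumes a string length is written as one marker byte followed by the smallest integer payload that fits, and that the signed types available are int8 and int16; `length_field_size` is a hypothetical helper, not part of any UBJSON library.

```python
def length_field_size(n: int, allow_uint8: bool) -> int:
    """Bytes needed for a string-length field: 1 marker byte + payload.

    Assumption: the encoder picks the smallest integer type that fits.
    """
    if n < 0:
        raise ValueError("length must be non-negative")
    if n <= 127:                  # fits in int8
        return 1 + 1
    if allow_uint8 and n <= 255:  # fits in the proposed uint8
        return 1 + 1
    if n <= 32767:                # falls back to int16
        return 1 + 2
    raise ValueError("larger lengths are outside this sketch")


for n in (100, 200):
    print(n, length_field_size(n, False), length_field_size(n, True))
```

Under these assumptions a 200-character string spends 3 bytes on its length field without uint8 and 2 bytes with it, while lengths up to 127 are unaffected.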

ghost commented 11 years ago

Note to Self: Most of the conversation around supporting these specific types is happening in #21

ghost commented 11 years ago

@syount Appreciate the feedback. I think we are all on board with adding C/B now to Draft 9 so I'll do that to the spec shortly: http://ubjson.org/type-reference/

ghost commented 11 years ago

Completed

Added to Draft 9: http://ubjson.org/type-reference/value-types/#numeric

NOTE: Changed the marker to "U" for "unsigned" -- the type name is "uint8", that made more sense because:

  1. there are no plans to add any more unsigned types
  2. this is the building block of binary support, so a very specific reason for it existing.
  3. "B" for byte was confusing; anyone reading the type list would have seen "byte" and "int8" next to each other and thought: "That doesn't make sense, they are both bytes..."
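
For readers who want to see what the resulting type looks like on the wire, here is a minimal Python sketch. The "U" marker and the uint8 name come from the note above; the layout (marker byte followed by a one-byte payload) and the helper names `encode_uint8` and `decode_uint8` are assumptions made for illustration, so check them against the type reference before relying on them.

```python
UINT8_MARKER = b"U"  # marker chosen in Draft 9, per the note above


def encode_uint8(value: int) -> bytes:
    """Encode 0..255 as marker + one payload byte (assumed layout)."""
    if not 0 <= value <= 255:
        raise ValueError("uint8 value must be in the range 0..255")
    return UINT8_MARKER + bytes([value])


def decode_uint8(data: bytes) -> int:
    """Decode a two-byte marker + payload pair back to an int."""
    if len(data) != 2 or data[:1] != UINT8_MARKER:
        raise ValueError("not a uint8 value")
    return data[1]  # indexing bytes yields an unsigned int in Python


encoded = encode_uint8(200)
print(encoded, decode_uint8(encoded))
```

Values above 127 fit in a single payload byte here, which is the case that previously tempted people to misuse the CHAR type or fall back to a wider integer.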