ubjson / universal-binary-json

Community workspace for the Universal Binary JSON Specification.

Add digit literal 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 integral numeric types to spec #24

Closed (ghost closed this issue 11 years ago)

ghost commented 11 years ago

Wanted to know what you guys thought...

Given the JSON Array: [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ]

The current UBJSON is: [[] [i][0] [i][1] [i][2] [i][3] [i][4] [i][5] [i][6] [i][7] [i][8] [i][9] [i][10] []]

Whereas with digit literal types the UBJSON could be: [[] [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [i][10] []]

The space savings in this example are significant: 24 bytes (1 + 11 × 2 + 1) vs. 14 bytes (1 + 10 + 2 + 1).

Thoughts?
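A minimal sketch of the byte counts above, assuming the thread's notation: `[` and `]` open and close an array, and `i` is an int8 marker followed by one payload byte. The digit-literal markers are ASSUMED to be the ASCII characters `'0'`..`'9'`. The proposal never fixes their byte values. `encode_int` and `encode_array` are hypothetical helpers, not part of any UBJSON library.

```python
# Sketch only: byte values follow the thread's notation; the digit-literal
# markers are ASSUMED to be ASCII '0'..'9' (the proposal fixes no bytes).

ARRAY_START = b"["
ARRAY_END = b"]"
INT8 = b"i"


def encode_int(value: int, digit_literals: bool) -> bytes:
    """Encode one integer in the range -128..127."""
    if digit_literals and 0 <= value <= 9:
        # Proposed form: the marker itself carries the value (1 byte).
        return str(value).encode("ascii")
    if not -128 <= value <= 127:
        raise ValueError("this sketch only handles int8 values")
    # Current form: [i] marker plus one signed payload byte (2 bytes).
    return INT8 + value.to_bytes(1, "big", signed=True)


def encode_array(values, digit_literals: bool) -> bytes:
    """Wrap the encoded elements in array start/end markers."""
    body = b"".join(encode_int(v, digit_literals) for v in values)
    return ARRAY_START + body + ARRAY_END


values = list(range(11))  # the JSON array from the example above
current = encode_array(values, digit_literals=False)
proposed = encode_array(values, digit_literals=True)
print(len(current), len(proposed))  # 24 14
```

Only the value 10 falls outside the literal range, so it keeps its two-byte `[i][10]` form.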

kxepal commented 11 years ago

This is about STC containers from issue #13. With his syntax, this array may be represented as:

[<] [i]
    [0]
    [1]
    [2]
    [3]
    [4]
    [5]
    [6]
    [7]
    [8]
    [9]
    [10]
[>]

with the same 14 bytes, but with special markers that separate its processing from that of a regular array.
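A rough sketch of the STC layout above, following kxepal's sketch as written: `<` opens the container, a single `i` gives the element type for every value, raw one-byte payloads follow with no per-element marker, and `>` closes it. The exact byte values and the helper name `encode_stc_int8` are assumptions for illustration, not the layout finally settled in #13.

```python
STC_START = b"<"
STC_END = b">"
INT8 = b"i"


def encode_stc_int8(values) -> bytes:
    """Encode ints as one typed container: a single 'i' marker, then raw bytes."""
    payload = b""
    for v in values:
        if not isinstance(v, int) or isinstance(v, bool) or not -128 <= v <= 127:
            raise ValueError("an int8 STC can only hold int8 values")
        # No per-element marker: the shared [i] says every byte is an int8.
        payload += v.to_bytes(1, "big", signed=True)
    return STC_START + INT8 + payload + STC_END


encoded = encode_stc_int8(range(11))
print(len(encoded))  # 14
```

The saving comes from paying for the type marker once per container instead of once per element.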

ghost commented 11 years ago

@syount I like the idea; a more optimal representation of commonly used values is absolutely the right line of thinking.

That said, @kxepal pointed to our current thinking along these lines (STC), which is a bit more flexible and gives us the same wins.

I am going to close this request for that reason, but I would be very interested in knowing what you think of the discussion over on Issue #13 if you have the time.

The current status is that:

  1. We all like the idea.
  2. It opens the door to adding BINARY data support to UBJSON really cleanly.
  3. MY biggest hangup (and the reason we didn't add it 6 months ago) is that I currently perceive a lot of value in taking UBJSON, converting it to JSON, then back to UBJSON, and getting exactly the same payload: true 1:1 compatibility. If we add this feature, the STC construct will be translated into an ARRAY in JSON, and when converted back to UBJSON the STC construct would be lost (unless the generator maintained a lot of state to try to optimize it back in, which would require large read-ahead buffers).
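A minimal sketch of the round-trip concern in point 3, assuming the same hypothetical STC layout (`<`, `i`, raw int8 bytes, `>`) and a plain generator that always writes regular `[i]`-tagged arrays. `decode_stc_int8` and `encode_plain_array_int8` are illustrative names only.

```python
import json


def decode_stc_int8(data: bytes) -> list:
    """Decode the assumed '<' 'i' payload... '>' layout into a plain list."""
    if data[:2] != b"<i" or data[-1:] != b">":
        raise ValueError("not an int8 STC in this sketch's layout")
    return [int.from_bytes(data[k:k + 1], "big", signed=True)
            for k in range(2, len(data) - 1)]


def encode_plain_array_int8(values) -> bytes:
    """A generator with no read-ahead: every element gets its own [i] marker."""
    return b"[" + b"".join(b"i" + v.to_bytes(1, "big", signed=True)
                           for v in values) + b"]"


original = b"<i" + bytes(range(11)) + b">"       # STC form, 14 bytes
as_json = json.dumps(decode_stc_int8(original))  # a plain JSON array; no STC marker survives
regenerated = encode_plain_array_int8(json.loads(as_json))
print(len(original), len(regenerated), original == regenerated)  # 14 24 False
```

The values survive the trip, but the payload bytes do not. A generator could only restore the STC form by scanning the whole array before emitting it, which is the read-ahead cost described above.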

I admit that it is entirely possible the importance I am placing on point 3 is misplaced and my own invention, but I don't make the decision lightly. This is why we are still discussing it/sitting on it.

Would be nice to know what you thought over on #13 though if you had the time.

ghost commented 11 years ago

These digit-literal representations are more versatile than the STC representation, since they can provide their compact representation in a mixed-type array.
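To illustrate the mixed-type point, here is a small sketch that encodes an array mixing integers, `true`, and `null`. It uses UBJSON's `Z`/`T`/`F` markers for null/true/false and the thread's `i` int8 marker. The digit-literal bytes are again ASSUMED to be ASCII `'0'`..`'9'`, and `encode_value` is a hypothetical helper. An STC as sketched in the thread declares one element type for the whole container, so it could not hold this array.

```python
def encode_value(value) -> bytes:
    """Encode null, booleans, and int8 values; bools are checked before ints."""
    if value is None:
        return b"Z"
    if value is True:
        return b"T"
    if value is False:
        return b"F"
    if isinstance(value, int):
        if 0 <= value <= 9:
            return str(value).encode("ascii")  # hypothetical digit-literal marker
        if -128 <= value <= 127:
            return b"i" + value.to_bytes(1, "big", signed=True)
    raise TypeError("this sketch only handles null, booleans, and int8 values")


mixed = [3, True, None, 7, 42]
encoded = b"[" + b"".join(encode_value(v) for v in mixed) + b"]"
print(encoded, len(encoded))  # b'[3TZ7i*]' 8
```

The literals for 3 and 7 each save a byte even though their neighbours have different types. 42 falls back to `[i]` (its payload byte happens to print as `*`).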

breese commented 11 years ago

I agree that @syount's proposal is different from STC.

The digit literals are useful as the length for small strings. For instance, [s][0] is the empty string rather than [s][B][0], and [s][1][:] is a colon rather than [s][B][1][:].

This proposal gets my vote, even though it has been closed.
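A sketch of breese's string-length idea, using the notation from his comment: `[s]` is the string marker and `[B]` is followed by a single length byte. The digit-literal markers are ASSUMED to be ASCII `'0'`..`'9'`. To stay safe whether the length byte is read as signed or unsigned, this sketch also caps lengths at 127. `encode_short_string` is a hypothetical helper.

```python
STRING = b"s"
BYTE_LEN = b"B"  # length marker as written in the thread; one length byte follows


def encode_short_string(text: str, digit_literals: bool) -> bytes:
    """Encode a short string; the length counts UTF-8 bytes, not characters."""
    data = text.encode("utf-8")
    n = len(data)
    if digit_literals and n <= 9:
        # Proposed: a digit-literal marker doubles as the length.
        return STRING + str(n).encode("ascii") + data
    if n > 127:
        raise ValueError("this sketch only handles lengths up to 127")
    # Current: [s][B][length] followed by the string bytes.
    return STRING + BYTE_LEN + bytes([n]) + data


print(encode_short_string("", True), encode_short_string("", False))    # b's0' b'sB\x00'
print(encode_short_string(":", True), encode_short_string(":", False))  # b's1:' b'sB\x01:'
```

The saving is one byte per string of 0 to 9 bytes; longer strings fall back to the `[B]` form unchanged.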

AnyCPU commented 11 years ago

The first example is covered by STC. @breese, your example only saves one (ASCII) character.

What if I want strings with lengths of 23, 14, 18...? It is OK only when the length fits into one byte. Anyway, with a large number of literals we also lose speed and simplicity of analysis.

ghost commented 11 years ago

Just to clarify, I closed this not because it was a poor idea, but because it increases the complexity of generating (and especially parsing) UBJSON. We can absolutely consider this again in the future; I just want to get the high points of the spec done first before getting into micro-optimizations like this.