Add digit literal 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 integral numeric types to spec

ghost commented 11 years ago

Wanted to know what you guys thought...

Given the JSON Array: [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ]

The current UBJSON is: [[] [i][0] [i][1] [i][2] [i][3] [i][4] [i][5] [i][6] [i][7] [i][8] [i][9] [i][10] []]

Whereas with digit literal types the UBJSON could be: [[] [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [i][10] []]

The space savings in this example are significant: 24 bytes vs 14 bytes
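The two byte counts can be checked with a small sketch. It assumes, as the counts in this thread imply, that `[i]` is a one-byte marker followed by a one-byte value, and that a proposed digit literal is the single ASCII byte for that digit. The function names are invented for illustration and do not belong to any UBJSON library.

```python
def encode_current(values):
    """Encode each small integer as an [i] marker plus one value byte."""
    out = bytearray(b"[")
    for v in values:
        out += b"i" + v.to_bytes(1, "big", signed=True)
    out += b"]"
    return bytes(out)


def encode_with_digit_literals(values):
    """Same as above, but 0..9 use the proposed one-byte digit markers."""
    out = bytearray(b"[")
    for v in values:
        if 0 <= v <= 9:
            out += str(v).encode("ascii")  # assumed: marker is the ASCII digit
        else:
            out += b"i" + v.to_bytes(1, "big", signed=True)
    out += b"]"
    return bytes(out)


values = list(range(11))
print(len(encode_current(values)))              # 24
print(len(encode_with_digit_literals(values)))  # 14
```

The saving is one byte per element that falls in 0..9; every other value costs the same as before.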

Thoughts?

kxepal commented 11 years ago

This is about STC containers from issue #13. With that syntax, this array may be represented as:

[<] [i]
    [0]
    [1]
    [2]
    [3]
    [4]
    [5]
    [6]
    [7]
    [8]
    [9]
    [10]
[>]

with the same 14 bytes, but with special markers to separate its processing from that of a regular array.
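As a rough check of that count, here is a sketch of the STC layout exactly as written above: an opening `[<]`, one `[i]` type marker, one byte per value, and a closing `[>]`. The STC syntax was still under discussion in issue #13, so this layout and the function name are assumptions taken from this comment only.

```python
def encode_stc_int8(values):
    """Encode a list of small integers as [<][i] v1 v2 ... [>]."""
    out = bytearray(b"<i")  # container open marker, then the shared type marker
    for v in values:
        out += v.to_bytes(1, "big", signed=True)  # one byte per value, no marker
    out += b">"
    return bytes(out)


encoded = encode_stc_int8(list(range(11)))
print(len(encoded))  # 14
```

The type marker is paid once for the whole container, so the cost is a fixed 3 bytes plus one byte per element, provided every element has the same type.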

ghost commented 11 years ago

@syount I like the idea; a more optimal representation of commonly used values is absolutely along the right lines of thinking.

That said, @kxepal pointed to our current thinking along these lines (STC) which is a bit more flexible and gives us the same wins.

I am going to close this request (for that reason) but would be very interested in knowing what you thought of the discussion over on Issue #13 if you had the time.

The current status is that:

1. We all like the idea.
2. It opens up the door to us adding BINARY data support to UBJSON really cleanly.
3. My biggest hangup (and the reason we didn't add it 6 months ago) is that I currently perceive a lot of value in taking UBJSON, converting it to JSON, then back to UBJSON, and getting exactly the same payload information -- true 1:1 compatibility. If we add this feature, the STC construct will be translated into an ARRAY in JSON, and when that is converted back to UBJSON the STC construct would be lost (unless the generator maintains a lot of state to try to optimize it back in, which would require large read-ahead buffers).

I admit that it is entirely possible the importance I am putting around Point 3 is misplaced and my own invention, but I don't make the decision lightly. This is why we are still discussing it/sitting on it.
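The round-trip concern in the third point can be illustrated with a short sketch. It assumes the `[<][i] ... [>]` STC layout shown earlier in this thread and a straightforward generator that writes every JSON number as `[i]` plus one byte. All function names are hypothetical.

```python
import json


def decode_stc_int8(payload):
    """Decode [<][i] v1 v2 ... [>] into a plain list of ints."""
    if payload[:2] != b"<i" or payload[-1:] != b">":
        raise ValueError("not an int8 STC payload")
    body = payload[2:-1]
    return [int.from_bytes(body[k:k + 1], "big", signed=True)
            for k in range(len(body))]


def encode_plain_array(values):
    """A simple generator: regular array, one [i] marker per element."""
    out = bytearray(b"[")
    for v in values:
        out += b"i" + v.to_bytes(1, "big", signed=True)
    out += b"]"
    return bytes(out)


original = b"<i" + bytes(range(11)) + b">"
as_json = json.dumps(decode_stc_int8(original))      # STC becomes a JSON array
round_tripped = encode_plain_array(json.loads(as_json))

print(as_json)                             # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(len(original), len(round_tripped))   # 14 24
print(original == round_tripped)           # False
```

The values survive the trip, but the bytes do not: JSON has no way to record that the array was an STC, so the generator would have to inspect the whole array before writing it in order to restore the compact form.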

Would be nice to know what you thought over on #13 though if you had the time.

ghost commented 11 years ago

These digit literal representations are more versatile than the STC representation since they can provide their compact representation in a mixed type array.
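A sketch of the mixed-type case, under the same assumptions as before (one-byte `[i]` marker plus one value byte, digit literal equal to the ASCII digit). `T`, `F` and `Z` are the single-byte true, false and null markers. The function name is made up for illustration.

```python
def encode_mixed(values, digit_literals):
    """Encode a list of small ints, booleans and None as a regular array."""
    out = bytearray(b"[")
    for v in values:
        if v is None:
            out += b"Z"
        elif v is True:
            out += b"T"
        elif v is False:
            out += b"F"
        elif digit_literals and 0 <= v <= 9:
            out += str(v).encode("ascii")  # proposed digit literal
        else:
            out += b"i" + v.to_bytes(1, "big", signed=True)
    out += b"]"
    return bytes(out)


mixed = [1, True, 2, None, 3]
print(len(encode_mixed(mixed, digit_literals=False)))  # 10
print(len(encode_mixed(mixed, digit_literals=True)))   # 7
```

Because the elements do not share one type, a single-type container cannot be used for this array, while the digit literals still save one byte per small integer.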

breese commented 11 years ago

I agree that @syount's proposal is different from STC.

The digit literals are useful as the length for small strings. For instance, [s][0] is the empty string rather than [s][B][0], and [s][1][:] is a colon rather than [s][B][1][:].
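A minimal sketch of the two string layouts, following the notation in this comment: `[s]`, then the length written either as `[B]` plus one byte or as a proposed digit literal, then the string bytes. The function name is hypothetical, and lengths above 127 are rejected only to sidestep the question of whether `[B]` is signed.

```python
def encode_short_string(text, digit_literals):
    """Encode a short string as [s][length][bytes]."""
    data = text.encode("utf-8")
    if len(data) > 127:
        raise ValueError("sketch only handles lengths that fit in one byte")
    if digit_literals and len(data) <= 9:
        length = str(len(data)).encode("ascii")  # proposed digit literal
    else:
        length = b"B" + bytes([len(data)])       # marker plus one length byte
    return b"s" + length + data


print(len(encode_short_string("", digit_literals=False)))   # 3
print(len(encode_short_string("", digit_literals=True)))    # 2
print(len(encode_short_string(":", digit_literals=False)))  # 4
print(len(encode_short_string(":", digit_literals=True)))   # 3
```

The saving is one byte per string, and only for strings of at most nine bytes; longer strings fall back to the `[B]` form.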

This proposal gets my vote, even though it has been closed.

AnyCPU commented 11 years ago

A first example is STC. @breese your example is one char (ASCII) addition.

What if I want a string with a length of 23... 14... 18...? It is OK only when it fits into one byte. Anyway... with a large number of literals we also lose speed and simplicity of analysis.

ghost commented 11 years ago

Just to clarify, I closed this not because it was a poor idea, but because it increases the complexity of generating (and especially parsing) UBJSON. We can absolutely consider this again in the future; I just want to get the high points of the spec done first before getting into micro-optimizations like this.

ubjson / universal-binary-json

Add digit literal 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 integral numeric types to spec #24