ubjson / universal-binary-json

Community workspace for the Universal Binary JSON Specification.

Redundant byte in strongly typed containers #45

Closed: meisme closed this issue 10 years ago

meisme commented 10 years ago

Currently, strongly typed containers are defined as:

[[][$][L][#][u][6]<data>

However, the [#] marker is redundant here, since the spec requires that it always follow a [$]. Instead, the spec could simply specify that [$] takes two arguments, so the new definition would become:

[[][$][L][u][6]<data>

Now, this obviously won't save a great deal of space, but since [$] already needs special handling, it won't introduce any new complexity either. I also think it gives a nice symmetry: type-optimized containers would be defined simply as [[][$]... and count-optimized containers as [[][#]...
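To make the parsing argument concrete, here is a minimal Python sketch (not taken from any existing UBJSON library) of reading the header that follows a [[] marker, once as the current draft defines it and once as proposed above. It assumes the Draft 11 marker letters, only accepts counts encoded as [i] (int8) for brevity, and the function names are hypothetical.

```python
def read_count(data, pos):
    """Read a count encoded as [i][n] starting at pos; return (count, new_pos).

    Assumption: only the int8 [i] marker is handled in this sketch.
    """
    if data[pos:pos + 1] != b"i":
        raise ValueError("this sketch only handles [i] (int8) counts")
    count = int.from_bytes(data[pos + 1:pos + 2], "big", signed=True)
    return count, pos + 2


def parse_header_draft11(data, pos):
    """Parse an optional $/# header; pos points just past the '[' marker.

    Returns (element_type, count, new_pos); type and count are None when absent.
    """
    if data[pos:pos + 1] == b"$":
        element_type = data[pos + 1:pos + 2]
        # Draft 11: a type must always be followed by a [#] count.
        if data[pos + 2:pos + 3] != b"#":
            raise ValueError("[$] must be followed by [#]")
        count, pos = read_count(data, pos + 3)
        return element_type, count, pos
    if data[pos:pos + 1] == b"#":
        count, pos = read_count(data, pos + 1)
        return None, count, pos
    return None, None, pos  # no header: a normal container


def parse_header_proposed(data, pos):
    """Same as above, but [$] takes two arguments: the type, then the count."""
    if data[pos:pos + 1] == b"$":
        element_type = data[pos + 1:pos + 2]
        count, pos = read_count(data, pos + 2)  # no [#] between type and count
        return element_type, count, pos
    if data[pos:pos + 1] == b"#":
        count, pos = read_count(data, pos + 1)
        return None, count, pos
    return None, None, pos


print(parse_header_draft11(b"[$L#i\x02", 1))  # (b'L', 2, 6)
print(parse_header_proposed(b"[$Li\x02", 1))  # (b'L', 2, 5)
```

In this sketch the two parsers differ only in the [#] check inside the [$] branch, which is the byte the issue calls redundant.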

ghost commented 10 years ago

$ is actually optional, which is why there is also a # marker. In the case where both are specified, the # is still required to keep parsing simple.

Please re-open if you'd like to discuss further.

meisme commented 10 years ago

I know that $ is optional. The three container headers defined in Draft 11 are:

Normal:
[[]
    [L][0]
    [L][0]
[]]

"Count optimized":
[[][#][i][2]
    [L][0]
    [L][0]

"Type optimized":
[[][$][L][#][i][2]
    [0]
    [0]

I am only suggesting that the type-optimized header be changed, leaving the count-optimized one as is. This would give us:

New "type optimized":
[[][$][L][i][2]
    [0]
    [0]

As far as I can see this change shouldn't introduce any more complexity. My own implementation would actually be 2 lines shorter.

Edit: Also, I can't actually reopen the issue; I don't have the permissions.

ghost commented 10 years ago

Just so I'm clear:

  1. Count-optimized: use only the # marker, as currently defined.
  2. Type-and-Count-optimized: use only the $ marker, with no # marker.

Is that what you are proposing? Essentially a 1-byte optimization to the "Type and Count" scenario?

ghost commented 10 years ago

If I didn't get that right, please spell out what you are proposing in the Count-optimized and Type-and-Count-optimized cases with an example because I may be having a brain-fart and missing your point completely.

meisme commented 10 years ago

Yes, that is it. The entire range of examples would then be:

Normal:
[[]
    [L][0]
    [L][0]
[]]

"Count optimized":
[[][#][i][2]
    [L][0]
    [L][0]

"Type optimized":
[[][$][L][i][2]
    [0]
    [0]

I agree that it's a very small optimization, but since we are discussing the possibility of nested strongly typed containers, the headers might change anyway. If they do, that change might as well include this small optimization.

kxepal commented 10 years ago

@meisme just increase the array size from 2 to 100:

Normal: 2 + 2 * 100 = 202 bytes
Count-Optimized: 4 + 2 * 100 = 204 bytes
Type Optimized: 5 + 1 * 100 = 105 bytes
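These figures count every bracketed token as one byte, as in the schematic examples above. A small hypothetical Python helper reproducing that arithmetic (header sizes taken from the layouts in this thread) shows that the [#] byte is a fixed one-byte cost, independent of the element count n:

```python
def schematic_sizes(n, proposed=False):
    """Sizes for an n-element array, counting one byte per [X] token.

    normal:          [ ] plus [L][v] per element
    count-optimized: [ # i n plus [L][v] per element
    type-optimized:  [ $ L # i n (or [ $ L i n if proposed) plus [v] per element
    """
    type_header = 5 if proposed else 6
    return {
        "normal": 2 + 2 * n,
        "count_optimized": 4 + 2 * n,
        "type_optimized": type_header + 1 * n,
    }


print(schematic_sizes(100))
# {'normal': 202, 'count_optimized': 204, 'type_optimized': 106}
print(schematic_sizes(100, proposed=True))
# {'normal': 202, 'count_optimized': 204, 'type_optimized': 105}
```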

(;

meisme commented 10 years ago

@kxepal I'm afraid I don't quite see your point. Is it that, percentage-wise, this would be a very small optimization? If so, then yes, I do realize this. I guess it's just the redundancy that bugs me.

ghost commented 10 years ago

@meisme Thank you for the clarification; the redundancy is intentional.

As it has matured, UBJSON's design has worked hard to follow the "path of least surprise". That includes not adopting shorthand notation like this, which changes behavior from what you would expect, just to save a trivial amount of space.

For example, I am currently discussing over email whether to allow [C] markers for object key names, rather than always assuming an implicit [S] string (an integer size followed by the bytes).

It's another very minor change, but parsers would then have to take an extra step and add an extra conditional: "if(marker == C) parseChar(); else parseInt()".
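The conditional quoted above could look roughly like this Python sketch of reading an object key. It assumes the Draft 11 key layout (no [S] marker, just an integer length, here only [i], followed by UTF-8 bytes) and treats the [C] branch as hypothetical, since that change was only under discussion; parse_key is an illustrative name.

```python
def parse_key(data, pos):
    """Read an object key starting at pos; return (key, new_pos)."""
    marker = data[pos:pos + 1]
    if marker == b"C":
        # Hypothetical [C] key: a single ASCII byte follows the marker.
        return data[pos + 1:pos + 2].decode("ascii"), pos + 2
    # Draft 11 default: an implicit string, i.e. a length and then the bytes.
    if marker != b"i":
        raise ValueError("this sketch only handles [i] (int8) key lengths")
    length = data[pos + 1]
    start = pos + 2
    return data[start:start + length].decode("utf-8"), start + length


print(parse_key(b"i\x03lat", 0))  # ('lat', 5)
print(parse_key(b"Cx", 0))        # ('x', 2)
```

Every key read now pays for one more branch, which is the kind of accumulated special-casing the comment warns about.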

We start going down a slippery slope of "death by 1,000 paper cuts", or in this case, "complexity by 1,000 minor changes".

We have to draw a line in the sand somewhere, and we drew it SLIGHTLY on the side of "at the cost of a few bytes of verbosity, make the spec as predictable as possible so anyone implementing it could more or less guess at the behavior and probably be right. Don't surprise anyone."

Where I have gone back on this is when potentially significant savings are at play; in those cases I will reconsider. But where the win is minimal, we have historically not ruled in favor of more special-case behavior.

meisme commented 10 years ago

Fair enough, I will consider the issue closed.

However, I have to point out that this change would not introduce complexity ([$ and {$ would always be followed by two arguments, so no extra conditional is needed over what we have today). Nor should it be, as you put it, a surprise, since the precedent of a single marker taking two arguments has already been set by strings.

As for the other issue you raised, allowing [C] for object keys as well, I hope you won't make a change there without opening an issue and allowing discussion.

ghost commented 10 years ago

Re: the [C] issue, something like that would definitely go through community review.

ghost commented 9 years ago

@meisme for what it's worth, I think we are coming around to your proposal here in the form of #51.