ubjson / universal-binary-json

Community workspace for the Universal Binary JSON Specification.

Change the order of COUNT (#) and TYPE ($) markers in Optimized Container format #49

Closed ghost closed 10 years ago

ghost commented 10 years ago

This was brought up by @edgar-bonet during the discussion of #43 here: https://github.com/thebuzzmedia/universal-binary-json/issues/43#issuecomment-48535053

Breaking this concern out of that monster thread to give folks a chance to comment on it.

There are arguments for (linked above) and against (https://github.com/thebuzzmedia/universal-binary-json/issues/43#issuecomment-48849300).

Just for fluidity of documentation, I'm OK with making this change because the only allowed combinations are:

  1. (NO ARGS)
  2. COUNT
  3. COUNT-TYPE

So in terms of progressively more complete params in the docs, it looks good to me, but I want to take into consideration the more technical issues with making this change.

I am NOT that worried about backward compatibility at this point in time; there will come a time when I am (1.0), but not now.

kxepal commented 10 years ago

Why do we need both special markers, # and $? Why not have only one and just reuse the TLV format order to define count and type?

meisme commented 10 years ago

Since we allow count to appear by itself, we need to distinguish the two cases.

I am opposed to this change, since the type-optimised case is so different from the other cases. I feel it's easiest to distinguish them right away. I also feel that ease of implementation is more important than flow of documentation.

kxepal commented 10 years ago

@meisme see #50 for a nicer solution: you always reserve "space" for the count, but if it isn't needed we use null (Z) instead of an integer value. Then there is no need to guess during parsing whether a count will appear or not.

ghost commented 10 years ago

I am open to @kxepal's idea if others would like to see the change.

The suggestion of having a single marker was proposed a few months ago (I'm sorry, I forget by whom).

meisme commented 10 years ago

I like the general idea, but [#][Z][Z] seems like a waste of space. UBJSON parsing will be riddled with branches anyway, so that case might as well be omitted.

kxepal commented 10 years ago

[#][Z][Z] actually means nothing. In #50 I defined it as a checkpoint which you may use in your app logic, without needing to modify the base process of working with the data. Say, on each [#][Z][Z] you promise to call fsync() on the received data, or something similar.

ghost commented 10 years ago

WONTFIX

In additional discussions with others I was reminded why I defined the $-# markers in their current order: to maintain the TLV ordering (Type($), Length(#), Value(payload)).

Also, @meisme's point is a helpful one: being able to peek at the 2nd character after reading the '[' or '{' tells you what kind of container is coming, as opposed to continuing to parse up to 5 more bytes before knowing what type of container it is.
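
For readers following along, the "peek at the 2nd character" argument can be sketched roughly as below. This is only an illustration, assuming the current marker order (`$` type first, then `#` count); the function name `classify_container` and the labels it returns are made up for this example and do not come from the spec or any UBJSON library.

```python
def classify_container(data: bytes, pos: int = 0) -> str:
    """Classify the container starting at data[pos] by peeking one byte ahead.

    Assumes the current marker order: '[' or '{', optionally '$' + type,
    then '#' + count. The returned labels are hypothetical names.
    """
    opener = data[pos:pos + 1]
    if opener not in (b"[", b"{"):
        raise ValueError("not the start of a container")

    nxt = data[pos + 1:pos + 2]
    if not nxt:
        raise ValueError("truncated input: nothing after the container opener")

    if nxt == b"$":
        # Type marker comes first, so a count must follow it (COUNT-TYPE case).
        return "typed-and-counted"
    if nxt == b"#":
        # Count only, no type.
        return "counted"
    # Anything else is either the first value or the closing marker.
    return "plain"


# An int8 array [1, 2, 3] with type and count, then count only, then plain.
print(classify_container(b"[$i#i\x03\x01\x02\x03"))  # typed-and-counted
print(classify_container(b"[#i\x02i\x01i\x02"))      # counted
print(classify_container(b"[i\x01i\x02]"))           # plain
```

With the count-first order proposed in this issue, the same check would first have to read the `#` marker and the whole count value before it could see whether a `$` follows, which is the extra parsing mentioned above.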