ubjson / universal-binary-json

Community workspace for the Universal Binary JSON Specification.

DISCUSSION - Reconciling optimized headers, flexible STC, n-dimensional arrays and schemas #60

Open ghost opened 9 years ago

ghost commented 9 years ago

This needs to be a bigger discussion, so filing here for thoughts...

Big ideas in-flight...

  1. #48 - Allow STC headers to repeat inside a container to more closely match JSON container design.

  2. #51 - Redefine container headers to have a single, consistent and concise format (#)

  3. #43 (detail in comment) - Proposal to allow pre-defining a container's 'structure' upfront - potentially massive space savings in highly optimized cases (currently a case UBJSON is not great at)

  4. #50 - Alex's TypeSpec proposal; basically schemas. I think this somewhat ties into the conversation around Point 3 above - allowing you to define the 'structure' of a container completely upfront, so the entire payload is just hard data.

    My question

We are getting close to finalizing Point 1 above and allowing the headers (#51) to repeat within a container... that is great, but it doesn't address the potentially HUGE space savings discussed in Points 3 and 4 above, and I wanted to make sure we weren't moving forward with a change that was either pointless (to be undone later) or in conflict with addressing 3 and 4 later.

This issue is a forum for us to discuss where all these concerns are heading.

Steve132 commented 9 years ago

When we look at this we are actually trying to make tradeoffs on 4 variables:

I love UBJSON because honestly it's probably got the highest overall scores of any format I've seen in all 4 categories.

Here's my opinion overall.

I think that #43 (nesting) was +4S+0E+2U+0P

I think that #51 is -3S+0E+1U+0P

I think that #48 is -1S+1E-2U-6P

I think that #43 (with the Schemas) is -3S+1E-5U-1P

I think that #50 with the other schemas is -4S+1E-5U-1P

My OVERALL opinion is that UBJSON with nested STCs allowed (as is the current status quo) and a proposal for an ND-array marker in the header is probably the best format anyone could possibly get. It provides an amazing balance between simplicity, mapping to JSON, implementation performance, small size and expressiveness.

I can't think of ANY common case data or use case that isn't INCREDIBLY optimized and well represented by that, and I view most of these other proposals as introducing a LOT of complexity for not a lot of gain (a few bytes here and there, usually at the cost of simplicity or implementation performance).

Remember that JSON can be fully specified in, like, a 10-line diagram or a 2-paragraph description. BJSON has a single page of HTML, 90% of which is just the 10 types. XML was (initially) simple, and now it isn't; schemas, DTDs, and XSLTs all make it basically unusable.

The main reason I prefer UBJSON over BJSON is that they are comparably simple but UBJSON has contiguous-memory fixed-length types. That's the only advantage UBJSON has imho, but it's a BIG one to me for performance and data. If UBJSON loses that advantage or becomes massively more complicated vs BJSON or BSON, then if I were recommending a data format to new programmers I wouldn't be able to advocate for UBJSON in good faith.

meisme commented 9 years ago

As far as I can tell, the only reason to allow multiple header definitions is to allow chunked transfers. Is there any compelling reason that chunked transfers should be supported "natively" in the spec, rather than being implemented using the basic tools that are available to you? Even in draft 11 you could easily do chunked transfers by splitting the data, like this:

[[]
    [[][$][u][#][u][4][0][1][2][3]
    [[][$][u][#][u][4][4][5][6][7]
    ...
    [[][$][u][#][u][4][252][253][254][255]
[]]

Each of these sections could have a variable length; they could represent discrete units (like frames) or they could be concatenated in the application (like snippets of an audio stream).

I think it's easy to fall into the trap of trying to provide too much built-in functionality, rather than providing the tools necessary for the implementation of that functionality. The goal isn't to create a format that allows all applications to talk to each other out of the box, but rather a format that provides a strong foundation to build the application-specific protocol on.

Steve132 commented 9 years ago

I agree with meisme

However, I'd go even further and say that chunked transfers can easily be implemented by using a standard fixed-length array.

If you are receiving an array, and you know the number of elements you are supposed to receive, then just continue receiving elements of that type until you have received them all. Whether the transport layer caches this into chunks or not seems irrelevant to the data-layer of the protocol.
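(For illustration only: a minimal Python sketch of the point above, assuming a hypothetical read_value() that decodes one UBJSON element from a byte stream. Once the element count from the header is known, the receiver simply keeps decoding until it has them all, regardless of how the transport chunked the bytes.)

    def read_fixed_length_array(stream, count, read_value):
        """Collect `count` elements from `stream`, ignoring any transport chunking."""
        elements = []
        while len(elements) < count:
            # read_value may block while the transport delivers the next chunk
            elements.append(read_value(stream))
        return elements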

ghost commented 9 years ago

@Steve132 I really liked your quantification in the first reply - albeit your own view, I don't think it's drastically far off from reality...

To your point:

My OVERALL opinion is that UBJSON with nested STCs allowed (as is the current status quo) and a proposal for an ND-array marker in the header is probably the best format anyone could possibly get.

Maybe moving forward on an ND-array proposal to have something concrete to talk about would help bring clarity here?

Steve132 commented 9 years ago

Sure

Done #61

Miosss commented 9 years ago

Remember that JSON can be fully specified in, like, a 10-line diagram or a 2-paragraph description. BJSON has a single page of HTML, 90% of which is just the 10 types. XML was (initially) simple, and now it isn't; schemas, DTDs, and XSLTs all make it basically unusable.

That may be true. But if we come to such a conclusion about JSON, then why don't we just throw out everything else and use JSON wherever we can?

UBJSON must offer something more than JSON (and JSON is the father, check goal number 1). Binary encoding of integers is not enough. There must be a balance between simplicity and complexity, yet this standard must offer some advantages over JSON. Some true advantages, instead of specific, edge-case optimizations.

Steve132 commented 9 years ago

But if we come to such a conclusion about JSON, then why don't we just throw out everything else and use JSON wherever we can?

Because it's terribly slow for binary data. That's honestly the only reason not to use JSON. JSON is amazing for every case except large chunks of binary data.


MikeFair commented 9 years ago

I just opened #64, which hopes to present a compelling argument that, beyond what UBJSON has already done an overall good job of addressing, the main thing JSON is missing as a transport format in practice is the moral equivalent of a DIFF. What I think UBJSON is interested in becoming, if it's to actually get used and provide something valuable, is a usable transport format, not a binary JSON document translator. In that context, chunked transfers, and more, are extremely valuable.

On the ND array discussion: regardless of streaming JSON documents, enabling header information to describe the 1D array data as either row-based or column-based is also important. My experience says you've got a 50/50 shot of getting either type of in-memory representation, depending on the creator's preference or processing algorithms.

If you're going to take on optimizing matrix transfers, then consideration must also be given to sparse matrices. That's often the main thing missing from easily using JSON in practice: I have to send the sparse matrix as an object and then recreate it as a sparse matrix on the other side. It'd be nice if that were more transparent, and yes, I think the spec ought to support the idea directly if it wants to transport matrices efficiently.

Both those things being said, what about being able to say "This is an array; and it uses an object to describe the layout of the 1D data".

What I mean by this is that @Steve132 is right about implementing large transfers efficiently as 1D arrays with header information to describe the layout. Currently this has been approached by using a custom array layout with #, @, and a lot of [] and type letters. Wouldn't we all benefit from taking a more generalized approach that recognizes the value of a flexible header descriptor?

Say we have two array specs, regular and object-based. Regular is exactly that: a plain old JSON array that's already well described. Object-based starts like an array, but explicitly uses a JSON object to describe the 1D layout instead of special code sequences. Rather than trying to get tricky with parsing, which requires that certain characters not come before other characters and that nothing has an ambiguous meaning, put an honest multi-field object in there as the header. The space savings are worth it for these kinds of objects.

What's plainly obvious is that JSON, as defined, will never, ever, get used in any sanely competent practice using ND arrays. The sane way to deal with transmitting this kind of data really efficiently is using 1D arrays and adding header information to describe it. So I recommend making exactly that kind of an array.

    [[]        // This is an array
    [$]        // described by an object
    [{]        // Here is the object that describes the layout of this 1D array
    ...
    [}]        // Here is the end of the header
    [length]
    <insert 1D list of data items here>
    []]        // This is the end of the array

Since we already have the cost of the type which can't be eliminated, I think we can eliminate having multiple array types at all by paying the cost of three extraneous characters in the base case (i.e. [}]).

As I understand it, the current base case is [[][?][length][data][]], where ? is either $ or # as a distinguishing factor for what to do next.

What if all arrays, regardless of type, always had an object-based header by definition? That makes the base case look like this: [[][{][}][length][data][]] with an empty object descriptor. The [?] is replaced by [{], and we pick up an additional [}] (I think it's an acceptable price).

Now whatever is deemed proper can be put into the header object to describe the efficient transfer for many kinds of arrays, including sparse descriptors. The specified [length] is the number of elements in the transferred 1D array and has nothing to do with the array size that is actually defined in the header (unless the header is empty in which case [length] is the size of the 1D array).

Steve132 commented 9 years ago

So I know the benevolent dictator decided not to pursue the ND-array thing for now, so this doesn't matter, but I wanted to respond to this anyway. My response isn't intended as a way to rekindle that debate.

the main thing JSON is missing as a transport format in practice is the moral equivalent of a DIFF. What I think UBJSON is interested in becoming, if it's to actually get used and provide something valuable, is a usable transport format, not a binary JSON document translator. In that context, chunked transfers, and more, are extremely valuable.

I'm sorry, but "not a binary json format" makes no sense to me; it's literally in the name: "Universal Binary JSON".

I don't agree with you at all that the transport protocol layer and the data format layer are at all the same. It's not the job of, say, XML or HTML to handle HTTP/1.0 chunked transfers.

UBJSON may be useful in some contexts as a way to communicate over the web, but that is absolutely not what it's about. It's a data format, that's IT.

As a matter of fact, I can't think of ANY data format that includes mechanisms for the underlying transfer protocol. Separation of concerns is a hugely important thing.

If you want to diff a file, that's what the diff format is for. UBJSON and JSON and data formats are NOT diff formats.

I respectfully just don't understand your philosophy at all.

enabling header information to describe the 1D array data as either row-based or column-based is also important. My experience says you've got a 50/50 shot of getting either type of in-memory representation, depending on the creator's preference or processing algorithms.

There's no such thing as a column-based or row-based in-memory representation. There is simply linear data. Column-based or row-based is simply a way of defining how to interpret the dimension array: when computing the linear index, is the smallest-stride dimension at the front of the dimension array or at the back? My UBJSON spec says the smallest-stride dimension is at the front, which makes it column-major.

The reason it is not necessary to allow the user to specify this in UBJSON N-D arrays is that the user can load the data from memory using either convention. If their application requires accessing the data in a row-major way, the user can simply reverse the dimension array before and after.

Integrating that choice into the data format standard is completely un-necessary and complicated.
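(For illustration only: a minimal Python sketch of the column-major convention described above, with hypothetical dimensions. The smallest-stride dimension sits at the front of the dimension array; reversing the dimensions, together with the index tuple, yields the row-major view of the same linear data.)

    def linear_index(index, dims):
        # Column-major: dims[0] is the smallest-stride dimension.
        offset, stride = 0, 1
        for i, d in zip(index, dims):
            offset += i * stride
            stride *= d
        return offset

    dims = [3, 4]                       # 3 (fastest-varying) x 4 (slowest), column-major
    assert linear_index([2, 1], dims) == 2 + 1 * 3
    # The same linear data read row-major: reverse dims and the index tuple.
    assert linear_index([1, 2], list(reversed(dims))) == 1 + 2 * 4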

If you're going to take on optimizing matrix transfers, then consideration must also be given to sparse matrices. That's often the main thing missing from easily using JSON in practice: I have to send the sparse matrix as an object and then recreate it as a sparse matrix on the other side. It'd be nice if that were more transparent, and yes, I think the spec ought to support the idea directly if it wants to transport matrices efficiently.

No, because the point of transporting matrices efficiently is not to create a general-purpose matrix format. UBJSON is not matrix-math specific.

The right way to think about UBJSON is that there are abstraction layers. There's the application layer, which is the layer at which users specify their format and data using UBJSON primitives. Their application is responsible for interpreting and validating those primitives as higher-level structures. Good examples of this might be specifying a User type as an object, or maybe some kind of RPC mechanism that uses an object like {'method':'atan2','args':[2.0,2.3]}. Or something else.

UBJSON can't be concerned about this.

Then the next layer is the data format layer. This is the layer that UBJSON is supposed to help. It provides efficient serialization of the set of primitives necessary to represent the application-level objects efficiently.

The next layer is the transport layer. This is how the binary data is literally stored on disk, or encrypted, or the protocols defining how it is transported over the web, or diffed, or versioned.

UBJSON cannot be concerned about this either.

It seems to me that a lot of our disagreement seems to be that we have different philosophies about the different 'layers'. I believe very strongly in keeping UBJSON limited to its separation of concerns, whereas you are arguing to make it bleed into the layers above and below, which I disagree with strongly.

Anyway, the reason why the ND-array proposal is important is not that 'numerical problems involving matrices are cool and we should make UBJSON into matlab'. It's that it looks at an INCREDIBLY common application-level structure that is repeated across dozens of applications (I would argue it's present in literally every application that stands to benefit from UBJSON, and applications that don't have it wouldn't use UBJSON anyway). It says "This application-level structure is nearly ubiquitous across all our target applications, and there's no way to do it natively in UBJSON. Is there real benefit of some kind to making it into a primitive? Is the benefit worth the cost?" I made a case already that in my opinion the answer is yes.

However, since it's not just about 'let's have matrix math', we have to address the same questions for the idea of making sparse arrays native in UBJSON. In my opinion the application-level structure of a sparse array already has an equivalent UBJSON-level primitive that is efficient enough for that use (an object), and I've seen sparse arrays used so rarely in real-world data that I don't believe the costs of 'upgrading' that application-level structure to primitive level are at all worth it. Most importantly, simply having an nd-array header doesn't imply we need to support sparse arrays natively just because they are both matrices, any more than having arrays implies we need to support binary trees natively because they are both sorted lists.
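(For illustration only: a minimal Python sketch of the existing primitive mapping referred to above: a sparse 2-D array expressed as an ordinary nested object, which UBJSON can already serialize with no new construct. The string-index key layout is just one possible application-level convention.)

    # Non-zero entries only; everything absent is implicitly zero.
    sparse = {
        "0": {"3": 1.5},     # row 0, column 3
        "2": {"1": -2.0},    # row 2, column 1
    }

    def lookup(sparse, row, col, default=0.0):
        return sparse.get(str(row), {}).get(str(col), default)

    assert lookup(sparse, 0, 3) == 1.5
    assert lookup(sparse, 1, 1) == 0.0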

However, interestingly, the ND-array proposal is just that awesome because even if I agreed with you, (which I don't), it becomes possible to add sparse matrices to the ND-array proposal by adding the following lines to the spec (which was one of my 3 suggestions on ways to handle objects in the proposal)

When an object contains a '@' header, the number of dimensions d is read. Then, each object has d strings (instead of one string) preceding the value.

I don't believe that's the best way to handle the '@' case with objects, but if I did then it has the benefits of 1) consistency in the header with array, 2) automatic native support for representing sparse ND-arrays efficiently as a primitive, just like you want.

What's plainly obvious is that JSON, as defined, will never, ever, get used in any sanely competent practice using ND arrays. The sane way to deal with transmitting this kind of data really efficiently is using 1D arrays and adding header information to describe it. So I recommend making exactly that kind of an array.

You and I agree on this.

Now whatever is deemed proper can be put into the header object to describe the efficient transfer for many kinds of arrays, including sparse descriptors. The specified [length] is the number of elements in the transferred 1D array and has nothing to do with the array size that is actually defined in the header (unless the header is empty in which case [length] is the size of the 1D array).

I challenge you to come up with a truly useful piece of data that would be found in this object that is not already semantically contained in the type of the array and the dimensions.

Seriously, like I already said, you can add sparsity by adding the consistent header to objects like I described before, and the column-major or row-major order is not a relevant detail for the primitive-level construct (at the application level, maybe, but it's not a relevant detail here), and I can't currently think of ANYTHING else that would even potentially be useful to describe a layout. Maybe stride or something, but that makes no sense in a serialization context because there are no gaps in the data.

Adding a 'parse an object in the header' step also really breaks consistency with the way UBJSON currently parses, and adds (as you mentioned) some extra [{][}] bytes at least.

MikeFair commented 9 years ago

@Steve132 thanks for commenting, I was hoping you would because I was really curious about your thoughts (and @kxepal but not quite there yet).

Ok so I think I have an answer here: if sparse arrays do not belong as UBJ primitives, then neither do ND arrays.

All arguments applicable to making ND arrays native are equally valid for sparse arrays, for exactly the same reasons; and for exactly the reasons you pointed out, they aren't JSON and therefore aren't UBJ.

JSON doesn't have an ND array. Based on the current UBJ scope, the solution to #43 is to have the application first create an object that describes their ND array, and then have UBJ encode it.
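(For illustration only: a minimal Python sketch of that approach, with hypothetical field names "dims" and "data". The application flattens its ND array into an ordinary object, and UBJ then encodes that object with its existing primitives.)

    def pack_nd(nested, dims):
        # Flatten a nested (row-major) list into an object UBJ can encode as-is.
        flat = []
        def walk(x):
            if isinstance(x, list):
                for item in x:
                    walk(item)
            else:
                flat.append(x)
        walk(nested)
        return {"dims": dims, "data": flat}

    packed = pack_nd([[1, 2, 3], [4, 5, 6]], dims=[2, 3])
    assert packed == {"dims": [2, 3], "data": [1, 2, 3, 4, 5, 6]}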

A simple test is this: if the parser can analyze the data before deciding how to encode it, then it is no longer a binary JSON; it is a binary format that describes a JSON document. I think this is a clarifying distinction. The distinction is subtle, but they are not the same thing.

Is the use case for UBJ to be a binary JSON, or is it a format for describing JSON documents that uses binary as part of its use case? I think the former is the current description and the latter is the actual intent.

One other clarification. DIFF is a format, not a protocol. And it is in that context, as a format, that I mean to describe it and its applicability to UBJ.

I agree my use cases and descriptions didn't necessarily make that clear.

Thanks, I really appreciate you weighing in on this one.

Steve132 commented 9 years ago

All arguments applicable to making ND arrays native are equally valid for sparse arrays, for exactly the same reasons; and for exactly the reasons you pointed out, they aren't JSON and therefore aren't UBJ.

I addressed this explicitly already, and I don't agree that sparse arrays should be 'native' in UBJSON just because nd-arrays should be. I said that the reason for the distinction is that ND arrays should be native because 1) there's no simple efficient primitive mapping for them, 2) it's a pattern found in pretty much every application that uses binary data, 3) there is a compelling argument for application compatibility here. Sparse arrays should stay an application-level structure because 1) there is already a simple primitive mapping that's just as efficient: a nested object is just as efficient as a sparse array in parsing, in internal memory, and in API, 2) I've never seen sparse nd-arrays in ANY binary application that wasn't highly specialized, 3) there's no reason to believe that applications with different kinds of sparse arrays would be compatible.

Since nd-arrays have 1),2),3) and sparse arrays do not, I don't find it compelling enough to make them native.

But if I did want to do that, my proposal would be to extend the nd-array syntax to object like I described, which is very easy and very simple to parse and actually makes array and object MORE consistent with each-other and with UBJ in general and is extremely space efficient. The last thing I would do is radically change convention to have the parser look for an object in the header.

Is the use case for UBJ to be a binary JSON, or is it a format for describing JSON documents that uses binary as part of its use case? I think the former is the current description and the latter is the actual intent.

I think that it's neither. I think that it is a format to consistently serialize complex structured binary data. That's it. The relation to JSON is incidental.

One other clarification. DIFF is a format, not a protocol. And it is in that context, as a format, that I mean to describe it and its applicability to UBJ.

Yes, I know this. I said it earlier.

If you want to diff a file, that's what the diff format is for. UBJSON and JSON and data formats are NOT diff formats.

If you wanted to make an application-level format for diffs around UBJSON, then go for it. UBJSON isn't natively going to provide that.

MikeFair commented 9 years ago

Oh, and by extension, these same arguments apply to structured containers. Unless the scope changes, JSON doesn't have them, so neither does UBJ.

So what about this: finish off UBJ 1.0 as just a JSON descriptor. The answer to effectively storing higher orders of structure for the time being is "make an object to describe what you want, then encode that with UBJ".

Then given the usefulness of a format that describes a JSON document, rather than is a JSON document, work on that can then begin...

?

Or change the scope?

Steve132 commented 9 years ago

I think that it's neither. I think that it is a useful format to consistently serialize complex structured binary data. That's it. The relation to JSON is incidental.

MikeFair commented 9 years ago


I think that it's neither. I think that it is a format to consistently serialize complex structured binary data. That's it. The relation to JSON is incidental.

And therein lies the confusion I'm speaking of. The relationship to json is not incidental; it's in the name.

And that you can make that statement with a clear conscience and a straight face is exactly what demonstrates why I'm calling it out.

I'm simply saying that in reading through the issues, I saw the underlying desire is to efficiently ship data structures between processes (either real time or async via files); not really to create a binary JSON.

Let's take stock and notice that. Once that gets cleared up, the way forward is clearer.

Steve132 commented 9 years ago

Obviously there is a relation to JSON in that it is inspired by json. You are right that it's in the name.

My point that I was trying to make is that even if JSON had never been invented then the format which we call UBJSON would still independently be incredibly useful and valuable as what it is: which is a serialization format. THAT is why I disagree that characterizing it as 'either a binary json 1-1 or a binary transfer coding for json documents' is accurate. You might use it for those things but that's not what it is. It's a way to serialize data.

MikeFair commented 9 years ago

I see a way everyone can get everything they're after.

The connection to json has given the format a defined scope for decision making. I think it's been very critical for guiding the development process.

What is in the format and what isn't, when choosing what/how to parse and encode directly and what to push to an upper layer.

What's been clarified for me in this whole discussion is that UBJ has thus far very successfully encoded JSON and met its stated goals, and it's important to claim that victory and publish spec version 1.0.

There are always going to be more effective ways to describe a JSON doc and optimize the data structures. That ongoing optimization process should be part of the project's future. Think of it like compiler optimization: I can compile my source to binary and there's the direct translation, which could be slow to execute; then there are the compiler's optimizers that analyze my code and, if it meets certain criteria they understand, can rewrite my code for me.

The compiler is a fairly fixed and static piece of code; the commands/instructions for each CPU are extremely well defined. The optimizers, however, are an ongoing area of research. That's exactly the pattern we want to replicate here. We want a mechanism to create optimizations for future structures in an ongoing manner, not be restricted by whatever wisdom we happen to come up with at the moment.

So that being said, I think the wisest course as it relates to these higher-order encodings is to block/prevent them from getting a direct binary encoding.

Then instead we introduce a second layer to the UBJ pipeline to deal with them: a layer whose explicit job is to analyze a document and find more effective ways to encode the document being evaluated.

This layer will take one JSON document and create a new JSON document that will more efficiently represent the first for encoding into the binary representation. Then of course the reverse happens on the other side: the binary gets transformed to the efficient JSON, then the more efficient JSON is retranslated back to the original and the document is handed off to the recipient/requestor.

Pretty much all optimizations/repackings can easily be described as json objects with 1D arrays and fields to describe how to interpret the 1D arrays. These cleaner json objects will binary encode nicely.
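(For illustration only: a minimal Python sketch of one rewrite such a layer could perform, with hypothetical field names "keys" and "rows". An array of uniform objects, e.g. an SQL-style resultset, is repacked into one object holding the key names once plus a 1D array per row; the receiver applies the inverse transform before handing the document back.)

    def pack_records(records):
        # Store the field names once, then only the values for each row.
        keys = list(records[0].keys())
        rows = [[rec[k] for k in keys] for rec in records]
        return {"keys": keys, "rows": rows}

    def unpack_records(packed):
        return [dict(zip(packed["keys"], row)) for row in packed["rows"]]

    doc = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
    assert unpack_records(pack_records(doc)) == doc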

What goes into the binary structures is byte representations that can't be described in JSON. For instance, using integers instead of strings to represent numbers, or using ASCII or Unicode or code pages for text; those are binary representations, so those kinds of types go into the binary parser/encoder.

Anything else that requires a more high level restructuring of the document before encoding it goes into this other piece of code.

It's all still part of UBJ, but this other piece of code has a different focus. Its focus is finding optimal ways to redescribe a JSON doc: ways that will compact nicely to binary. Where the binary parser/encoder is expected to be a dumb and ultrafast binary translator, this other piece of code is expected to be super smart. And that guides the decision-making process of what things to put where, without forcing the binary format to play "what's my favorite data structure". This other piece of code might be made more user accessible, with callbacks perhaps, where users can write their own transformations in addition to the ones UBJ has already natively defined.

Thanks

MikeFair commented 9 years ago

@Steve132 I made the comment:

The specified [length] is the number of elements in the transferred 1D array and has nothing to do with the array size that is actually defined in the header (unless the header is empty in which case [length] is the size of the 1D array).

To which you responded:

I challenge you to come up with a truly useful piece of data that would be found in this object that is not already semantically contained in the type of the array and the dimensions.

(As you seem to have already decided that such a thing cannot exist, I'm concerned you wouldn't accept anything as an answer; but just in case, I'm going to see if I can answer this challenge.)

First, I'm not sure that including "type of the array" is a valid constraint. There is only one "type" of array being encoded, a JSON array, which is always dense and always 1D. The fact that I can't presently describe the "type" of the array/object is exactly the inefficiency being discussed. What we're saying is let's find a good way to do higher order "types" and the discussion is "which ones".

And not to leave objects out, object types can be seen as an extension of the ND array. Objects can be treated like a 2xN array where dim 1 = the keys; dim 2 = the values; and N = the number of fields. In fact that is exactly the way they are handled internally by many dynamic languages.
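(For illustration only: a minimal Python sketch of that 2xN view, with the keys and values laid out as two parallel 1D arrays of length N.)

    obj = {"x": 1, "y": 2, "z": 3}
    two_by_n = [list(obj.keys()), list(obj.values())]   # [["x", "y", "z"], [1, 2, 3]]
    assert dict(zip(*two_by_n)) == obj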

"Useful" is in the eye of the beholder; so it seems this "challenge" is more accurately worded "I challenge you to come up with something I consider a truly useful piece of data ...".

I've already stated a truly useful piece of data that meets this criterion: sparse arrays (that's the eye of the beholder at work), but let's see if I can come up with an example that you agree is also useful.

So there's three swings at the challenge, how'd I do? Any hits?

MikeFair commented 9 years ago

I addressed this explicitly already, and I don't agree that sparse arrays should be 'native' in UBJSON just because nd-arrays should be.

and

In my opinion the application-level structure of a <_insert special structure here_> already has an equivalent UBJSON-level primitive that is efficient enough for that use (an object),

I think you're right. An object does a great job at describing all structures with a higher degree of order and so should be the mechanism used for describing them. I didn't see you make that proposal as I was reading through things but believe you when you say you did.

I agree with you that transforming these things into objects also brings consistency to the way objects and arrays are handled (which makes things simpler). I missed the proposal to make higher-order arrays objects; was it in #60 (I'd like to read it)?

Steve132 commented 9 years ago

https://github.com/thebuzzmedia/universal-binary-json/issues/61 under "What about objects"

Steve132 commented 9 years ago

In my opinion the application-level structure of a <_insert special structure here_> already has an equivalent UBJSON-level primitive that is efficient enough for that use

I don't agree with your edit. I said "the application level structure of a sparse array" already has it. The whole point behind issues #43-#61 was the observation that an n-d array does NOT have such a primitive.

MikeFair commented 9 years ago

#61 under "What about objects"

Oh, I read that entirely differently. I read that as what to do with @ when encoding with JSON objects, not as using objects to describe the higher-order structures.

BTW, I keep failing to see how to describe "an array of objects" which I'd argue is even more common than ND (I'm thinking SQL resultsets here).

In my opinion the application-level structure of a sparse array already has an equivalent UBJSON-level primitive that is efficient enough for that use (an object)

I don't agree with your edit. I said "the application level structure of a sparse array" already has it. The whole point behind issues #43-#61 was the observation that an n-d array does NOT have such a primitive.

Ok, I must be missing something obvious here, because an ND array does have such a primitive: an object. Same as the sparse array/matrix, which also doesn't have a direct JSON primitive.

The whole point of my edit/description is that if we take the approach of laying out everything with a higher order of structure as an object first, and then encode that object to binary, it's always going to be efficient and can be extended to support any layout desired. Users would have to add code for their own favorite layouts, but it's clear how to go about using it to make whatever use case you've got efficient (repack it as an object efficiently, then have it be encoded).

An N-D array can easily be described with a JSON object just like any other structure.

This idea keeps everything nice and clean. It makes it so the encoder/decoder doesn't have any extra work or analysis to do (it simply translates as it sees it).

Miosss commented 9 years ago

@MikeFair

BTW, I keep failing to see how to describe "an array of objects" which I'd argue is even more common than ND (I'm thinking SQL resultsets here).

This is resolved by typespec, a tool much more general than ND arrays. See #50