Open ghost opened 8 years ago
I think I misunderstood how no-op is used. It has now been fixed(?) above.
Note: I decided to allow no-op anywhere it would not cause ambiguity. This lets the sender of a message emit keep-alive no-ops while it is still collecting the items of a container to count them and to check whether they are all the same type.
Note 2: the grammar in its current state contradicts the "no data payload" example, because it does not allow no-op as a type. If no-op really is meant to be considered a type, expect an issue calling that a bug :)
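To make the keep-alive idea concrete, here is a rough Python sketch (mine, not taken from the spec). It assumes no-op is discarded only where a marker is expected, so a payload byte that happens to equal `N` is left alone. The name `read_int8_array` is hypothetical, and the function handles only an unoptimized array of int8 values.

```python
import io

NOOP = b"N"


def read_int8_array(stream):
    """Read '[' ... ']' holding only int8 values, skipping no-op markers.

    Hypothetical helper for illustration; not a full UBJSON reader.
    """
    if stream.read(1) != b"[":
        raise ValueError("expected '['")
    values = []
    while True:
        marker = stream.read(1)
        if marker == b"":
            raise ValueError("unexpected end of stream")
        if marker == NOOP:
            # Keep-alive: carries no value, so just read the next marker.
            continue
        if marker == b"]":
            return values
        if marker == b"i":
            payload = stream.read(1)
            if len(payload) != 1:
                raise ValueError("truncated int8 payload")
            # The payload byte is data, even if it equals ord('N').
            values.append(int.from_bytes(payload, "big", signed=True))
        else:
            raise ValueError("unsupported marker: %r" % marker)
```

With this sketch, `read_int8_array(io.BytesIO(b"[Ni\x01NNi\xff]"))` gives `[1, -1]`, and `b"[iNN]"` gives `[78]`: the first `N` after `i` is the int8 payload, the second is a no-op.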
To help ensure my parser did everything it was supposed to, I wrote up this context-free grammar in ANTLR3 syntax. It focuses entirely on the rules for interpreting a raw byte stream, and it treats no-op as a type, because that is apparently what the standard calls for. I'm putting it up here so people can compare different interpretations of the standard, which should help identify weaknesses (like the no-op issue) and clarifications that should be made (like in #73). The grammar explicitly allows nested typed containers, because I could not find anything in the specification saying otherwise. Note that ANTLR cannot generate a parser from it, because the optimized containers are ambiguous without additional context.
grammar ubjson;
fragment
BYTE : ('\u0000'..'\u00FF');
fragment
CHAR : ('\u0000'..'\u007F');
ubjson : object ;
byte1 : BYTE ;
byte2 : byte1 byte1 ;
byte4 : byte2 byte2 ;
byte8 : byte4 byte4 ;
bytes : BYTE* ;
integertype : 'i' byte1 | 'U' byte1 | 'I' byte2 | 'l' byte4 | 'L' byte8 ;
id : integertype bytes ;
numerictype : integertype | 'd' byte4 | 'D' byte8 | 'H' id ;
stringtype : 'S' id ;
valuetype : numerictype | 'Z' | 'N' | 'T' | 'F' | 'C' CHAR | stringtype ;
bareobject : (id type)* '}' | '#' integertype (id type)* | '$' type '#' integertype (id baretype?)* ;
barearray : type* ']' | '#' integertype type* | '$' type '#' integertype baretype* ;
baretype : id | bytes | CHAR | barearray | bareobject ;
object : '{' bareobject ;
array : '[' barearray ;
//object : '{' (id type)* '}';
//array : '[' type* ']';
containertype : object | array ;
type : valuetype | containertype ;
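One place where the grammar alone is not enough is `id : integertype bytes`: how many bytes belong to `bytes` depends on the value of the integer just read, which a context-free rule cannot express. Here is a minimal Python sketch of that step, assuming big-endian integers as the spec describes. The names `read_integer` and `read_id` are hypothetical, and the id is returned as raw bytes, the way the grammar treats it.

```python
import io
import struct

# Marker -> struct format (big-endian), mirroring the integertype rule.
INT_FORMATS = {
    b"i": ">b",  # int8
    b"U": ">B",  # uint8
    b"I": ">h",  # int16
    b"l": ">i",  # int32
    b"L": ">q",  # int64
}


def read_integer(stream):
    """Read one integertype: a marker followed by its fixed-size payload."""
    marker = stream.read(1)
    fmt = INT_FORMATS.get(marker)
    if fmt is None:
        raise ValueError("not an integer marker: %r" % marker)
    size = struct.calcsize(fmt)
    payload = stream.read(size)
    if len(payload) != size:
        raise ValueError("truncated integer payload")
    return struct.unpack(fmt, payload)[0]


def read_id(stream):
    """Read an id: an integer length, then exactly that many bytes."""
    length = read_integer(stream)
    if length < 0:
        raise ValueError("negative length")
    data = stream.read(length)
    if len(data) != length:
        raise ValueError("truncated id")
    return data
```

For example, reading twice from `io.BytesIO(b"i\x03fooU\x02hi")` with `read_id` gives `b"foo"` and then `b"hi"`. The counted and typed container forms need the same kind of outside state (the count and the element type), which is why a generated parser cannot handle them from the grammar alone.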
This part of the spec:
; Typed containers cannot contain typed containers themselves, so we need a way to disable the type system.
That's not true.
While preparing for https://github.com/tbuitenhuis/zgio I’ve been writing an EBNF specification of UBJSON. I’ve been asked to post it here.
Once it has been cleaned up and the remaining questions are answered, this should be useful to others writing UBJSON parsers and generators. I intend to get it to that state around the time of the next draft or sooner. Right now these are just my notes, so help spotting misunderstandings would be appreciated.