groupon / cson-parser

Simple & safe CSON parser
BSD 3-Clause "New" or "Revised" License

Formal definition of CSON #26

Open hildjj opened 9 years ago

hildjj commented 9 years ago

Moving https://github.com/bevry/cson/issues/38 over to this project.

JSON suffered from being too tied to the JavaScript programming language early on. I suggest a document describing the format being parsed, so that other interoperable implementations can be built.

The key here is "interoperable". I want to write CSON parsers in C, Python, etc. that don't have the assumptions of ECMAScript (particularly with respect to duplicate keys, strings, and numbers) baked in.
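To make the concern about duplicate keys and numbers concrete, here is a minimal Python sketch. It uses the standard `json` module as a stand-in, since no CSON specification exists to test against; the helper name `reject_duplicates` is hypothetical. It shows two questions a spec would have to answer: what a repeated key means, and whether integers above 2^53 survive parsing.

```python
import json


def reject_duplicates(pairs):
    """Hypothetical strict policy: fail on a repeated key instead of keeping the last one."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


doc = '{"a": 1, "a": 2}'

# Python's default, like JavaScript's JSON.parse, silently keeps the last value.
last_wins = json.loads(doc)

# A stricter implementation could reject the same input outright.
try:
    json.loads(doc, object_pairs_hook=reject_duplicates)
    strict_accepts = True
except ValueError:
    strict_accepts = False

# Numbers: Python keeps arbitrary-precision integers, whereas an IEEE 754 double
# (which is what a JavaScript number is) cannot represent 2**53 + 1 exactly.
big = json.loads("9007199254740993")
as_double = float(big)
```

Two parsers that both "accept" this input can therefore still disagree on the resulting data unless the format says which behavior is correct.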

jkrems commented 9 years ago

Thanks! This sounds like a great idea. My biggest concern would be us breaking existing CSON files out there (that rely on all the dirty hacks CoffeeScript allows).

hildjj commented 9 years ago

It's better to break things earlier than later. I would suggest starting by declaring what you think the format is, then dealing with the edge cases as they are reported. It's not going to get easier as CoffeeScript evolves.

dgreensp commented 9 years ago

Yeah, without a spec, CSON parser libraries written for other languages are extremely unlikely to accept the same set of inputs and produce the same outputs. A file written for one will break when parsed by another. (I've played around with CoffeeScript a little, and heck if I can tell you how the compiler will parse a given input.) You'll have Markdown all over again.

ghost commented 9 years ago

Just a thought, how about the spec being written in literate coffeescript with executable test cases?

jkrems commented 9 years ago

If we add a spec, I would think more along the lines of a grammar, e.g. a PEG. That way we could also implement cson-parser in terms of that spec. And PEG should be reasonably portable so that it's easy to consume/port to other languages.
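As a rough illustration of how a grammar can double as an implementation guide, the sketch below hand-writes a parser in Python for a tiny, flat subset: one `key: value` pair per line. The rules in the comment are an assumption made up for this example, not CSON's actual grammar, and the names `parse_value` and `parse_document` are hypothetical. The point is only that PEG-style ordered choice maps directly onto code in any language.

```python
import re

# Illustrative PEG-style rules for a tiny, flat subset (an assumption, not CSON's grammar):
#   line  <- KEY ':' ' '* value
#   value <- NUMBER / STRING / 'true' / 'false' / 'null'
TOKEN_RULES = [
    ("number", re.compile(r"-?\d+(?:\.\d+)?")),
    ("string", re.compile(r'"(?:[^"\\]|\\.)*"')),
    ("word", re.compile(r"true|false|null")),
]
WORDS = {"true": True, "false": False, "null": None}
KEY = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def parse_value(text, pos):
    """Ordered choice: try each alternative in turn and commit to the first match."""
    for name, pattern in TOKEN_RULES:
        match = pattern.match(text, pos)
        if not match:
            continue
        raw = match.group()
        if name == "number":
            value = float(raw) if "." in raw else int(raw)
        elif name == "string":
            value = raw[1:-1]  # escape sequences are left undecoded in this sketch
        else:
            value = WORDS[raw]
        return value, match.end()
    raise SyntaxError(f"expected a value at offset {pos}")


def parse_document(text):
    """Parse one 'key: value' pair per non-empty line into a dict."""
    result = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        key = KEY.match(line)
        if not key or not line.startswith(":", key.end()):
            raise SyntaxError(f"line {line_no}: expected 'key:'")
        pos = key.end() + 1
        while pos < len(line) and line[pos] == " ":
            pos += 1
        value, end = parse_value(line, pos)
        if end != len(line):
            raise SyntaxError(f"line {line_no}: unexpected trailing input")
        result[key.group()] = value
    return result


parsed = parse_document('name: "cson"\nstars: 133\nstable: true\n')
```

Nested objects, arrays, comments, and indentation are deliberately left out; those are exactly the parts where a written grammar would have to pin down behavior.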

ghost commented 9 years ago

:+1:

dgreensp commented 9 years ago

The gold standard for a spec is, of course, the JSON spec (http://www.json.org/), lest anyone think a spec must be a long, stuffy document. All a spec has to do is communicate the language to a human implementer, and be unambiguous. A PEG works as long as it is sufficiently human-readable.

jkrems commented 9 years ago

The JSON spec format works for a simple, straightforward data format like JSON. I doubt it will still be nice and understandable when it meets the monstrosity that is CoffeeScript syntax. ;) I'm not 100% convinced that implying CSON is a viable data interchange format is doing any good. Especially since CSON supports operations that are pretty tightly coupled to JavaScript floating point semantics etc. That's not an argument against properly spec'ing what it looks like - just against adding wording to the docs suggesting that it's a good idea to use it across stacks instead of JSON or YAML.

hildjj commented 9 years ago

So, you're saying that if I don't have a CoffeeScript parser handy in my language, I should use YAML. (JSON doesn't have comments) I'll accept that, stop complaining, and leave you to your much smaller corner of the Internet than you could have had.

jkrems commented 9 years ago

I still believe this is worth doing. Sorry if my previous comment was misleading in that regard.

dgreensp commented 9 years ago

Atom stores configuration data on disk in CSON, so that's what got me interested in whether this was a "real" format (that could conceivably be read by an arbitrary program) or not. That said, Emacs configuration is stored in the form of ELisp programs, and it's still my favorite editor. :)

tomek-he-him commented 9 years ago

By the way, it looks like CSON is a superset of JSON.

jkrems commented 9 years ago

I'd add an "(*) well-formatted JSON". But yes, definitely worth mentioning in a potential spec.

jmatsushita commented 9 years ago

+1

d-frey commented 7 years ago

I just found CSON, and it looks interesting. I am one of the authors of a JSON library, but our architecture allows us to parse and serialize other formats as long as the internal data model is sufficiently similar. CSON seems to fit that bill.

As we are a C++ library, any kind of reference to Coffee-Script (or a reference implementation written in CS) is mostly useless to us. I can only repeat and stress the importance of having a real specification in an implementation-language-agnostic way.

We are also not just the authors of a JSON library, but for parsing we are also using our own PEG parser library, the PEGTL. I have some experience with writing extended JSON grammars, e.g. we are just about to define a standard for "relaxed JSON", calling it JAXN.

I'd like to see if CSON is another candidate for our library and if there is interest from your side to come up with a more formal specification for it. If you could write a (complete) list of features that CSON should have (being a sub-set of Coffee-Script), I could try to come up with a PEG or a CFG for it (similar to the actual JSON grammar from RFC 7159).

I'd like to co-operate on this, but I would also like to avoid wasting each other's time in case we cannot agree on some common goals. Please let me hear your thoughts about this and whether you can see CSON becoming a Coffee-Script independent, self-contained standard (which can still be a sub-set of Coffee-Script, that is not a problem).

jkrems commented 7 years ago

I know that @dbushong spent some time writing a PEG (?) for CSON while trying to migrate away from our dependency on coffee-script. I'm not 100% sure where exactly he ran into problems. I think it was something about the fairly liberal whitespace handling in coffee-script..?

dbushong commented 7 years ago

Yeah, lemme see if I can find my work thus far and stick it somewhere.

dbushong commented 7 years ago

https://github.com/groupon/cson-parser/blob/dpb-native-parser/src/cson.pegjs

There's what I've got thus far. The issues I ran into were, unsurprisingly, around corner cases in object tree parsing. In certain cases (I'll try to dig up a repro) exdented objects are incorrectly parsed as part of the preceding object.

d-frey commented 7 years ago

A grammar will usually be only a starting point; additional rules will apply. This is even the case for the JSON grammar itself. I'll check out the grammar you wrote/linked and report back once I have had some more time for it... thanks so far.

refactorized commented 6 years ago

I just want a quick way to know how to include '\' in a string value. There does not seem to be a simple here-are-all-the-rules document for this, or I am missing it. Seems to me maybe that's more appropriate an issue to bring up at bevry/cson#38 - but that issue, of course, led me here.

dbushong commented 6 years ago

I believe CSON accepts "..." or '...', so you should be able to say foo: "this 'and' that"

You also should be able to backslash-escape things, so even foo: 'this \'and\' that'
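For anyone porting this to another language, the quoting behavior described above could be sketched in Python roughly as follows. This is an assumption-laden illustration, not the rule set of this library: the function name `decode_string_literal` and the `ESCAPES` table are hypothetical, and the real set of accepted escapes is whatever CoffeeScript accepts.

```python
# Hypothetical escape table; the real set is whatever CoffeeScript accepts.
ESCAPES = {"\\": "\\", "'": "'", '"': '"', "n": "\n", "t": "\t"}


def decode_string_literal(literal):
    """Decode a '...' or "..." literal, resolving backslash escapes."""
    if len(literal) < 2 or literal[0] not in "'\"" or literal[-1] != literal[0]:
        raise ValueError("expected a quoted string literal")
    quote, body = literal[0], literal[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            # A backslash must be followed by a character from the escape table.
            if i + 1 >= len(body) or body[i + 1] not in ESCAPES:
                raise ValueError(f"unsupported escape at offset {i + 1}")
            out.append(ESCAPES[body[i + 1]])
            i += 2
        elif ch == quote:
            raise ValueError("unescaped quote inside literal")
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# The two forms from the comment above denote the same value.
double_quoted = decode_string_literal('"this \'and\' that"')
single_quoted = decode_string_literal("'this \\'and\\' that'")

# A literal backslash is written as two backslashes inside the quotes.
backslash = decode_string_literal("'C:\\\\temp'")
```

Under these assumptions, a backslash in a value is written doubled, the same way it would be in a JSON string.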

ehhc commented 5 years ago

Hey guys, I want to use CSON in a Flutter/Dart project (due to interoperability with legacy code). Unfortunately, there is no Dart parser for CSON. Furthermore, without any written specification, it's hard to write a parser myself. Any ideas what to do? Do you, by chance, know of any Dart CSON parser?

dbushong commented 4 years ago

This thread is about defining a spec that could be parsed with a PEG grammar. I started and abandoned defining this a while back, but currently the spec is "what this version of coffeescript + this library can parse" - sorry