gritzko / ron

(dated, see the site) Replicated Object Notation, a distributed live data format, golang/ragel lib
http://replicated.cc
Apache License 2.0

String delimiters are incompatible with JSON #16

Closed by cblp 6 years ago

cblp commented 6 years ago

When strings are encoded as JSON strings, the apostrophe ' is not escaped, so the first apostrophe inside the string will end the string.

Why not use quotes " as in programming languages? Using quotes would make the parser look for \", which is a bit more complex than looking for a single byte.

So, what byte definitely cannot occur inside a JSON-encoded UTF-8 string? Maybe only the zero byte.

Alternative solution: continue using the apostrophe ', but add a special requirement that ' must be encoded as \u0027.
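To make the problem concrete, here is a minimal sketch. `naiveRonString` is a hypothetical helper, not part of any RON library; it assumes the encoder simply reuses the JSON string body and swaps the delimiters for apostrophes.

```javascript
// Hypothetical naive encoder: take the JSON string body as is
// and wrap it in apostrophes instead of double quotes.
function naiveRonString(text) {
  const body = JSON.stringify(text).slice(1, -1); // drop the surrounding "
  return "'" + body + "'";
}

const encoded = naiveRonString("isn't");

// A reader that ends the string at the first apostrophe after the opening one
// only sees part of the text, because JSON leaves ' unescaped.
const end = encoded.indexOf("'", 1);
const seen = encoded.slice(1, end);

console.log(encoded); // 'isn't'
console.log(seen);    // isn
```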

gritzko commented 6 years ago

I use ' precisely because most languages use " by default. That way, RON strings don't need that much escaping.

var ron = "*lww#obj@time+origin:field 'your text here' ";

cblp commented 6 years ago

Weak motivation, but ok.

So, how are you going to solve the bug?

gritzko commented 6 years ago

You mean, how to use JSON.stringify() for RON strings?

cblp commented 6 years ago

I mean JSON encoding, implemented as JSON.stringify() in the case of JavaScript, or as Data.Aeson.encode in the case of Haskell.

cblp commented 6 years ago

"*lww#obj@time+origin:field 'your text is't here' "

gritzko commented 6 years ago

We do a second pass in swarm.js https://github.com/gritzko/swarm/blob/master/packages/ron/src/index.js#L203

Initially I planned to use JSON encoding as is. Maybe there are some other reasons, I can't remember...

gritzko commented 6 years ago

STRING_ATOM: /"($UNICODE|\\.|[^"\\])*"|'($UNICODE|\\.|[^'\\])*'/,

The June 2017 grammar was like this. I guess two kinds of quotes complicated everything, and double quotes needed hellish escape sequences, even in my unit tests.
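A sketch of how the single-quote branch of that grammar behaves on the example from earlier in the thread. The `$UNICODE` pattern is not shown in this thread, so the `\uXXXX` form used here is an assumption, and `SINGLE_QUOTED` is a name made up for illustration.

```javascript
// Single-quote branch of STRING_ATOM, with $UNICODE assumed to be \uXXXX.
const SINGLE_QUOTED = /'(\\u[0-9a-fA-F]{4}|\\.|[^'\\])*'/;

const raw = "'your text is't here'";           // apostrophe left unescaped
const escaped = "'your text is\\u0027t here'"; // apostrophe written as \u0027

// The raw apostrophe terminates the match early.
const rawMatch = raw.match(SINGLE_QUOTED)[0];
// The escaped form is consumed as a single string atom.
const escapedMatch = escaped.match(SINGLE_QUOTED)[0];

console.log(rawMatch);               // 'your text is'
console.log(escapedMatch === escaped); // true
```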

gritzko commented 6 years ago

I think, we should not obsess over this. The binary protocol is more convenient for running in prod anyway. Base64 is critical for debugging/testing.

cblp commented 6 years ago

> We do a second pass in swarm.js

So what about fixing it in the RON spec?

gritzko commented 6 years ago

> So what about fixing it in the RON spec?

You mean, flipping to double quotes?

cblp commented 6 years ago

> I think, we should not obsess over this.

Ok, but let's make it valid first.

> The binary protocol is more convenient for running in prod anyway. Base64 is critical for debugging/testing.

SQL databases use SQL-code dump as backup, for example.

I want to use the text format as a replacement for JSON in my ff app when using the git backend.

The text format may be useful outside of debugging. We should care about it.

cblp commented 6 years ago

> > So what about fixing it in the RON spec?

> You mean, flipping to double quotes?

I mean postulating s/'/\u0027/g on the protocol level
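A minimal sketch of what that substitution could look like on top of a stock JSON serializer. `encodeRonString` and `decodeRonString` are hypothetical names, not the swarm.js or RON API, and the decoder assumes its input was produced by this encoder.

```javascript
// Hypothetical encoder: JSON-encode, then apply s/'/\u0027/g and
// swap the double-quote delimiters for apostrophes.
function encodeRonString(text) {
  const body = JSON.stringify(text).slice(1, -1);
  return "'" + body.replace(/'/g, "\\u0027") + "'";
}

// Hypothetical decoder: \u0027 is an ordinary JSON escape, so restoring
// the double quotes is enough for JSON.parse to read the body.
function decodeRonString(ron) {
  return JSON.parse('"' + ron.slice(1, -1) + '"');
}

const atom = encodeRonString("your text isn't here");
console.log(atom);                  // 'your text isn\u0027t here'
console.log(decodeRonString(atom)); // your text isn't here
```

With this rule the only apostrophes left in the encoded atom are the two delimiters, so a reader can scan for a single byte.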

cblp commented 6 years ago

I thought about it a little bit more, and now I'm strictly against single quotes.

Single quotes add extra complexity to the encoding algorithm (extra escaping).

Nobody should write

var ron = "*lww#obj@time+origin:field 'your text here' ";

in the real client or server code. Maybe in tests only.

cblp commented 6 years ago

This is the wrong way. Sorry. Double quotes are only a bit easier when using a third-party JSON serializer/parser, not significantly.

cblp commented 6 years ago

Fixed in e897eca6edfe8c3dacb5d8c46c8fbe5057cf8961