ottypes / json0

Version 0 of the JSON OT type
447 stars 64 forks source link

Transforming ops with different use of integer/string indexes doesn't work #37

Closed alecgibson closed 3 years ago

alecgibson commented 3 years ago

Given an object obj and an array arr, in JavaScript, these statements are true:

arr['123'] === arr[123]
obj['123'] === obj[123]

Therefore, these paths should be equivalent:

var p1 = ['foo', '1']
var p2 = ['foo', 1]

However, json0 doesn't treat them as such. For example, this transformation fails:


const op1 = [{p: ['a', '1', 0], si: 'hi'}]
const op2 = [{p: ['a', 1], lm: 0}]

json0.transform(op1, op2, 'left')

Actual result: [{p: ["a", 2, 0], si: "hi"}] Expected result: [{p: ["a", 0, 0], si: "hi"}]

alecgibson commented 3 years ago

I have a (correctly) failing test case:

it.only 'transforms ops that use strings and numbers differently in their paths', ->
  assert.deepEqual [{p: ["a", 0, 0], si: "hi"}], type.transform [{p: ['a', 1, 0], si: 'hi'}], [{p: ['a', 1], lm: 0}], 'left'
  assert.deepEqual [{p: ["a", 0, 0], si: "hi"}], type.transform [{p: ['a', 1, 0], si: 'hi'}], [{p: ['a', '1'], lm: 0}], 'left'
alecgibson commented 3 years ago

I wonder if we should normalize() the path?

josephg commented 3 years ago

This is by design. json0 has different semantics for lists and objects. transform doesn't have access to the document's type, so it uses at the type of the key (string / number) to disambiguate whether it should follow listy semantics or object semantics during transformation. The caller is expected to know, and be explicit in the call to transform.

Having op1 and op2 with ambiguous key types is UB - if you pass keys ['a', 0] and ['a', '0'] its ambiguous whether the data is shaped like {a: ['']} or {a: {'0': ''}}. If anything, if you mix and match strings and numbers in the operations passed to transform, transform should throw an error.

The second line in your test should throw, because this operation is invalid:

[{p: ['a', '1'], lm: 0}]

The path ['a', '1'] implies the document's shape is {a: {1: _}} and you can't do list moves on an object.

alecgibson commented 3 years ago

Hmm. That's a shame. We've had a subtle bug lurking for years where we had accidentally stringified an array index. Everything seems to work just fine for the most-part, because JS lets you do that, and I've only just tracked it down because it only screws up on transform.

The only suggestion I can think of is to extend the normalize() signature to accept the document itself (or add a new normalizeWithDoc()), so that we can sanity-check the op against the doc shape itself.

I have to say I'm slightly confused: why do the key types disambiguate? Shouldn't that be embedded in the semantics of using eg oi vs li?

alecgibson commented 3 years ago

(For anyone else who — like me — was thinking you could just coerce any "integer-like" path segment from a string to a number, don't forget that things like ObjectIDs can look like integers!)

josephg commented 3 years ago

Yeah, the question is do you want list semantics (where items are inserted in between keys, or do you want object semantics - where each update replaces the previous value at that key.

@nornagon might be able to speak more about the state of json0 - it might be possible to make it work in both cases given the operation semantics are named in the object. In json1 this type of thing is a hard requirement for users.

The code should really probably throw errors when people misuse the library like this because of how hard these bugs are to track down. It'd be great to have some PRs to catch these errors so others don't make the same mistake again. (Eg, check the path makes sense in apply() for the given data types).

nornagon commented 3 years ago

Agree with @josephg that this should throw. Garbage in, garbage out.

alecgibson commented 3 years ago

I'm not entirely sure this is "garbage in", since JS treats string and number keys the same, but I guess it's a valid API decision, and if that's the case then yes throwing would be good (although I personally still don't really understand why we can't accept both, given that object vs list semantics are embedded in the op properties).

alecgibson commented 3 years ago

I think with deeper objects, it's also less clear when to throw, because you might get this sort of situation:

var op1 = {p: ['foo', '1', 'bar', 'baz', 'qux'], oi: 'lorem'}
var op2 = {p: ['foo', 1], lm: 0}

json0.transform(op1, op2, 'left')

In this case, the op that "knows" about the list (and which can trigger a throw) looks "valid", but doesn't share a "common" path with op1, which is a subtler break.

alecgibson commented 3 years ago

Or do you mean to just throw in apply() when the path doesn't match the data structure? Should catch the majority of sharedb-related mistakes (since the op won't be submitted in the first place), but might potentially miss bugs where people use transform() independently.

I'm happy to make that change if you want, but I think throwing would technically be breaking like this other change that's in stasis.

josephg commented 3 years ago

Right, but what are the semantics there? Should it implicitly coerce ['foo', '1'] to ['foo', 1] which transforms to ['foo', 2]? Or should it coerce ['foo', 1] to ['foo', '1'] (then in this case throw?)

The library is written using the explicit rule that string keys imply objects, and number keys imply lists. The transform method needs to know the shape of the objects the operations modify. Its really neat that we can encode that information in the keys themselves using string vs number keys. (And that checks out, since JS objects always have string keys, and lists always have numeric keys). For all that there are some situations they coerce to each other, strings and numbers are different things with different semantics in javascript. They aren't interchangeable and using them interchangably will cause lots of obscure bugs in lots of situations. (Eg "5" + "10" === "510".)

The problem to solve here was how hard it was to find and fix this bug, presumably because of how rarely transform gets called in practice. You have it right - my preferred solution would be to make that bug easier to find by making the apply method throw when the key has the wrong type, instead of subtly working when changes are linear, but breaking when transform is called. But yeah, I agree that potentially breaks compatibility. Maybe it should console.warn the first time that happens or something. People probably won't see the warnings, but at least they'd be there.. :/ 🤷‍♀️

alecgibson commented 3 years ago

Right, but what are the semantics there?

Basically I think I was picturing less active coercion, and more "coersive equals" checks. ie the path you submit remains untouched, and is committed to the DB as-is, but anywhere where we check if paths are equal, we use == instead of ===, and if we need to know list vs object semantics, that should be encoded in the op property (unless there are edge cases I'm unaware of, which is more than likely! I guess off the top of my head, path incrementing/decrementing would be broken without an active coercion).

But if that's all too complex and/or just not how you want the library to work, then fine. Given my unfamiliarity with the library, I think I'm going to update our fork to throw (since the mismatch throwing is how I caught this bug in the first place!). I can open a PR to console.warn if that's what you want?

I personally think it would be nice to have some sort of module-level opt-in strict mode that throws (although from a sharedb point-of-view, syncing module options between server and client isn't really something we do yet, although we've vaguely discussed it)