The feeds are non-canonical and completely depend on behavior of internal v8 object representation

ezdiy commented 7 years ago

There's no canonical form to hash/sign, it depends on v8 object insert-order and Go people complained (rightfuly so) about difficulty of re-implementing this quirk. This can be fixed:

function canonize(msg) {
  // this is the current "subjective" insert order used by all feeds (needs checking)
  var keys = ['previous','author','sequence','timestamp','hash','content','signature']
  var res = []
  for (i = 0; i < keys.length; i++) {
    var k = keys[i]
    var vcanon, v = msg[k]
    if (v === undefined) continue
    if (typeof(v) === "number")
      v = v.toString()
    else if (typeof(v) === "string")
      v = '"'+v.toString()+'"'
    else {
      // TODO: make it mandatory eventually, checking the property is for gradual fixing
      if ((k === 'content') && (vcanon = Object.getPrototypeOf(v).canonize))
        v = vcanon.apply(v)
      else
        v = JSON.stringify(v, null, 2)
    }
    res.push('  "'+k+'": '+v.replace(/^/mg,'  ').substring(2))
  }
  return "{\n" + res.join(',\n') + "\n}"
}

var data = {sequence: 1, author: "aaaa", signature: 'xyz', content: ['blah','xah','xar','dar',1,2,3,4]}
console.log(canonize(data))
console.log(JSON.stringify(data, null, 2))

Should v8 change object key ordering in the future, scuttlebot will cease to function completely until it starts keeping original string forms for 'content' (can be done via weakref f.e.).

ansuz commented 7 years ago

relevant

ezdiy commented 7 years ago

@ansuz The problem here is that SSB in fact has a canonical form - just a ridiculous one: Keep fields as they come in, and attach 'signature' to the end.

This is called insert order, and it is very awkward when you convert the json to different representation. Basically you need to keep everything in original form, or serialize/deserialize to your format while keeping metadata about ordering so you can rematerialize the original exactly as it came in.

The proposed fix is to set ordering of the top fields (author, timestamp, etc) in stone (namely, to the thing node currently is using), since those fields are treated by the network specially anyway.

dominictarr commented 7 years ago

Oh btw, this document was the influence on the ssb signing format: https://camlistore.org/doc/json-signing/

Another approach, would be to always keep a handle on the original serialized json text, and parse that use it, but never serialize it again, instead send the originally serialized json, inserting/extracting the signature with a regexp.

dominictarr commented 7 years ago

Also, firefox and chrome both support that ordering - so it's a de facto standard

dominictarr commented 7 years ago

oh, btw, we could integrate this by canonicalizing messages when they are created, which would still produce valid signatures for legacy implementations, but be in canonical format.

ezdiy commented 7 years ago

@dominictarr The idea of the above script is to canonicalize both on creation and verification, but only the top fields. I don't have all the feeds, but the ones I've checked thus far do have this pseudo-canon order.

As for contents of 'content', the spec of it will always remain awkward. It basically says: the string supplied to 'content' field should be padded by 2 spaces on each newline. Because reasons. It can't be canonicalized per se, only tracked as a string (weakref via JSON parser hook).

Another fix would be simply documenting current behavior: all dicts which appear in json are ordered (similiar to python ordered dict), and all objects are represented as one receives those on the wire - no special exceptions. It basically means people will have to re-implement json serializers to accomodate this.

Again, minor oddity compared to even more odd things ssb does.

cryptix commented 7 years ago

I'd really like it if we can find a solution for this. It doesn't have to be a golden catch-all thing but at least something that works for the foreseeable future until we now how to upgrade out of this mess while keeping our sanity.

@ezdiy' proposal to commit to what we have been doing in the past sounds very reasonable to me and also is something that I feel much better maintaining than what ever v8 feels like today.

ezdiy commented 7 years ago

A simplest thing we could do - don't touch sbot node code at all, but make the current top-field order part of verification rule. All ssb feeds will pass the check now, but it could cease to be the case if we don't in the future.

One unresolved thing I have no idea how to handle (the above script simply omits those): What to do about protocol-unused fields, that is, some other field not in the ['previous','author','sequence','timestamp','hash','content','signature'] set? What is the ordering of those should it be even allowed?

ezdiy commented 7 years ago

And found another one:

  if (_last === time)  {
    do {
      adjusted = time + ((_count++) / (_count + 999))
    } while (adjusted === _adjusted)
    _adjusted = adjusted
  }

It produces decimally non-representable numbers, and then compares those using == operation, which is not valid for floats - you'll get a float smaller or bigger one at pretty much random depending on rounding, but almost never same one.

Worse still, JSON has no defined rounding rules for fine fractions. Apparently it's anything between 8-12 decimal places, but quite likely it won't survive re-serialization anyway.

Most protocols with monotonic timestamping simply use integer timestamps, and special case same timestamps if they have a sequence number - that one decides lexicographical order.

Interestingly, @Vendan in spite of all his Go elitism, introduced similiar bug in his Go port. Though with less impact - 2 decimal places are quite likely to fit for at least 100 years.

dominictarr commented 7 years ago

@ezdiy hmm, not sure what you mean, can you show me an input with this problem?

we can easily relax the input to be >= not just > but we'd have to wait a while till everyone is running the updated code.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

cryptix commented 5 years ago

I think this is still an issue and @AljoschaMeyer might have a thing or two to add. At least as a forecast, maybe.

AljoschaMeyer commented 5 years ago

It's somewhat resolved. The spec defines a canonical format. The js implementation implements that format, but does so by relying on some unspecified behavior.

As far as I am aware, the only reliance on unspecified behavior is the assumption that non-numeric keys of the content object preserve their order across JSON.stringify(JSON.parse(foo)) transformations.

dominictarr commented 5 years ago

@AljoschaMeyer btw, did you see https://github.com/ssbc/ssb-validate/pull/14

AljoschaMeyer commented 5 years ago

@dominictarr I answered to that one on ssb: %RFH6ayPX5I5+cMS3MXVbhZ16ykXBtayc9YGba0tc/xk=.sha256

ssb-junkyard / ssb-feed

The feeds are non-canonical and completely depend on behavior of internal v8 object representation #11