Wrong encoding when calculating hash

retog commented 3 years ago

on https://github.com/ssb-js/ssb-keys/blob/155db5133566fa7bc3ac1845291a3bf56503a635/util.js#L7 it defaults to the encoding "binary" which, according to https://nodejs.org/api/buffer.html#buffer_buffers_and_character_encodings, is a legacy alias for "latin1".

The code should encode the strings as utf-8 instead which is the web standard and the only encoding supported by modern js implementations.

staltz commented 3 years ago

Unfortunately we can't change this without changing the SSB protocol, i.e. causing a breaking change.

arj03 commented 3 years ago

This is one of the reasons people have been wanting a new feed format ;-)

retog commented 3 years ago

Unfortunately we can't change this without changing the SSB protocol, i.e. causing a breaking change.

The SSB Protocol as described/defined in the protocol-guide does not mandate iso-8859-1 encoding.

The references python implementation explicitly uses UTF-8:

https://github.com/pferreir/pyssb/blob/975467030a6deeae6c5078ff10d90949e9adca56/ssb/feed/models.py#L75

        return dumps(self.to_dict(add_signature=add_signature), indent=2).encode('utf-8')

One of the linked nodejs examples uses UTF-8 implicitly:

https://github.com/ssb-js/ssb-keys/blob/155db5133566fa7bc3ac1845291a3bf56503a635/index.js#L118

So I think the error is with this implementation and not the protocol. The protocol document could be more explicit and mandate the standard utf-8 encoding, but using an encoding that encodes only a tiny subset of the json character set and of the characters the protocol explicitly allows for the type key, isn't a justifiable choice for a protocol implementation.

retog commented 3 years ago

This is one of the reasons people have been wanting a new feed format ;-)

Another reason would be to treat JSON as per the RFC rather than as an ordered collection of key/value pairs. Also, a standard for a signed tamper-proof feed could be used beyond Scuttlebutt. However, this is not a reason not to implement Scuttlebutt correctly according to a charitable interpretation of the spec.

christianbundy commented 3 years ago

The SSB Protocol as described/defined in the protocol-guide does not mandate iso-8859-1 encoding.

I hate to say this, but that means there's a bug in the protocol guide. We all agree that these warts in the reference implementation are annoying and gross and we'd love to remove them, but doing so would break compatibility with the existing SSB network. There are feed formats that are vastly more reasonable than this one (example), but The Feed Format That Most People Use is still the one with the warts.

The references python implementation explicitly uses UTF-8:

That implementation is, unfortunately, incorrect. The readme says "Please, don't use this for anything that is not experimental. This is a first attempt at implementing the main functionality needed to run an SSB client/server.".

However, this is not a reason not to implement Scuttlebutt correctly according to a charitable interpretation of the spec.

The reference implementation came long before the spec, so the specification is descriptive rather than prescriptive. A "correct" implementation of Scuttlebutt is [unfortunately] one that agrees with the reference implementation. If you're interested, I've put together an SSB validation dataset that aims to collect the majority of these odd edge-cases.

ssbc / ssb-keys

Wrong encoding when calculating hash #93