paragonie / paseto

Platform-Agnostic Security Tokens
https://paseto.io

Payload key ordering #101

Closed hauleth closed 2 years ago

hauleth commented 5 years ago

I couldn't find any information about the expected order of keys in the payload. It would be useful if the following held in every implementation:

encode(decode(Payload)) == Payload

From what I can see, nothing currently ensures this behaviour. It could be achieved with #90 and the use of the ASN.1 DER format, for example, but right now the standard requires the payload to be a valid Base64-encoded JSON token.
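A minimal sketch of the round-trip concern, using only Python's standard `json` module. The `encode` and `decode` helpers and the sample claims are hypothetical stand-ins for a library's payload handling, not part of any PASETO implementation. It shows that decoding and re-encoding can preserve the data while changing the bytes, even before key order comes into play.

```python
import json


def decode(payload: str) -> dict:
    # Hypothetical stand-in for a library's payload decoder.
    return json.loads(payload)


def encode(claims: dict) -> str:
    # Hypothetical stand-in for a library's payload encoder.
    # json.dumps uses ", " and ": " separators by default.
    return json.dumps(claims)


# Compact JSON, as another implementation might have produced it.
payload = '{"sub":"alice","exp":"2039-01-01T00:00:00+00:00"}'

round_tripped = encode(decode(payload))

# The decoded data is identical...
same_data = decode(round_tripped) == decode(payload)
# ...but the serialized bytes are not, because of the added whitespace.
same_bytes = round_tripped == payload

print(same_data, same_bytes)
```

Here the difference comes from whitespace alone; differing key order between encoders has the same effect on the serialized bytes.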

paragonie-scott commented 5 years ago

We really don't want to involve ASN.1 in such a simple and narrowly focused project.

Currently there's nothing enforcing key order. But, https://3v4l.org/dnpK4

If there's another programming language where sorting is done automatically, we can consider enforcing this in the standard. It might even make sense to do so. But that'd be a BC break, of course.

hauleth commented 5 years ago

> If there's another programming language where sorting is done automatically

It doesn't matter, as the JSON standard does not enforce any particular order.

> But, https://3v4l.org/dnpK4

In this form it doesn't matter, as that example compares one PHP array with another array. What I am saying is that the same data should produce exactly the same PASETO token. For example, in Erlang/Elixir one can use jsx, which can encode a property list (an ordered key-value list) in one order, while on the other end of the communication pipeline someone uses a different library, e.g. Jason, which decodes data into a hash map. That will change the order of the fields, as hash maps (naturally) do not need to preserve order (and rarely do). So we can easily end up with implementations that generate different tokens for exactly the same data.
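To illustrate the point, here is a small sketch in Python. It uses HMAC-SHA256 from the standard library purely as a stand-in for a token's authentication step; it is not PASETO's actual construction, and the key, claims, and `tag` helper are made up for the example. Two payloads that are equal as data but serialized with different key order yield different bytes, and therefore different authentication tags.

```python
import hashlib
import hmac
import json

# Demo key only; a real key would be random and secret.
KEY = b"\x00" * 32


def tag(payload: bytes) -> str:
    # Stand-in for the authentication step of a token (NOT PASETO's algorithm).
    return hmac.new(KEY, payload, hashlib.sha256).hexdigest()


# The same claims, as two different encoders might order them.
claims_a = {"sub": "alice", "role": "admin"}
claims_b = {"role": "admin", "sub": "alice"}

bytes_a = json.dumps(claims_a, separators=(",", ":")).encode("utf-8")
bytes_b = json.dumps(claims_b, separators=(",", ":")).encode("utf-8")

equal_as_data = claims_a == claims_b          # dict equality ignores order
equal_as_bytes = bytes_a == bytes_b           # serialized forms differ
equal_tags = tag(bytes_a) == tag(bytes_b)     # so the tags differ too

print(equal_as_data, equal_as_bytes, equal_tags)
```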

hauleth commented 4 years ago

The simplest change (though still a breaking one) would be to relax the requirement on the payload data. Instead of requiring it to be JSON, just state that it is a free-form string, and this will solve the problem. However, I am not sure whether users would expect that.

rlittlefield commented 4 years ago

> The simplest change (though still a breaking one) would be to relax the requirement on the payload data. Instead of requiring it to be JSON, just state that it is a free-form string, and this will solve the problem. However, I am not sure whether users would expect that.

The underlying PASETO token system actually works fine without JSON. The Python implementation allows you to pass in alternative encoder/decoder functions, so it is pretty easy to swap JSON for msgpack or protobuf.

However, PASETO has been defined to be specifically JSON, which helps standardize some important pieces, like how to handle expirations (some formats wouldn't be able to hold an expiration because the payload could be just a protobuf integer). Solutions to that problem end up making the protocol itself more complex (baking things like issued-at or expiration into a predefined field in the token).

paragonie-scott commented 4 years ago

Honestly, the easiest solution here is to ensure the arrays/objects/etc. that are JSON-encoded get their keys sorted before the json_encode() step. Then keys are ordered regardless of the default behavior of your language and people can stop worrying about this.
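As a sketch of what sorting before the encoding step can look like in a language with built-in support: Python's standard `json.dumps` accepts `sort_keys=True`, which orders keys at every nesting level. The `canonical_json` name and the compact separators are choices made for this example, not something the PASETO specification prescribes.

```python
import json


def canonical_json(claims: dict) -> str:
    # sort_keys=True orders object keys at every nesting level;
    # fixed separators remove whitespace differences as well.
    return json.dumps(claims, sort_keys=True, separators=(",", ":"))


first = canonical_json({"sub": "alice", "data": {"b": 1, "a": 2}})
second = canonical_json({"data": {"a": 2, "b": 1}, "sub": "alice"})

# Both inputs serialize to the same string regardless of insertion order.
print(first == second)
print(first)
```

Note that this only fixes the output of one encoder. Two implementations still have to agree on details such as number formatting and string escaping to produce identical bytes.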

hauleth commented 4 years ago

In most languages it is not possible to sort the keys of a map (JSON object) without writing a custom encoder, and writing such an encoder makes a correct implementation harder (as you cannot use most of the JSON encoders out there without hacks or copying their code).

paragonie-scott commented 4 years ago

If the language is ordering automatically, they should be doing the correct thing here. Otherwise, that's a language bug that needs to be fixed upstream.

If the language is not ordering automatically, and there is no built-in way to sort keys, the simple way to enforce an ordered object is:

  1. List all of the keys in the object.
  2. Sort them alphabetically.
  3. Create a new object, one key at a time, from the alphabetical list. And then copy the values over.
  4. Encode the newly created object.

Downside: It adds complexity. But if people really really really care about Decode(Encode(X)) == X consistency, this buys it for them.
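The four steps listed above can be sketched in Python as follows. The `sort_object` helper is a hypothetical name, and two details are assumptions added for the example: it recurses into nested objects and arrays (the steps only describe a single object), and "alphabetically" is taken to mean Python's default string ordering by code point. The approach relies on the language's map type preserving insertion order, which Python dicts do since 3.7.

```python
import json


def sort_object(value):
    """Rebuild dicts with keys in sorted order (recursing is an assumption)."""
    if isinstance(value, dict):
        keys = list(value.keys())   # 1. List all of the keys in the object.
        keys.sort()                 # 2. Sort them (by code point).
        ordered = {}
        for key in keys:            # 3. Build a new object one key at a time,
            ordered[key] = sort_object(value[key])  # copying the values over.
        return ordered
    if isinstance(value, list):
        # Array order is significant in JSON, so only the elements are visited.
        return [sort_object(item) for item in value]
    return value


claims = {"sub": "alice", "data": {"d": [{"z": 1, "y": 2}], "c": 2}}

# 4. Encode the newly created object.
encoded = json.dumps(sort_object(claims))
print(encoded)
```

In a language whose map type does not preserve insertion order, step 3 has no effect, which is the situation raised earlier in the thread.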

hauleth commented 4 years ago

> If the language is ordering automatically, they should be doing the correct thing here.

Why should they? The JSON standard does not enforce a particular ordering, and most languages use some kind of hash map to store such data. This means that the order of the keys in the map can even change from run to run.

> But if people really really really care about Decode(Encode(X)) == X consistency, this buys it for them.

That is why I would vote to remove the payload format from the RFC altogether. Just accept any string, and if users want to specify other data and fields, they can build that on top of PASETO itself. So the "basic" PASETO would not specify a payload format at all, and there could be a "PASETO-JSON" that defines the stored data as a JSON-encoded string.
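A rough sketch of the layering proposed here, in Python. All names (`seal`, `open_sealed`, `seal_json`, `open_json`) are hypothetical, and HMAC-SHA256 stands in for the real cryptography; this is not PASETO's wire format or algorithm. The base layer authenticates opaque bytes and never parses them, while the JSON layer is a thin wrapper on top.

```python
import hashlib
import hmac
import json

# Demo key only; a real key would be random and secret.
KEY = b"\x01" * 32
TAG_LEN = 32  # SHA-256 digest size in bytes


def seal(payload: bytes) -> bytes:
    # Base layer: authenticate opaque bytes without interpreting them.
    return hmac.new(KEY, payload, hashlib.sha256).digest() + payload


def open_sealed(token: bytes) -> bytes:
    # Verify the tag and return the payload bytes exactly as they were sealed.
    tag, payload = token[:TAG_LEN], token[TAG_LEN:]
    expected = hmac.new(KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("invalid tag")
    return payload


def seal_json(claims: dict) -> bytes:
    # "PASETO-JSON"-style layer: JSON encoding happens above the base layer.
    return seal(json.dumps(claims).encode("utf-8"))


def open_json(token: bytes) -> dict:
    return json.loads(open_sealed(token).decode("utf-8"))


raw = open_sealed(seal(b"\x00 any bytes at all"))
claims = open_json(seal_json({"sub": "alice"}))
print(raw, claims)
```

Because the base layer returns the sealed bytes unchanged, key ordering only becomes a concern for code that chooses to re-encode the JSON.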

paragonie-scott commented 4 years ago

Honestly, I'm trying to kill off JWT so PASETO needs to be a drop-in replacement above all else.

Adding layers and complexity doesn't help with that goal.

hauleth commented 4 years ago

> Honestly, I'm trying to kill off JWT so PASETO needs to be a drop-in replacement above all else.

And I wholeheartedly support that goal. What I am saying is that having PASETO as a "general case" signed or encrypted data format, plus PASETO JSON where that data is always a JSON-encoded structure, isn't, IMHO, that much of a complication. But maybe I am strange and it would be problematic for users.

paragonie-security commented 2 years ago

https://github.com/paragonie/paseto/pull/127/commits/c349cab2a452a546b458f8017f99f7311a7e5ba0

The branch for V3/V4 contains a change that should clarify this.

When we talk about v1, v2, v3, or v4, we're talking about a specific cryptographic protocol with a specific encoding. (JSON.)

When we talk about Version 4, for example, we're only talking about the underlying cryptography that goes into a v4 token. The actual format might one day differ (e.g. v4i for the Ion Schema, v4y for YAML, etc.).

Thus, you can talk about "PASETO version 4" as a general case data format and "PASETO v4" as a specific token format.

We intend to explore other encoding formats after PASETO v3/v4 and PASERK are finalized.