michalmuskala / jason

A blazing fast JSON parser and generator in pure Elixir.
Other
1.61k stars 169 forks source link

Add an option to decode JSON objects as lists #143

Closed alopezz closed 2 years ago

alopezz commented 2 years ago

This lets the user get the fields of a JSON object in the same order as they appear in the input, which can be useful for certain use cases.

The change is smaller than what the noisy diff suggests, but the way it's implemented I had to add the new decoding function to all affected function signatures. Maybe a refactor to have all the decoding functions (key_decode, string_decode, object_decode) would help here?

Essentially, this just adds an :objects option to Jason.decode, based on which either the default (and original) :maps.from_list is used, or the object is retained as a list of pairs (tuples), reversed so that it's in the same order as the input.

Let me know if this feature is useful!

michalmuskala commented 2 years ago

This is potentially useful. However, my main concern is that this is not "round-trippable" - i.e. if we decode to keyword lists we can't encode the input back again to JSON. I think some more consideration around this is needed

alopezz commented 2 years ago

Would adding something like:

def value(value, escape, encode_map) when Keyword.keyword?(value) do

And encoding that as a map do the trick? It's true that that would only allow reversal of those who were decoded with both objects: :lists and keys: :atoms options.

Or it's also possible to treat every list of pairs (2-element tuples) as potential maps and encode them as such, which wouldn't be ambiguous since there is no support for encoding tuples anyway.

alopezz commented 2 years ago

I made an attempt at allowing "round-trippability".

But no matter what, there's a case that becomes ambiguous, which is empty collections. If we start with an empty json object {}, decode it as a list with objects: :lists, and encode it again, there's no way to know that this was originally an empty object and not an empty array. The only thing that can be done is abandon round-trippability for that specific case and accept that it will become an array after the round trip. If this is unacceptable, then I guess it can't be done.

michalmuskala commented 2 years ago

I'm afraid this attempted conversion might be too high of a cost to always pay when encoding lists - I'd expect it will slow encoding considerably.

One idea would be to introduce a OrderedObject wrapper struct and make decode to it, rather than directly into keyword. This also allows us to later implement the encode protocol on it, and round-trip exactly.

Something like:

defmodule Jason.OrderedObject do
  defstruct values: []

   def new(values), do: %__MODULE__{values: values}
end

defimpl Jason.Encoder, for: Jason.OrderedObject do
  def encode(%{values: values}, opts) do
    Jason.Encode.keyword(values, opts)
  end
end

And later rather than calling Enum.reverse when returning a keyword we would do Jason.OrderedObject.new(:lists.reverse(values)).

Users would be able to access the values with ordered_object.values. Potentially we could also implement Access for OrderedObject so ordered_object["key"] just works directly.

The option could also be called something like objects: :preserve_order.

alopezz commented 2 years ago

Sounds like a plan! Thanks for the feedback, I'll work on this approach.

alopezz commented 2 years ago

I've updated the patch based on your suggestions. So now, using objects: :preserve_order decodes json objects into OrderedObjects, and encoding OrderedObjects results in json objects (also respecting the order).

I went a step further and implemented both the Access behaviour as well as the Enumerable protocol. That's the reason behind adding the ordered_keys entry to the OrderedObject struct. Doing so makes it easy to process the decoded objects directly or convert them to lists or maps as desired.

Let me know if you think it's a good idea or if you still prefer to simply let users rely on accessing the values directly from the struct. I suppose my approach could affect encoding performance slightly because of having to construct the ordered list on the fly.

michalmuskala commented 2 years ago

Merged manually in 7e25ebb8ecf11bf959f805a767661f1c24c2a8fa