thoth-org / Thoth.Json.Net

https://thoth-org.github.io/Thoth.Json/#.Net-and-NetCore-support
MIT License
33 stars 16 forks source link

Thoth parses invalid JSON #52

Open njlr opened 1 year ago

njlr commented 1 year ago
#r "nuget: Thoth.Json.Net, 10.1.0"

open Thoth.Json.Net

let json =
  """
  {
    "x": 123,
    "y": 456,
  }
  """

match Decode.fromString Decode.value json with
| Ok _ ->
  printfn "The JSON is valid"
| Error error ->
  printfn $"The JSON is invalid: \n%s{error}"

The JSON is valid

This should not parse since the trailing comma is out of spec.

MangelMaxime commented 1 year ago

Thoth.Json.Net is not directly responsible for parsing the JSON. It is using Newtonsoft under the hood.

It seems like Newtonsoft is permissive by default...

https://github.com/JamesNK/Newtonsoft.Json/issues/818

Unfortunately, there isn't much we can do on Thoth.Json.Net side about it right now.

njlr commented 1 year ago

System.Text.Json does not appear to suffer from the same issue:

open System.Text.Json

let json =
  """
  {
    "x": 123,
    "y": 456,
  }
  """

let x = JsonDocument.Parse json

printfn $"{x}"

System.Text.Json.JsonReaderException: The JSON object contains a trailing comma at the end which is not supported in this mode. Change the reader options. LineNumber: 4 | BytePositionInLine: 2.

What are your thoughts on building a third implementation of Thoth backed by System.Text.Json? Thoth.Json.System?

Happy to contribute toward this.

MangelMaxime commented 1 year ago

What are your thoughts on building a third implementation of Thoth backed by System.Text.Json? Thoth.Json.System?

The plan right now, is more to stop having 2 implementations of the library and use a single/unified package. The reason behind this choice is to reduce the number of packages to maintain and also to allow people to easily create libraries/package depending on Thoth.Json.

Right now, if they want there package to be compatible with .NET and Fable they needed to depends on both Thoth.Json and Thoth.Json.Net and use compiler directives in there code. It would be better if they didn't have to do that.

I am still unsure if I want to hard code the implementation or create an interface which contains more of less the Helpers module in it.

https://github.com/thoth-org/Thoth.Json.Net/blob/a3d82922cb2e2892a019dff1b308dc6813a49dbc/src/Decode.fs

Using the interface means losing a bit of performance but it was never a priority with Thoth.Json.

Solution could also be a mix:

But I feel like the mix is going harder to create and perhaps easy to break.

If I am using a custom JSON parser I can unify almost all the code and only worry about .NET / Fable difference in the reflection API. However, creating a JSON parser is not that simple as if we want to be compliant to the spec it is almost impossible because the spec itself doesn't specify everything...

And after that people say that JSON is simple :p

My message got a bit side tracked but I hope it answer a bit your question. Feel free to comment if you want to speak more about it

olo-ntaylor commented 1 year ago

@MangelMaxime @njlr I have suddenly become very interested in this comment:

What are your thoughts on building a third implementation of Thoth backed by System.Text.Json?

We are using Thoth extensively in an API/back-end F# project, and it is working really well with us. However, we have some hot, high-load endpoints/code paths where performance is really critical, and one thing that has come up is that System.Text.Json outperforms Newtonsoft by a significant margin, almost 2x as fast with 1/2 the memory allocations, according to several benchmarking reports. We'd really like to keep the "parse, don't validate" approach, but it's hard to justify taking an unnecessary performance hit like that.

Hypothetically, if one was going to fork this repo to something like Thoth.System.Text.Json, what would be a high-level roadmap to follow? It looks like System.Text.Json exposes roughly the same kind of primitives that Thoth depends on (i.e., JsonDocument and JsonElement instead of JToken), so at first glance it seems like the conversion would be somewhat straightforward, just a matter of changing out the JSON primitives we're using. Are there any gotchas you are aware of? (I'm sure there would be some differences in the APIs exposed by each, of course.)

If it's really a mostly straightforward transition, we could consider pitching in and help with that fork. On the other hand, if it would be a virtual re-write and take months of work to complete, that might sway us in a different direction.

(Btw, @MangelMaxime really appreciate all your work for the F# ecosystem. I've learned a ton from your work and benefited from your work on several F# libraries that we use!)

MangelMaxime commented 1 year ago

Hello @olo-ntaylor

First of all thank you for you kind words and always happy to know that my work is used and benefit others :).

If one wanted to fork this repo the roadmap would be:

  1. Fork the repo

  2. Change the following line of code to use the System.Text.Json internal AST types.

    https://github.com/thoth-org/Thoth.Json.Net/blob/dde35518f9eda3c5d35b39dd97945892e4435018/src/Types.fs#L5

  3. Follow the compiler errors and fix them.

In theory

For Encode.fs, the changes will probably be focused on the manual encoders.

Example:

https://github.com/thoth-org/Thoth.Json.Net/blob/dde35518f9eda3c5d35b39dd97945892e4435018/src/Encode.fs#L23-L41

For Decode.fs, the changes will probably be focused on the helper module:

https://github.com/thoth-org/Thoth.Json.Net/blob/dde35518f9eda3c5d35b39dd97945892e4435018/src/Decode.fs#L11-L42

I don't think you would need to change the Auto API because that code use the manual decoder internally and the rest of the code is related to the reflection which should be the same.

  1. And what is really important is that you use the same tests file coming from Thoth.Json repo. And use an "adapter" to adapt the open statement and a few others minor stuff when needed like this repo does.

https://github.com/thoth-org/Thoth.Json.Net/blob/dde35518f9eda3c5d35b39dd97945892e4435018/build.fsx#L74-L107


FYI I still have the plan to work on the next major version of Thoth.Json and I am still debating how I want it to be done. I have 2 ideas:

  1. Use a IDecoderHelpers<'Value> interface that library implementation could implements. Meaning that Thoth.Json by default would be library agnostic and then the user would provide the implementation instance to be user like Fable, Newtonsoft, System.Text.Json.

Cons: Several package to use so it is not a total unification of the packages. Thoth.Json + (Thoth.Json.Fable | Thoth.Json.Newtonsoft | Thoth.Json.System.Text.Json | etc.)

Pros: Use battle tested JSON parser and benefit from their work on performance.

  1. Use a custom JSON parser allowing Thoth.Json to have the exact same code running everywhere on .NET, JavaScript, Rust, Dart, Python, etc.

Cons: Need to write a custom JSON parser, meaning it will probably be slower than the others.

Pros: Have the same JSON parser, so the exact same behaviour everywhere + only 1 package instead of several.

njlr commented 1 year ago

@MangelMaxime @njlr I have suddenly become very interested in this comment:

What are your thoughts on building a third implementation of Thoth backed by System.Text.Json?

We are using Thoth extensively in an API/back-end F# project, and it is working really well with us. However, we have some hot, high-load endpoints/code paths where performance is really critical, and one thing that has come up is that System.Text.Json outperforms Newtonsoft by a significant margin, almost 2x as fast with 1/2 the memory allocations, according to several benchmarking reports. We'd really like to keep the "parse, don't validate" approach, but it's hard to justify taking an unnecessary performance hit like that.

Hypothetically, if one was going to fork this repo to something like Thoth.System.Text.Json, what would be a high-level roadmap to follow? It looks like System.Text.Json exposes roughly the same kind of primitives that Thoth depends on (i.e., JsonDocument and JsonElement instead of JToken), so at first glance it seems like the conversion would be somewhat straightforward, just a matter of changing out the JSON primitives we're using. Are there any gotchas you are aware of? (I'm sure there would be some differences in the APIs exposed by each, of course.)

If it's really a mostly straightforward transition, we could consider pitching in and help with that fork. On the other hand, if it would be a virtual re-write and take months of work to complete, that might sway us in a different direction.

(Btw, @MangelMaxime really appreciate all your work for the F# ecosystem. I've learned a ton from your work and benefited from your work on several F# libraries that we use!)

The main issue (which you may not care about anyway) is building libraries on top of the Thoth packages.

For example, the Thoth.Json.CE package only works with Thoth.Json. If you use Thoth.Json.Net then you must use Thoth.Json.Net.CE. The user is expected to write lots of #ifs.

If we added a third implementation, Thoth.Json.System then we also need to create Thoth.Json.System.CE.

The issue compounds as the dependency tree gets deeper!

I would suggest making an interface so that you can switch at run-time (IJsonOps?). Then an implementation of this interface can be provided by each package (Thoth.Json, Thoth.Json.Net, etc... ). The ecosystem can then be built on top of the backend-agnostic IJsonOps.

I do not believe that static configuration (using SRTP?) is possible but would love to be proven wrong :)

This would be a big change to Thoth.Json so would need buy-in from @MangelMaxime - I wouldn't want to fragment the ecosystem by making yet another JSON library!

If you want something that works today, Fleece has a System.Text.Json backend (but none for Fable).


Edit: Beat me to it!

I strongly support proposal 1.

A from-scratch, pure F# parser that works everywhere could be provided as a backend, giving of the benefits of 1 and 2.

MangelMaxime commented 1 year ago

@njlr You are right that libraries build on top of Thoth.Json would have more compiler directives in the code.

I strongly support proposal 1.

A from-scratch, pure F# parser that works everywhere could be provided as a backend, giving of the benefits of 1 and 2.

It is true that, option 1 is not exclusive of option 2. When option 2 is indeed exclusive.

Option 2, was at the time tried because using an interface means the helpers are not inline anymore. But in reality, I don't think I can write a JSON parser that is fast enough to beat Newtonsoft or System.Text.Json even if their helpers are not inlined.

olo-ntaylor commented 1 year ago

Gathering up what's been said so far... it sounds like the preference would be to not introduce a separate Thoth.System.Text.Json package at the moment, but to refactor some kind of Thoth.Core library that would have many potential "implementation-provider" packages to choose from.

At first glance, that seems like a ton more work 😅 Not sure if my intuition is correct on that... how long do you expect such a refactor would take, (assuming you had at least some community contributions)?

MangelMaxime commented 1 year ago

Gathering up what's been said so far... it sounds like the preference would be to not introduce a separate Thoth.System.Text.Json package at the moment, but to refactor some kind of Thoth.Core library that would have many potential "implementation-provider" packages to choose from.

Creating your own fork means:

Having Thoth.Json support several backend natively is a more viable solution over time IHMO.

At first glance, that seems like a ton more work 😅 Not sure if my intuition is correct on that... how long do you expect such a refactor would take, (assuming you had at least some community contributions)?

Hum such a work does indeed require some work. Most of the code already exist but I need change the types from:

type Decoder<'T> = string -> JsonValue -> Result<'T, DecoderError>

type Encoder<'T> = 'T -> JsonValue

to something like

type Decoder<'T> = IDecoderHelpers<'Value> ->  string -> JsonValue -> Result<'T, DecoderError>

type Encoder<'T> = IEncoderHelpers<'Value> -> 'T -> JsonValue

The other big change is to merge Thoth.Json.Net and Thoth.Json code into a single package to have the logic in one package only.

I can't really gave an estimate but my last complete rewrite of Thoth.Json took me 1 week to have a beta version. I didn't release it because I was not happy with a few things and also I made some improvement on how some types where represented but didn't track it correctly and so was not able to ensure compatibility with previous version. I now have learn from this error 😅

The initial bulk work would need to be done by me otherwise I don't think I could understand the implications of the changes. After that, community contributions to report errors / send PR would be welcome indeed.

@olo-ntaylor I can't give an estimate of when I will be able to work on that project. But in the past, I had some companies sponsoring some of my work time so I could focus on working on their specifics needs.

If you would like to discuss a possibility like that, I am available on F# Slack by the handle @MangelMaxime.

olo-ntaylor commented 1 year ago

Thanks for the quick responses, @MangelMaxime!!

I think one approach we may consider is forking the repo into our private GH and doing the swap to System.Text.Json there so we don't pollute the ecosystem of packages available, and that way we're not going to screw anyone else over if we miss something 😅 Then we could potentially re-evaluate our needs and how urgent it would be to move to the updated framework-agnostic version in the future.

MangelMaxime commented 1 year ago

Seems like a good middle ground to me.

njlr commented 1 year ago

I was curious what Option 1 might look like so I sketched this out.

It works in an F# script.

// Core definitions, agnostic to the particular JSON backend

type IEncoderOps<'json> =
  abstract member Encode : int -> 'json
  abstract member Encode : 'json list -> 'json

type IDecoderOps<'json> =
  abstract member TryAsInt : 'json -> int option
  abstract member TryAsArray : 'json -> ('json list) option

type IEncoder<'t> =
  abstract member Encode<'json> : ops : IEncoderOps<'json> * value : 't -> 'json

type IDecoder<'t> =
  abstract member Decode<'json> : ops : IDecoderOps<'json> * path : string * json : 'json -> Result<'t, string>

[<RequireQualifiedAccess>]
module Encode =

  let int : IEncoder<int> =
    {
      new IEncoder<int> with
        member this.Encode<'json>(ops : IEncoderOps<'json>, t : int) =
          ops.Encode(t)
    }

  let list (element : IEncoder<'t>) : IEncoder<'t list> =
    {
      new IEncoder<'t list> with
        member this.Encode<'json>(ops : IEncoderOps<'json>, xs : 't list) =
          xs
          |> List.map (fun x -> element.Encode(ops, x))
          |> ops.Encode
    }

[<RequireQualifiedAccess>]
module Decode =

  let int : IDecoder<int> =
    {
      new IDecoder<int> with
        member this.Decode<'json>(ops : IDecoderOps<'json>, path : string, json : 'json) =
          match ops.TryAsInt(json) with
          | Some i -> Ok i
          | None -> Error $"%s{path}: Expected an int, but found %A{json}"
    }

  let list (element : IDecoder<'t>) : IDecoder<'t list> =
    {
      new IDecoder<'t list> with
        member this.Decode<'json>(ops : IDecoderOps<'json>, path : string, json : 'json) =
          match ops.TryAsArray(json) with
          | Some xs ->
            let rec loop (index : int) (xs : 'json list) (acc : 't list) =
              match xs with
              | [] -> Ok (List.rev acc)
              | x :: xs ->
                match element.Decode(ops, $"%s{path}[%i{index}]", x) with
                | Ok x -> loop (index + 1) xs (x :: acc)
                | Error error -> Error error
            loop 0 xs []
          | None -> Error $"%s{path}: Expected an array, but found %A{json}"
    }
// Backend for System.Text.Json

module SystemTextJson =

  open System.Text.Json

  type private EncoderOps() =
    interface IEncoderOps<JsonElement> with
      member this.Encode(i : int) =
        JsonSerializer.SerializeToElement(i)

      member this.Encode(xs : JsonElement list) =
        JsonSerializer.SerializeToElement(xs)

  type private DecoderOps() =
    interface IDecoderOps<JsonElement> with
      member this.TryAsInt(element : JsonElement) =
        match element.ValueKind with
        | JsonValueKind.Number ->
          Some (element.GetInt32())
        | _ ->
          None

      member this.TryAsArray(element : JsonElement) =
        match element.ValueKind with
        | JsonValueKind.Array ->
          Some
            [
              let mutable en = element.EnumerateArray()

              while en.MoveNext() do
                yield en.Current
            ]
        | _ ->
          None

  [<RequireQualifiedAccess>]
  module Encode =

    let private ops =
      EncoderOps()
      :> IEncoderOps<JsonElement>

    let toString (encoder : IEncoder<'t>) (t : 't) : string =
      encoder.Encode(ops, t).ToString()

  [<RequireQualifiedAccess>]
  module Decode =

    let private ops =
      DecoderOps()
      :> IDecoderOps<JsonElement>

    let fromString (decoder : IDecoder<'t>) (json : string) : Result<'t, string> =
      try
        let jsonDoc = JsonDocument.Parse(json)
        decoder.Decode(ops, "$", jsonDoc.RootElement)
      with
      | :? JsonException as exn ->
        Error exn.Message
// Demo

open SystemTextJson

let encoder =
  Encode.list Encode.int

let decoder =
  Decode.list Decode.int

let xs = [ 1; 2; 3 ]

printfn $"%A{xs}"

let json = Encode.toString encoder xs

printfn $"%s{json}"

match Decode.fromString decoder json with
| Ok xs ->
  printfn "%A" xs
| Error err ->
  printfn "Error: %s" err

Some findings:

I would be happy to contribute if / when the next version is started.

MangelMaxime commented 1 year ago

Glad to see that the concept works in theory.

You did it a bit differently that I would have done it. Indeed, I would have kept the let list be a function all the way so something like:

    let list (decoder : Decoder<'value>) : Decoder<'value list> =
        fun decoderHelpers path value -> // ...

To me it seems simpler than using object expression.

MangelMaxime commented 1 year ago

Also, I re-created a new issue in Thoth.Json repository to track this new re-design work.

I will add to that issue, the list of all the improvements I want to include because if we are to do a major rewrite it is the good time to improve other stuff too.

https://github.com/thoth-org/Thoth.Json/issues/175

njlr commented 1 year ago

Glad to see that the concept works in theory.

You did it a bit differently that I would have done it. Indeed, I would have kept the let list be a function all the way so something like:

    let list (decoder : Decoder<'value>) : Decoder<'value list> =
        fun decoderHelpers path value -> // ...

To me it seems simpler than using object expression.

Maybe I'm missing something, but I'm not sure this can work.

What is the type of decoderHelpers?

I think it needs an extra type-parameter to be filled by System.Text.Json.JsonElement, Newtonsoft.Json.Linq.JToken, etc.

MangelMaxime commented 1 year ago

The types would be something like that:

type IDecoderHelpers<'JsonValue> =
    abstract GetField : FieldName : string -> value : 'JsonValue -> 'JsonValue
    abstract IsString : value : 'JsonValue -> bool
    abstract IsBoolean : value : 'JsonValue -> bool
    abstract IsNumber : value : 'JsonValue -> bool
    abstract IsArray : value : 'JsonValue -> bool
    abstract IsObject : value : 'JsonValue -> bool
    abstract IsNullValue : value : 'JsonValue -> bool
    abstract IsIntegralValue : value : 'JsonValue -> bool
    abstract IsUndefined : value : 'JsonValue -> bool
    abstract AnyToString : value : 'JsonValue -> string

    abstract ObjectKeys : value : 'JsonValue -> string seq
    abstract AsBool : value : 'JsonValue -> bool
    abstract AsInt : value : 'JsonValue -> int
    abstract AsFloat : value : 'JsonValue -> float
    abstract AsFloat32 : value : 'JsonValue -> float32
    abstract AsString : value : 'JsonValue -> string
    abstract AsArray : value : 'JsonValue -> 'JsonValue[]

I never tested my idea of using that interface so it is possible that indeed the compiler will want something more that just a function for the decoder/encoder definition.

olo-ntaylor commented 1 year ago

@MangelMaxime @njlr Just FWIW, in my hacking yesterday I realized that System.Text.Json actually exposes two different models for JSON ASTs. There is the JsonDocument/JsonElement model designed for efficient reading (it's immutable), and the (mutable) JsonNode/JsonValue model designed for efficient creation of a JSON DOM.

I only mention that because I think you'd have to keep in mind that 'JsonValue might be different for the Decode and Encode modules depending on the underlying parser implementation.

MangelMaxime commented 7 months ago

Hello @njlr,

I started working on the unification of Thoth.Json and you were right. Using object expression allows to avoid surfacing the 'JsonValue in the user code.

It allows to write:

type User =
    {
        FirstName: string
        LastName: string
    }

module User =

    let decoder: Decoder<User> =
        Decode.map2
            (fun firstName lastName ->
                {
                    FirstName = firstName
                    LastName = lastName
                }
            )
            (Decode.field "firstName" Decode.string)
            (Decode.field "lastName" Decode.string)

instead of something like let decoder: Decoder<'JsonValue, User> = (pseudo code as I don't have anymore this code in the repo ^^).

And I am happy to report that the experimentation seems to be really promising.

Implementing a new runtime/target is as simple as implementing IDecoderHelpers<'JsonValue> contract and exposing Decode.fromString.

let encoderHelpers =
    { new IDecoderHelpers<obj> with
        member _.isString jsonValue = jsonValue :? string

       // ...

        member _.anyToString jsonValue =
            emitJsStatement jsonValue
                """
JSON.stringify($0, null, 4) + ''
                """
    }

module Decode =

    module Helpers =

        [<Emit("$0 instanceof SyntaxError")>]
        let isSyntaxError (_ : obj) : bool = jsNative

    let fromString (decoder : Decoder<'T>) =
        fun value ->
            try
               let json = JS.JSON.parse value
               Decode.fromValue encoderHelpers "$" decoder json
            with
                | ex when Helpers.isSyntaxError ex ->
                    Error("Given an invalid JSON: " + ex.Message)

I have a working minimal prototype which works for JavaScript and Newtonsoft.Json. I am now working on transposing all the manual API to the new implementation in order to checks if it works on a much larger scales.

@olo-ntaylor Thank you for the note. Once, I have completed the work of JavaScript and Newtonsoft.Json, I will give it a try to use System.Text.Json in order to make sure that the point you raised is supported.

Progress on the rewrite will be reported in https://github.com/thoth-org/Thoth.Json/issues/175