Open njlr opened 1 year ago
Thoth.Json.Net is not directly responsible for parsing the JSON. It is using Newtonsoft under the hood.
It seems like Newtonsoft is permissive by default...
https://github.com/JamesNK/Newtonsoft.Json/issues/818
Unfortunately, there isn't much we can do on Thoth.Json.Net side about it right now.
System.Text.Json
does not appear to suffer from the same issue:
open System.Text.Json
let json =
"""
{
"x": 123,
"y": 456,
}
"""
let x = JsonDocument.Parse json
printfn $"{x}"
System.Text.Json.JsonReaderException: The JSON object contains a trailing comma at the end which is not supported in this mode. Change the reader options. LineNumber: 4 | BytePositionInLine: 2.
What are your thoughts on building a third implementation of Thoth backed by System.Text.Json
? Thoth.Json.System
?
Happy to contribute toward this.
What are your thoughts on building a third implementation of Thoth backed by System.Text.Json? Thoth.Json.System?
The plan right now, is more to stop having 2 implementations of the library and use a single/unified package. The reason behind this choice is to reduce the number of packages to maintain and also to allow people to easily create libraries/package depending on Thoth.Json.
Right now, if they want there package to be compatible with .NET and Fable they needed to depends on both Thoth.Json and Thoth.Json.Net and use compiler directives in there code. It would be better if they didn't have to do that.
I am still unsure if I want to hard code the implementation or create an interface which contains more of less the Helpers
module in it.
Using the interface means losing a bit of performance but it was never a priority with Thoth.Json.
Solution could also be a mix:
But I feel like the mix is going harder to create and perhaps easy to break.
If I am using a custom JSON parser I can unify almost all the code and only worry about .NET / Fable difference in the reflection API. However, creating a JSON parser is not that simple as if we want to be compliant to the spec it is almost impossible because the spec itself doesn't specify everything...
And after that people say that JSON is simple :p
My message got a bit side tracked but I hope it answer a bit your question. Feel free to comment if you want to speak more about it
@MangelMaxime @njlr I have suddenly become very interested in this comment:
What are your thoughts on building a third implementation of Thoth backed by System.Text.Json?
We are using Thoth extensively in an API/back-end F# project, and it is working really well with us. However, we have some hot, high-load endpoints/code paths where performance is really critical, and one thing that has come up is that System.Text.Json
outperforms Newtonsoft
by a significant margin, almost 2x as fast with 1/2 the memory allocations, according to several benchmarking reports. We'd really like to keep the "parse, don't validate" approach, but it's hard to justify taking an unnecessary performance hit like that.
Hypothetically, if one was going to fork this repo to something like Thoth.System.Text.Json
, what would be a high-level roadmap to follow? It looks like System.Text.Json
exposes roughly the same kind of primitives that Thoth depends on (i.e., JsonDocument
and JsonElement
instead of JToken
), so at first glance it seems like the conversion would be somewhat straightforward, just a matter of changing out the JSON primitives we're using. Are there any gotchas you are aware of? (I'm sure there would be some differences in the APIs exposed by each, of course.)
If it's really a mostly straightforward transition, we could consider pitching in and help with that fork. On the other hand, if it would be a virtual re-write and take months of work to complete, that might sway us in a different direction.
(Btw, @MangelMaxime really appreciate all your work for the F# ecosystem. I've learned a ton from your work and benefited from your work on several F# libraries that we use!)
Hello @olo-ntaylor
First of all thank you for you kind words and always happy to know that my work is used and benefit others :).
If one wanted to fork this repo the roadmap would be:
Fork the repo
Change the following line of code to use the System.Text.Json
internal AST types.
Follow the compiler errors and fix them.
In theory
For Encode.fs
, the changes will probably be focused on the manual encoders.
Example:
For Decode.fs
, the changes will probably be focused on the helper module:
I don't think you would need to change the Auto API because that code use the manual decoder internally and the rest of the code is related to the reflection which should be the same.
FYI I still have the plan to work on the next major version of Thoth.Json and I am still debating how I want it to be done. I have 2 ideas:
IDecoderHelpers<'Value>
interface that library implementation could implements. Meaning that Thoth.Json by default would be library agnostic and then the user would provide the implementation instance to be user like Fable
, Newtonsoft
, System.Text.Json
.Cons: Several package to use so it is not a total unification of the packages. Thoth.Json + (Thoth.Json.Fable | Thoth.Json.Newtonsoft | Thoth.Json.System.Text.Json | etc.)
Pros: Use battle tested JSON parser and benefit from their work on performance.
Cons: Need to write a custom JSON parser, meaning it will probably be slower than the others.
Pros: Have the same JSON parser, so the exact same behaviour everywhere + only 1 package instead of several.
@MangelMaxime @njlr I have suddenly become very interested in this comment:
What are your thoughts on building a third implementation of Thoth backed by System.Text.Json?
We are using Thoth extensively in an API/back-end F# project, and it is working really well with us. However, we have some hot, high-load endpoints/code paths where performance is really critical, and one thing that has come up is that
System.Text.Json
outperformsNewtonsoft
by a significant margin, almost 2x as fast with 1/2 the memory allocations, according to several benchmarking reports. We'd really like to keep the "parse, don't validate" approach, but it's hard to justify taking an unnecessary performance hit like that.Hypothetically, if one was going to fork this repo to something like
Thoth.System.Text.Json
, what would be a high-level roadmap to follow? It looks likeSystem.Text.Json
exposes roughly the same kind of primitives that Thoth depends on (i.e.,JsonDocument
andJsonElement
instead ofJToken
), so at first glance it seems like the conversion would be somewhat straightforward, just a matter of changing out the JSON primitives we're using. Are there any gotchas you are aware of? (I'm sure there would be some differences in the APIs exposed by each, of course.)If it's really a mostly straightforward transition, we could consider pitching in and help with that fork. On the other hand, if it would be a virtual re-write and take months of work to complete, that might sway us in a different direction.
(Btw, @MangelMaxime really appreciate all your work for the F# ecosystem. I've learned a ton from your work and benefited from your work on several F# libraries that we use!)
The main issue (which you may not care about anyway) is building libraries on top of the Thoth packages.
For example, the Thoth.Json.CE
package only works with Thoth.Json
. If you use Thoth.Json.Net
then you must use Thoth.Json.Net.CE
. The user is expected to write lots of #if
s.
If we added a third implementation, Thoth.Json.System
then we also need to create Thoth.Json.System.CE
.
The issue compounds as the dependency tree gets deeper!
I would suggest making an interface so that you can switch at run-time (IJsonOps
?). Then an implementation of this interface can be provided by each package (Thoth.Json
, Thoth.Json.Net
, etc... ). The ecosystem can then be built on top of the backend-agnostic IJsonOps
.
I do not believe that static configuration (using SRTP?) is possible but would love to be proven wrong :)
This would be a big change to Thoth.Json
so would need buy-in from @MangelMaxime - I wouldn't want to fragment the ecosystem by making yet another JSON library!
If you want something that works today, Fleece has a System.Text.Json backend (but none for Fable).
Edit: Beat me to it!
I strongly support proposal 1.
A from-scratch, pure F# parser that works everywhere could be provided as a backend, giving of the benefits of 1 and 2.
@njlr You are right that libraries build on top of Thoth.Json would have more compiler directives in the code.
I strongly support proposal 1.
A from-scratch, pure F# parser that works everywhere could be provided as a backend, giving of the benefits of 1 and 2.
It is true that, option 1 is not exclusive of option 2. When option 2 is indeed exclusive.
Option 2, was at the time tried because using an interface means the helpers are not inline
anymore. But in reality, I don't think I can write a JSON parser that is fast enough to beat Newtonsoft or System.Text.Json even if their helpers are not inlined.
Gathering up what's been said so far... it sounds like the preference would be to not introduce a separate Thoth.System.Text.Json
package at the moment, but to refactor some kind of Thoth.Core
library that would have many potential "implementation-provider" packages to choose from.
At first glance, that seems like a ton more work 😅 Not sure if my intuition is correct on that... how long do you expect such a refactor would take, (assuming you had at least some community contributions)?
Gathering up what's been said so far... it sounds like the preference would be to not introduce a separate
Thoth.System.Text.Json
package at the moment, but to refactor some kind ofThoth.Core
library that would have many potential "implementation-provider" packages to choose from.
Creating your own fork means:
Having Thoth.Json support several backend natively is a more viable solution over time IHMO.
At first glance, that seems like a ton more work 😅 Not sure if my intuition is correct on that... how long do you expect such a refactor would take, (assuming you had at least some community contributions)?
Hum such a work does indeed require some work. Most of the code already exist but I need change the types from:
type Decoder<'T> = string -> JsonValue -> Result<'T, DecoderError>
type Encoder<'T> = 'T -> JsonValue
to something like
type Decoder<'T> = IDecoderHelpers<'Value> -> string -> JsonValue -> Result<'T, DecoderError>
type Encoder<'T> = IEncoderHelpers<'Value> -> 'T -> JsonValue
The other big change is to merge Thoth.Json.Net and Thoth.Json code into a single package to have the logic in one package only.
I can't really gave an estimate but my last complete rewrite of Thoth.Json took me 1 week to have a beta version. I didn't release it because I was not happy with a few things and also I made some improvement on how some types where represented but didn't track it correctly and so was not able to ensure compatibility with previous version. I now have learn from this error 😅
The initial bulk work would need to be done by me otherwise I don't think I could understand the implications of the changes. After that, community contributions to report errors / send PR would be welcome indeed.
@olo-ntaylor I can't give an estimate of when I will be able to work on that project. But in the past, I had some companies sponsoring some of my work time so I could focus on working on their specifics needs.
If you would like to discuss a possibility like that, I am available on F# Slack by the handle @MangelMaxime.
Thanks for the quick responses, @MangelMaxime!!
I think one approach we may consider is forking the repo into our private GH and doing the swap to System.Text.Json there so we don't pollute the ecosystem of packages available, and that way we're not going to screw anyone else over if we miss something 😅 Then we could potentially re-evaluate our needs and how urgent it would be to move to the updated framework-agnostic version in the future.
Seems like a good middle ground to me.
I was curious what Option 1 might look like so I sketched this out.
It works in an F# script.
// Core definitions, agnostic to the particular JSON backend
type IEncoderOps<'json> =
abstract member Encode : int -> 'json
abstract member Encode : 'json list -> 'json
type IDecoderOps<'json> =
abstract member TryAsInt : 'json -> int option
abstract member TryAsArray : 'json -> ('json list) option
type IEncoder<'t> =
abstract member Encode<'json> : ops : IEncoderOps<'json> * value : 't -> 'json
type IDecoder<'t> =
abstract member Decode<'json> : ops : IDecoderOps<'json> * path : string * json : 'json -> Result<'t, string>
[<RequireQualifiedAccess>]
module Encode =
let int : IEncoder<int> =
{
new IEncoder<int> with
member this.Encode<'json>(ops : IEncoderOps<'json>, t : int) =
ops.Encode(t)
}
let list (element : IEncoder<'t>) : IEncoder<'t list> =
{
new IEncoder<'t list> with
member this.Encode<'json>(ops : IEncoderOps<'json>, xs : 't list) =
xs
|> List.map (fun x -> element.Encode(ops, x))
|> ops.Encode
}
[<RequireQualifiedAccess>]
module Decode =
let int : IDecoder<int> =
{
new IDecoder<int> with
member this.Decode<'json>(ops : IDecoderOps<'json>, path : string, json : 'json) =
match ops.TryAsInt(json) with
| Some i -> Ok i
| None -> Error $"%s{path}: Expected an int, but found %A{json}"
}
let list (element : IDecoder<'t>) : IDecoder<'t list> =
{
new IDecoder<'t list> with
member this.Decode<'json>(ops : IDecoderOps<'json>, path : string, json : 'json) =
match ops.TryAsArray(json) with
| Some xs ->
let rec loop (index : int) (xs : 'json list) (acc : 't list) =
match xs with
| [] -> Ok (List.rev acc)
| x :: xs ->
match element.Decode(ops, $"%s{path}[%i{index}]", x) with
| Ok x -> loop (index + 1) xs (x :: acc)
| Error error -> Error error
loop 0 xs []
| None -> Error $"%s{path}: Expected an array, but found %A{json}"
}
// Backend for System.Text.Json
module SystemTextJson =
open System.Text.Json
type private EncoderOps() =
interface IEncoderOps<JsonElement> with
member this.Encode(i : int) =
JsonSerializer.SerializeToElement(i)
member this.Encode(xs : JsonElement list) =
JsonSerializer.SerializeToElement(xs)
type private DecoderOps() =
interface IDecoderOps<JsonElement> with
member this.TryAsInt(element : JsonElement) =
match element.ValueKind with
| JsonValueKind.Number ->
Some (element.GetInt32())
| _ ->
None
member this.TryAsArray(element : JsonElement) =
match element.ValueKind with
| JsonValueKind.Array ->
Some
[
let mutable en = element.EnumerateArray()
while en.MoveNext() do
yield en.Current
]
| _ ->
None
[<RequireQualifiedAccess>]
module Encode =
let private ops =
EncoderOps()
:> IEncoderOps<JsonElement>
let toString (encoder : IEncoder<'t>) (t : 't) : string =
encoder.Encode(ops, t).ToString()
[<RequireQualifiedAccess>]
module Decode =
let private ops =
DecoderOps()
:> IDecoderOps<JsonElement>
let fromString (decoder : IDecoder<'t>) (json : string) : Result<'t, string> =
try
let jsonDoc = JsonDocument.Parse(json)
decoder.Decode(ops, "$", jsonDoc.RootElement)
with
| :? JsonException as exn ->
Error exn.Message
// Demo
open SystemTextJson
let encoder =
Encode.list Encode.int
let decoder =
Decode.list Decode.int
let xs = [ 1; 2; 3 ]
printfn $"%A{xs}"
let json = Encode.toString encoder xs
printfn $"%s{json}"
match Decode.fromString decoder json with
| Ok xs ->
printfn "%A" xs
| Error err ->
printfn "Error: %s" err
Some findings:
Encode.list
function has a different API, since we don't know the type of JSON values when it is usedEncode.toString
must take the encoder and the valueI would be happy to contribute if / when the next version is started.
Glad to see that the concept works in theory.
You did it a bit differently that I would have done it. Indeed, I would have kept the let list
be a function all the way so something like:
let list (decoder : Decoder<'value>) : Decoder<'value list> =
fun decoderHelpers path value -> // ...
To me it seems simpler than using object expression.
Also, I re-created a new issue in Thoth.Json repository to track this new re-design work.
I will add to that issue, the list of all the improvements I want to include because if we are to do a major rewrite it is the good time to improve other stuff too.
Glad to see that the concept works in theory.
You did it a bit differently that I would have done it. Indeed, I would have kept the
let list
be a function all the way so something like:let list (decoder : Decoder<'value>) : Decoder<'value list> = fun decoderHelpers path value -> // ...
To me it seems simpler than using object expression.
Maybe I'm missing something, but I'm not sure this can work.
What is the type of decoderHelpers
?
I think it needs an extra type-parameter to be filled by System.Text.Json.JsonElement
, Newtonsoft.Json.Linq.JToken
, etc.
The types would be something like that:
type IDecoderHelpers<'JsonValue> =
abstract GetField : FieldName : string -> value : 'JsonValue -> 'JsonValue
abstract IsString : value : 'JsonValue -> bool
abstract IsBoolean : value : 'JsonValue -> bool
abstract IsNumber : value : 'JsonValue -> bool
abstract IsArray : value : 'JsonValue -> bool
abstract IsObject : value : 'JsonValue -> bool
abstract IsNullValue : value : 'JsonValue -> bool
abstract IsIntegralValue : value : 'JsonValue -> bool
abstract IsUndefined : value : 'JsonValue -> bool
abstract AnyToString : value : 'JsonValue -> string
abstract ObjectKeys : value : 'JsonValue -> string seq
abstract AsBool : value : 'JsonValue -> bool
abstract AsInt : value : 'JsonValue -> int
abstract AsFloat : value : 'JsonValue -> float
abstract AsFloat32 : value : 'JsonValue -> float32
abstract AsString : value : 'JsonValue -> string
abstract AsArray : value : 'JsonValue -> 'JsonValue[]
I never tested my idea of using that interface so it is possible that indeed the compiler will want something more that just a function for the decoder/encoder definition.
@MangelMaxime @njlr Just FWIW, in my hacking yesterday I realized that System.Text.Json
actually exposes two different models for JSON ASTs. There is the JsonDocument
/JsonElement
model designed for efficient reading (it's immutable), and the (mutable) JsonNode
/JsonValue
model designed for efficient creation of a JSON DOM.
I only mention that because I think you'd have to keep in mind that 'JsonValue
might be different for the Decode and Encode modules depending on the underlying parser implementation.
Hello @njlr,
I started working on the unification of Thoth.Json and you were right. Using object expression allows to avoid surfacing the 'JsonValue
in the user code.
It allows to write:
type User =
{
FirstName: string
LastName: string
}
module User =
let decoder: Decoder<User> =
Decode.map2
(fun firstName lastName ->
{
FirstName = firstName
LastName = lastName
}
)
(Decode.field "firstName" Decode.string)
(Decode.field "lastName" Decode.string)
instead of something like let decoder: Decoder<'JsonValue, User> =
(pseudo code as I don't have anymore this code in the repo ^^).
And I am happy to report that the experimentation seems to be really promising.
Implementing a new runtime/target is as simple as implementing IDecoderHelpers<'JsonValue>
contract and exposing Decode.fromString
.
let encoderHelpers =
{ new IDecoderHelpers<obj> with
member _.isString jsonValue = jsonValue :? string
// ...
member _.anyToString jsonValue =
emitJsStatement jsonValue
"""
JSON.stringify($0, null, 4) + ''
"""
}
module Decode =
module Helpers =
[<Emit("$0 instanceof SyntaxError")>]
let isSyntaxError (_ : obj) : bool = jsNative
let fromString (decoder : Decoder<'T>) =
fun value ->
try
let json = JS.JSON.parse value
Decode.fromValue encoderHelpers "$" decoder json
with
| ex when Helpers.isSyntaxError ex ->
Error("Given an invalid JSON: " + ex.Message)
I have a working minimal prototype which works for JavaScript and Newtonsoft.Json. I am now working on transposing all the manual API to the new implementation in order to checks if it works on a much larger scales.
@olo-ntaylor Thank you for the note. Once, I have completed the work of JavaScript and Newtonsoft.Json, I will give it a try to use System.Text.Json in order to make sure that the point you raised is supported.
Progress on the rewrite will be reported in https://github.com/thoth-org/Thoth.Json/issues/175
This should not parse since the trailing comma is out of spec.