jdevillard / JmesPath.Net

A fully compliant implementation of JMESPATH for .NetCore
Apache License 2.0
126 stars, 22 forks

System.Text.Json support #49

Closed: maximiliancsuk closed this issue 2 years ago

maximiliancsuk commented 2 years ago

Hello,

has someone started on supporting System.Text.Json instead of/in addition to Newtonsoft.json? If not, I might give it a go myself.

I've seen https://github.com/jdevillard/JmesPath.Net/issues/24 where @atifaziz has already laid the groundwork by separating out the parser from the rest.

springcomp commented 2 years ago

@maximiliancsuk Yes, I have.

I have started a proof of concept that only supports parsing identifiers and it seems possible to complete the work in a relatively straightforward manner.

Unfortunately, I have no plans to commit to working on this at the moment.

In fact, I’m more concerned with the performance of the core parser itself. I started to use System.Text.Json as a proof of concept to see whether that would lead to better performance, but it appears that using a compiler generator is the source of the poor performance in JmesPath.NET.

I have reworked the lexer part and saw some small performance gains. The real performance benefits would come from rewriting the core parser as a top-down precedence parser instead. However, this would most certainly break the interface contract between the parser and the implementation libraries in IJmesPathGenerator.

We are absolutely accepting contributions to supporting System.Text.Json though... 😁

atifaziz commented 2 years ago

…rewriting the core parser to a top-down precedence parser instead.

@springcomp In case it can help, I published a generic top-down precedence parser implementation for .NET Standard.

springcomp commented 2 years ago

@atifaziz Awesome. This will definitely help me.

springcomp commented 2 years ago

@atifaziz I’m starting work on refactoring the parser using Gratt. I have taken some time to familiarize myself with the concept of Vaughan Pratt parsers and with your library, and found the Bantam toy-language tests very valuable.

I’m now at the point of applying this knowledge to making the changes in JMESPath.NET.

It seems that I will break the current architecture that you helped contribute by splitting the core parsing code from the code that is specific to Newtonsoft.Json, in order to promote swapping JSON implementations.

The problem I seem to have is that Gratt parsers should return an AST. But AST nodes in JMESPath.NET are strongly coupled to Newtonsoft.Json’s JToken class. I’m thinking about having a JmesPathExpression<TJsonToken> class but I think this will make it so much more complicated in the end.

I would be happy to have your suggestions or to plan a quick chat if that would be OK for you.
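
For readers following along, the JmesPathExpression<TJsonToken> idea mentioned above might look roughly like the sketch below. Everything in it is hypothetical: IJsonAccessor, IdentifierExpression, SubExpression, DictionaryAccessor and the sentinel Null are illustrative names and are not part of JmesPath.Net. Nested dictionaries stand in for a real JSON token type such as JToken or JsonElement.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical: the only JSON operation these two node types need.
interface IJsonAccessor<TToken>
{
    // Returns the accessor's own "null" token when the property is missing.
    TToken GetProperty(TToken token, string name);
}

// Hypothetical AST base class, generic over the JSON token type.
abstract class JmesPathExpression<TToken>
{
    public abstract TToken Evaluate(TToken input, IJsonAccessor<TToken> json);
}

// An identifier such as `foo`.
sealed class IdentifierExpression<TToken> : JmesPathExpression<TToken>
{
    readonly string _name;
    public IdentifierExpression(string name) => _name = name;

    public override TToken Evaluate(TToken input, IJsonAccessor<TToken> json) =>
        json.GetProperty(input, _name);
}

// A sub-expression such as `foo.bar`: evaluate left, then right on its result.
sealed class SubExpression<TToken> : JmesPathExpression<TToken>
{
    readonly JmesPathExpression<TToken> _left, _right;

    public SubExpression(JmesPathExpression<TToken> left, JmesPathExpression<TToken> right)
    {
        _left = left;
        _right = right;
    }

    public override TToken Evaluate(TToken input, IJsonAccessor<TToken> json) =>
        _right.Evaluate(_left.Evaluate(input, json), json);
}

// Stand-in accessor over nested dictionaries (assumption: not a real JSON library).
sealed class DictionaryAccessor : IJsonAccessor<object>
{
    public static readonly object Null = new object();

    public object GetProperty(object token, string name) =>
        token is IDictionary<string, object> map && map.TryGetValue(name, out var found)
            ? found
            : Null;
}

static class GenericAstDemo
{
    // Evaluates the equivalent of `foo.bar` against {"foo": {"bar": "baz"}}.
    public static object Run()
    {
        var document = new Dictionary<string, object>
        {
            ["foo"] = new Dictionary<string, object> { ["bar"] = "baz" },
        };
        var expression = new SubExpression<object>(
            new IdentifierExpression<object>("foo"),
            new IdentifierExpression<object>("bar"));
        return expression.Evaluate(document, new DictionaryAccessor());
    }
}
```

The cost hinted at above is visible even at this size: the type parameter and the accessor have to be threaded through every node type.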

atifaziz commented 2 years ago

The problem I seem to have is that Gratt parsers should return an AST.

Gratt doesn't mandate an AST. The result (TResult) is entirely generic so you can decide on what it is. For example, while Bantam parser indeed returns an AST, the C# pre-processor expression parser example returns bool because it evaluates the expression as it parses the input.
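
To make the point about a generic result concrete, here is a minimal, self-contained sketch of a top-down precedence parser whose result type is a type parameter. It does not use Gratt's API; TinyPratt and TinyPrattDemo are hypothetical names, and the grammar is assumed to be space-separated integers with + and * only. The same parser either evaluates as it parses or builds a tree-like string, depending on the callbacks supplied.

```csharp
using System;

// Hypothetical, minimal Pratt-style parser; not Gratt's actual API.
sealed class TinyPratt<TResult>
{
    readonly Func<int, TResult> _number;
    readonly Func<char, TResult, TResult, TResult> _binary;
    string[] _tokens = Array.Empty<string>();
    int _position;

    public TinyPratt(Func<int, TResult> number,
                     Func<char, TResult, TResult, TResult> binary)
    {
        _number = number;
        _binary = binary;
    }

    public TResult Parse(string source)
    {
        _tokens = source.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        _position = 0;
        var result = ParseExpression(0);
        if (_position < _tokens.Length)
            throw new FormatException("Unexpected token: " + _tokens[_position]);
        return result;
    }

    // Binding power of each infix operator; 0 means "not an operator".
    static int Precedence(string token) =>
        token == "+" ? 1 : token == "*" ? 2 : 0;

    TResult ParseExpression(int minPrecedence)
    {
        if (_position >= _tokens.Length)
            throw new FormatException("Unexpected end of input.");

        // Prefix position: only integer literals are supported here.
        var left = _number(int.Parse(_tokens[_position++]));

        // Infix position: keep consuming operators that bind tighter
        // than the context we were called from.
        while (_position < _tokens.Length
               && Precedence(_tokens[_position]) > minPrecedence)
        {
            var op = _tokens[_position++];
            var right = ParseExpression(Precedence(op));
            left = _binary(op[0], left, right);
        }

        return left;
    }
}

static class TinyPrattDemo
{
    // TResult = int: the expression is evaluated while it is parsed.
    public static int Evaluate(string source) =>
        new TinyPratt<int>(n => n, (op, a, b) => op == '+' ? a + b : a * b)
            .Parse(source);

    // TResult = string: the parser builds a parenthesized tree instead.
    public static string ToTree(string source) =>
        new TinyPratt<string>(n => n.ToString(), (op, a, b) => $"({a} {op} {b})")
            .Parse(source);
}
```

With this sketch, TinyPrattDemo.Evaluate("1 + 2 * 3") yields 7 and TinyPrattDemo.ToTree("1 + 2 * 3") yields "(1 + (2 * 3))".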

springcomp commented 2 years ago

The problem I seem to have is that Gratt parsers should return an AST.

Gratt doesn't mandate an AST. The result (TResult) is entirely generic so you can decide on what it is. For example, while Bantam parser indeed returns an AST, the C# pre-processor expression parser example returns bool because it evaluates the expression as it parses the input.

Hi @atifaziz, I think I’m making good progress, thanks to your excellent library. I am able to take advantage of the IJmesPathGenerator abstraction you contributed and keep the interface contract untouched. That is awesome!

Is there any possibility to support adding tokens to the token stream dynamically? I would need this to cover a corner case.

atifaziz commented 2 years ago

@springcomp Hey, I noticed you posted a question on the Gratt repo. Is it related to your corner case about needing to inject tokens into the source stream or is it something separate?


PS I tried to check out your progress, but it was showing a 404 last week. It seems the branch had either not been pushed or was deleted.

springcomp commented 2 years ago

@springcomp Hey, I noticed you posted a question on the Gratt repo. Is it related to your corner case about needing to inject tokens into the source stream or is it something separate?

PS I tried to check out your progress, but it was showing a 404 last week. It seems the branch had either not been pushed or was deleted.

Thanks for your feedback. The corner case is actually unrelated. I frequently rebase and force push while working on a private branch. Please, find the latest progress here.

atifaziz commented 2 years ago

Is there any possibility to support adding tokens to the token stream dynamically?

Not directly. Gratt's Parser.Parse expects a token sequence (IEnumerable<(TKind, TToken)>), but you could play around with the idea of a mutable token source. Following is something I put together quickly to give an idea of what I mean by this.

Suppose an IEnumerator<T> implementation that allows injecting new items to the head/front or tail/back of an inner enumerator:

using System;
using System.Collections;
using System.Collections.Generic;

sealed class VariableEnumerator<T> : IEnumerator<T>
{
    Queue<T>? _front = new();
    IEnumerator<T>? _inner;
    Queue<T>? _back = new();
    (bool, T) _current;
    bool _ended;

    public VariableEnumerator(IEnumerator<T> inner) => _inner = inner;

    public T Current => _current is (true, var current) ? current : throw new InvalidOperationException();

    object? IEnumerator.Current => Current;

    public void Dispose() => _inner?.Dispose();

    public bool MoveNext()
    {
        if (_front is { Count: > 0 } front)
        {
            _current = (true, front.Dequeue());
            return true;
        }

        if (_inner is { } inner)
        {
            if (inner.MoveNext())
            {
                _current = (true, inner.Current);
                return true;
            }

            inner.Dispose();
            _inner = null;
        }

        if (_back is { Count: > 0 } back)
        {
            _current = (true, back.Dequeue());
            return true;
        }

        _ended = true;
        _current = default;
        _front = _back = null;
        return false;
    }

    public void PushBack(T item) => Push(ref _back, item);
    public void PushFront(T item) => Push(ref _front, item);

    void Push(ref Queue<T>? queue, T item)
    {
        if (_ended)
            throw new InvalidOperationException();

        (queue ??= new Queue<T>()).Enqueue(item);
    }

    public void Reset() => throw new NotSupportedException();
}

Suppose further that there's an extension method that allows a variable enumerator to be created from a sequence:

static partial class Extensions
{
    public static VariableEnumerator<T> GetVariableEnumerator<T>(this IEnumerable<T> source) =>
        new(source.GetEnumerator());
}

You could put this variable enumerator into the state passed to the parser such that you have access to it during parsing and are able to modify it (to inject tokens). However, since Gratt's Parser.Parse needs an IEnumerable<>, we'll need to convert the enumerator back to enumerable. We can have another extension for that:

static partial class Extensions
{
    public static IEnumerable<T> Share<T>(this IEnumerator<T> source)
    {
        while (source.MoveNext())
            yield return source.Current;
    }
}

The method is called Share because it provides a shared and non-repeatable view over the source.
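
A small sketch of what "shared and non-repeatable" means in practice: two passes over the same Share() result pull from one underlying enumerator, so the second pass continues where the first stopped instead of starting over. ShareDemo and Run are hypothetical names used only for this illustration, and Share is repeated here so the snippet is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ShareDemo
{
    // Same shape as the Share extension above.
    static IEnumerable<T> Share<T>(this IEnumerator<T> source)
    {
        while (source.MoveNext())
            yield return source.Current;
    }

    public static (List<int> First, List<int> Second) Run()
    {
        using var numbers = Enumerable.Range(1, 5).GetEnumerator();
        var shared = numbers.Share();

        // First pass: stop after two items.
        var first = new List<int>();
        foreach (var n in shared)
        {
            first.Add(n);
            if (first.Count == 2)
                break;
        }

        // Second pass: resumes from the shared enumerator's current position.
        var second = new List<int>(shared);

        return (first, second); // first: 1, 2; second: 3, 4, 5
    }
}
```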

With these generic solutions, you can then hack together a way to inject tokens during iteration:

var tokens = new[] { "b", "c", "e" }.GetVariableEnumerator();
tokens.PushBack("f");
tokens.PushFront("a");
foreach (var token in tokens.Share())
{
    if (token is "c")
    {
        tokens.PushFront("d");
        tokens.PushBack("g");
    }
    Console.WriteLine(token);
}

// outputs:
//   a
//   b
//   c
//   d
//   e
//   f
//   g

Hope this helps.

gregsdennis commented 2 years ago

If you're still looking for a STJ-supporting JMES Path library, check out https://github.com/danielaparker/JsonCons.Net.

springcomp commented 2 years ago

If you're still looking for a STJ-supporting JMES Path library, check out https://github.com/danielaparker/JsonCons.Net.

Hi @gregsdennis thanks for pointing this out. I was not aware of this project.

This is a great alternative to JMESPath.Net for people who must rely on System.Text.Json. That library has the same level of performance as ours.

It seems to be a little forgiving and returns null where an invalid-type error is expected in some cases. Apart from that, it seems a great alternative.