quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0
13.84k stars 2.7k forks source link

Add support for optional predicates in Panache #2303

Open emmanuelbernard opened 5 years ago

emmanuelbernard commented 5 years ago

What people use the example API for is for generic "search" screens that need to accept between 1 to n predicates depending on what the user as filled in. And the example API does allow to do such matching assuming the search fields are all in one entity as properties and by adding side metadata on what property needs to be ignored, ignored if null, have a trailing % in like etc.

The following Panache model would scale better. I call it the destructured example API

public List<Todo> search(String title, String owner, Integer order, Page page) {
    return find("title? like ?1% and owner? = ?2 and order? < ?3 order by order", title, owner, order)
            .page(page).list();
}

The trailing ? per property expresses that the predicate on the property should be ignored if the property is null.

The advantages I see to this approach are:

Thoughts @gavinking @Sanne @gsmet @FroMage ?

FroMage commented 5 years ago

I like this. This also lets us use or instead of and, right?

emmanuelbernard commented 5 years ago

I like this. This also lets us use or instead of and, right?

yes

Sanne commented 5 years ago

I do like how this looks, but I like the separation of predicates more. Separated predicates could still be checked too I believe? Just need to make sure that the root type is made explicit.

The weak aspects of this solution:

But don't take this as a blocker; just mentioning the limitations so that we're on the same page on why I prefer the separation of predicates.

emmanuelbernard commented 5 years ago

What's your alternative syntax or API look like for the examples I gave @Sanne ?

loicmathieu commented 5 years ago

I do think it's a good addition to Panache, but I don't think it removes the need to have a byExample API.

A byExample API can be used not only for the use case that are explained in the description, but as a full replacement for the textual query syntax by expressing a query programatively. This is IMHO the best advantage of a query byExample having the hablity to express a Query in a compile checked and type safe fashion.

When you test everything with mocks, your test didn't hit the database and the query are never rendered. Having a query that is mechanically generated from an object allows to be sure that this query will be semantically exact.

But I'm not 100% fan of some query by example implementation that allows a user to specify any type as long as long as the type have the same attributes as the Entity (this will not longuer by compile checked and type safe) . I can see some advantages (when you have DTO with not all the fields) but I prefere a query by example that uses the same type as the Entity directly.

Something like this could be added to Panache:

BookEntity example = new BookEntity();
example.author="Victor Hugo";
List<BookEntity> victorBooks= BookEntity.findByExample(example).list();

An other more complex solution should be to provide a Criteria API to provides pragramatically query generation so one could write something like :

Criteria criteria = Criteria.eq("author", "Victor Hugo")
        .and(Criteria.lte("rating", 10));
List<BookEntity> victorBooks= BookEntity.findByCriteria(criteria).list();

This will cover more use cases but add some more complexity in the implementation sides.

To conclude, I think that if we want to keep Panache simple, we should add a byExample API like the one I described to add a way to programmatically express a compiled checked and type safe query

emmanuelbernard commented 5 years ago

What if PanacheQL was compile time checked. It looks like it would alleviate all the concerns you have. CC @gavinking My concern with the example API is that you don't control which field is OK vs not OK and it becomes cumbersome to define what null or default values mean per property (ignore or set)?

loicmathieu commented 5 years ago

Compile checking PanacheQL queries will be a good thing, but it assume it will not really be compile check but build time check (by the deployment module).

So there will be no such check via my IDE ...

Anyway, I still support an additional querying facility (example or criteria or something else) than via a simple String, because some people will prefere it. I think that at some point we may add it to Panache, maybe it's not the time now :)

emmanuelbernard commented 5 years ago

No we do have annotation processor driven HQL validation we can repurpose for PanacheQL, it will show up in your IDE

emmanuelbernard commented 5 years ago

I reviewed this issue and my proposal is to implement this and see how much the other approaches are still sorely missed. This would need to go hand in hand with https://github.com/quarkusio/quarkus/issues/5552 to achieve the target objective.

I understand the limitations:

Note that falling back to plain JPA is really easy. I can even see us offering entity.entityManager();

CC @FroMage

FroMage commented 5 years ago

So, the proposal is to auto-remove AST nodes of the predicate type when their LHS or RHS expression is of type path with a question mark:

predicate
    : LEFT_PAREN predicate RIGHT_PAREN                      # GroupedPredicate
    | predicate OR predicate                                # OrPredicate
    | predicate AND predicate                               # AndPredicate
    | NOT predicate                                         # NegatedPredicate
    | expression IS (NOT)? NULL                             # IsNullPredicate
    | expression IS (NOT)? EMPTY                            # IsEmptyPredicate
    | expression EQUAL expression                           # EqualityPredicate
    | expression NOT_EQUAL expression                       # InequalityPredicate
    | expression GREATER expression                         # GreaterThanPredicate
    | expression GREATER_EQUAL expression                   # GreaterThanOrEqualPredicate
    | expression LESS expression                            # LessThanPredicate
    | expression LESS_EQUAL expression                      # LessThanOrEqualPredicate
    | expression (NOT)? IN inList                           # InPredicate
    | expression (NOT)? BETWEEN expression AND expression   # BetweenPredicate
    | expression (NOT)? LIKE expression (likeEscape)?       # LikePredicate
    | MEMBER OF path                                        # MemberOfPredicate
    ;

expression
    : expression DOUBLE_PIPE expression         # ConcatenationExpression
    | expression PLUS expression                # AdditionExpression
    | expression MINUS expression               # SubtractionExpression
    | expression ASTERISK expression            # MultiplicationExpression
    | expression SLASH expression               # DivisionExpression
    | expression PERCENT expression             # ModuloExpression
    // todo (6.0) : should these unary plus/minus rules only apply to literals?
    //      if so, move the MINUS / PLUS recognition to the `literal` rule
    //      specificcally for numeric literals
    | MINUS expression                          # UnaryMinusExpression
    | PLUS expression                           # UnaryPlusExpression
    | caseStatement                             # CaseExpression
    | coalesce                                  # CoalesceExpression
    | nullIf                                    # NullIfExpression
    | literal                                   # LiteralExpression
    | parameter                                 # ParameterExpression
    | entityTypeReference                       # EntityTypeExpression
    | path                                      # PathExpression
    | function                                  # FunctionExpression
    | LEFT_PAREN querySpec RIGHT_PAREN          # SubQueryExpression
    ;

path
    : syntacticDomainPath (pathContinuation)?
    | generalPathFragment
    ;

The "only" issue is that we'll need to use the HQL parser, which is probably a bit more expensive than what we do ATM. And we can't use it because it will reject identifiers with ? at the end, but we forked it so we can actually add a special node type for it.

It's possible we'll want to start expansing those PanacheQL fragments at build-time to cut run-time costs, though probably they are actually moot since Hibernate will parse them anyway later on.

emmanuelbernard commented 5 years ago

I think the parser you describe is what @loicmathieu has done for MongoDB right? At least I see the panacheql module with a grammar. While I'm no grammar guy, I think supporting the ? and behaving differently on ? path should not require too much extra work. Or am I naive? I guess complex nested and / or could lead to empty () so the query would need to be rewritten or replace it with ( 1 == 1 )?

emmanuelbernard commented 4 years ago

While discussing with @gavinking, he advised to come with a formal set of rules for how the feature should operate. I finally got the time to sit down and think about it.

CC @loicmathieu @FroMage

Goal

Construct query structures depending of the presence or absence of a parameter. Often, a complex search query is build from if statements and string concatenation depending on the presence or absence of an input parameter (absence materialized by null or some other marker).

Examples

firstname? = ?1 and lastname? = ?2

lastname? = ?1 or status? = ?2

(firstname? = ?1 and lastname? = ?2) or status? = ?3

you can mix optional predicates and non optional ones.

firstname? = ?1 and lastname = ?2

Formal rules

I explored a few options but in the end Stephane had it the most right with his AST proposal.

An optional predicate including a parameter value which is null is removed from the AST node and it's parent OR or AND node is replaced by the sibling of the optional predicate eliminated.

Said differently, replace AND and OR nodes with the non optional node. If 2 optional nodes are attached to the AND / OR, keep the left one.

If the AST of the where clause ends up being empty, there are two options:

I haven't made up my mind, I am concerned about the fact that a query could mistakenly return more things than expected in this extreme condition but that's an arbitrary rule. NOTE: we can't just say that if all parameters are null, we return nothing (or don't query) because this query is valid status = 'ADMIN' and lastname? = ?1

NOTE: it's the whole predicate, not just path = ?param, it could apply to functions using a parameter and other forms. So all elements of the node are eliminated up to the AND / OR operator.

A few tree pruning examples:

    OR
   /  \
 AND   S?
 / \
L?   F?

assuming the parameter for L is null, is equivalent to

    OR
   /  \
  F?   S?

assuming also that the parameter for F is null, is equivalent to

    S?

Another example

    OR
   /  \
 AND   S?
 / \
L?   F?

assuming the parameter for S is null, is equivalent to

 AND
 / \
L?   F?

I did not formally prove it but I am pretty sure that regardless of the tree walking algorithm, the pruning algorithm converges to the same results.

Failed attempts

I did try another approach which was to replace the optional predicate with 1=1 if it is attached to AND and 1=0 if it is attached to OR. The idea was to use a custom ternary logic to achieve the tree simplification. Unfortunately it does not work for nested elements.

(lastname? = ?1 and firstname? = ?2) or status? = ?3 becomes (1=1 and 1=1) or 1=0 where one would have expected that if all predicates are optional null in the and clause, the total result would be 1=0. We need to retain the info that a predicate is opted out to carry that info.

Question on the syntax

I am not super happy about the syntax path? because someone could write something like firstname? = lastname? or firstname? = 'foo'.

Here is a verbose but maybe more accurate syntax: optional( firstname = ??, ?1 ) and optional( lastname = ??, ?2 ) where the first operand of optional is the predicate to be eliminated and the second one is the actual parameter used to evaluate the optionality.

Alternative syntaxes

  1. optional( firstname = ??, ?1 ) and optional( lastname = ??, ?2 )
  2. optional( firstname = ?1 ) and optional( lastname = ?2 )
  3. firstname? = ?1 and lastname? = ?2
  4. firstname = ??1 and lastname = ?:somename
  5. firstname = ?1? and lastname = :somename?
  6. firstname = [?1] and lastname = [:name]
  7. firstname = [?1]? and lastname = [:name]?
  8. [firstname = ?1]? and [lastname? = ?2]? : show more the predicate limit being eliminated. but still verbose

8 and 1 are the most accurate as they select the whole predicate. 4, 5, 6 and 7 are quite accurate as they highlight the parameter being evaluated for nullness. I do like the elegance of 3 because it reminds me of Ceylon's ?.

There you go, we can discuss the semantic now.

FroMage commented 4 years ago

I like the conciseness of path? = :param, but it's indeed not ideal, because it sort of implies that the condition depends on the path propery being null, where it's really about the :param. This is what leads you to be afraid of people writing firstname? = lastname?.

I initially thought we could write path ?= :param and tie it to the operator, but we'd have to support every operator that takes a parameter, which may be confusing.

So, it would feel less ambiguous to mark specially the parameter that can be null, perhaps by changing the :name syntax to ?name, but that leaves positional params out, unless we use ??0 which is really hard to justify.

loicmathieu commented 4 years ago

I do like the proposal of @FromMage to put the optional caracteristic of a predicate on the operator and not the field path, this better describe what is an optional predicate: a variant of an existing predicate. With this, the implementation will be more straightforward, and the intent clearer inside the query.

The issue is on the implementation side.

Ideally we should be able to work on the AST level to prune nodes that should disapear due to the optionality. That is what @emmanuelbernard describe in his comment (by the way, thanks for it, it adds a lot of clarity for what needs to be done).

If you look at my implementation for MongoDB you can see that it's not what I did.

In fact, the current PanacheQL implementation inside MongoDB with Panache uses an Antlr visitor, the visitor is not capable or pruning an AST node (it can only replace it, not remove it), and not capable of updatong the parent node. Idealy we must act on the AST level in multiple passes (mark pass and delete pass) and I don't know who to do this with Antlr.

So, the implementation is harder that what it seems first, if we cannot simply replace a node by something else (empty document for MongoDB, 1=1 where clause for Hibernate) the implementation will be very much complex ...

emmanuelbernard commented 4 years ago

What about

 "name = optional(?1) or status = optional( :status )"

It is more verbose but very clear.

FroMage commented 4 years ago

I'd like to ask for @gavinking's opinion about syntax. He's pretty good at throwing ideas in and out.

gavinking commented 4 years ago

Hrm. I've been tagged so many times in this issue, but somehow I never got a proper notification until Stef sent me a message today. Need to check my filters.

gavinking commented 4 years ago

Alright, so my first reactions are:

So I guess I like the look of:

[title like ?1] and [owner = ?2] and [order < ?3] order by order

I find this very readable.

However, there is one thing that's wrong with that: the logical connectives should really also go inside the brackets, yielding:

[title like ?1] [and owner = ?2] [and order < ?3] order by order

However, I'm not sure I like that better.

OTOH, consider the extension to multiple predicates within a single optional condition:

title like ?1 [and owner = ?2 and order < ?3] order by order

WDYT?

gavinking commented 4 years ago

(P.S. I know I was the one advocating that we treat this as some sort of pseudo "function" in HQL, and I can see I'm sorta walking back from that now, at least with respect to the syntax.)

gavinking commented 4 years ago

Potential objection of using brackets is that it hammers the natural syntax for list literals. But I highly doubt we'll ever want list literals in HQL.

FroMage commented 4 years ago

I'm not very keen on putting the marker on any of the three individual bits of the predicate. Since it's the whole predicate that gets removed, it seems to me that the clearest thing is to place delimiters around the whole predicate.

Well, in my mind, this is a two-actor team:

So the semantics really belong mostly on the operator, because it's that operator which changes semantics if we opt in or not:

gavinking commented 4 years ago

@FroMage Yeah, I originally tried to go down that path when I chatted with Emmanuel, but I think we decided that no, that would be adding a new sort of ternary logic that collided with the ternary logic of SQL and was broken in novel and different ways to SQL's ternary logic.

So after a bit of a think about it I decided it would probably be a mistake.

FroMage commented 4 years ago

So, with that in mind, I guess I'd favour the following variants:

The Perl user in me finds [owner = ?2] confusing, while the BNF user in me finds it perfectly fitting.

gavinking commented 4 years ago

So, with that in mind, I guess I'd favour the following variants ...

To be clear, I think either of those options are perfectly acceptable outcomes. They're very readable.

OTOH, the option of putting the operator inside the brackets does have the advantage that we don't have to come up with strange and half-convincing backsplanations as to what is the meaning of a dangling and or or operator.

gavinking commented 4 years ago

(The issue being that or [name=?n] means or false when n is missing, while and [name=?n] means and true when n is missing.)

gavinking commented 4 years ago

Rrrrm the more I stare at this example, the more I think it's telling us that the logical operator should go inside:

title like ?1 [and owner = ?2 and order < ?3] order by order
emmanuelbernard commented 4 years ago

(The issue being that or [name=?n] means or false when n is missing, while and [name=?n] means and true when n is missing.)

Not even, see this loooong post https://github.com/quarkusio/quarkus/issues/2303#issuecomment-622331293

emmanuelbernard commented 4 years ago

So, with that in mind, I guess I'd favour the following variants:

* `[title like ?1] and [owner = ?2] and [order < ?3] order by order`

* `optional(title like ?1) and optional(owner = ?2) and optional(order < ?3) order by order`

The Perl user in me finds [owner = ?2] confusing, while the BNF user in me finds it perfectly fitting.

So I considered the Gavin bracket syntax and was scared it would conflict with HQL but if it does not, I think that's the winner. I need to let the [and foo = ?1]variant but I don't think that's quite that. It would be something like

[( [a = ?1] and [b = ?2]) [or] c = ?3] which is just wrong. Remember the operation moves up the tree.

hantsy commented 4 years ago

Any progress of this?

I would like to use Typesafe APIs such as JPA Creteria(Specification in Spring Data JPA) or QueryDSL to assemble the Query logic by pure Java codes instead of literal strings.

TheParad0X commented 3 years ago

I came here after posting this To me, this is a key feature that is missing. All the "easy" and "simplified" panache ORM examples do not meet the requirements for most real world applications. This should be prioritized.

antssilva96 commented 2 years ago

Any updates on this ?

loicmathieu commented 2 years ago

@antssilva96 no update on it, you can see https://github.com/quarkusio/quarkus/issues/8136 that tries to sum up all the ideas around dynamic query support in Panache.

I open a discussion on Zulip as I think we need to move forward or decide to close all these issues: https://quarkusio.zulipchat.com/#narrow/stream/187038-dev/topic/Panache.20dynamic.20query

nseb commented 4 months ago

Any update ?