vespa-engine / vespa

AI + Data, online. https://vespa.ai
https://vespa.ai
Apache License 2.0
5.47k stars 584 forks source link

DocumentSelector doesn't work when having a document type that inherits from another and we do not specify a field #26305

Open danitico opened 1 year ago

danitico commented 1 year ago

Describe the bug Related to #26126. We are working with the DocumentSelector class and we are encountering the following problem:

We have document type called "caselaw" that inherits from another document type called "docs". We set up two different conditions for a document of type caselaw:

However, the first one returns false and the second one returns true. The problem is located at https://github.com/vespa-engine/vespa/blob/07afcf4b4ad1034601b8b438ff4b13ed63e1b0c1/document/src/main/java/com/yahoo/document/select/rule/DocumentTypeNode.java#L49 and in https://github.com/vespa-engine/vespa/blob/07afcf4b4ad1034601b8b438ff4b13ed63e1b0c1/document/src/main/java/com/yahoo/document/select/rule/DocumentNode.java#L51

I think it should have something like this, in which the inherited type is checked, so both conditions are true.

https://github.com/vespa-engine/vespa/blob/07afcf4b4ad1034601b8b438ff4b13ed63e1b0c1/document/src/main/java/com/yahoo/document/select/rule/DocumentNode.java#L56

To Reproduce Steps to reproduce the behavior:

  1. Define two schemas A and B, where B inherits from A. A defines a field that is inherited by B.
  2. Use vespa-documentgen-plugin to generate concrete types for both A and B
  3. Create an object of the class of concrete type B
  4. Define the following conditions for the parser: "A" and "B"
  5. The first condition will fail but not the second one

Expected behavior Both expressions should be true

Screenshots If applicable, add screenshots to help explain your problem.

Environment (please complete the following information):

Vespa version 8.132.43

Additional context Nothing

bratseth commented 1 year ago

Yes, a document type name as a selection expression ("A") could mean either "equality" is "isa", matching just "A" or alsp all supertypes of A respectively.

Some users would be surprised in either interpretation, and in this case we have chosen equality semantics because it is easier to get "isa" semantics from "equality" than the other way around (just list all supertypes with OR vs. A and not ...), and because we believe surprises in this direction are less likely to be damaging.

If there is some compelling reason to support "isa" semantics as well we could create an additional operator for that.

danitico commented 1 year ago

@bratseth I understand your point that it can be confusing. For now, we will hardcode that at running time and replace the generic type token concatenating every type with OR. Although it will helpful to have an additional operator to improve the readability of expressions.