clarify SqlResultSetMapping with multiple EntityResult and conflicting aliases

beikov commented 2 years ago

In 3.10.16.1. the spec mentions a particular example that the test(com/sun/ts/tests/jpa/core/annotations/nativequery/Client.java#nativeQueryTest1) refers to and says

When the column names of the SQL result do not correspond to those of the object/relational mapping metadata, more explicit SQL result mapping metadata must be provided to enable the persistence provider runtime to map the JDBC results into the expected objects. This might arise, for example, when column aliases must be used in the SQL SELECT clause when the SQL result would otherwise contain multiple columns of the same name or when columns in the SQL result are the results of operators or functions. The FieldResult annotation element within the EntityResult annotation is used to specify the mapping of such columns to entity attributes.

The example being

Query q = em.createNativeQuery(
  "SELECT o.id, o.quantity, o.item, i.id, i.name, i.description " +
  "FROM Order o, Item i " +
  "WHERE (o.quantity > 25) AND (o.item = i.id)",
  "OrderItemResults");
@SqlResultSetMapping(name="OrderItemResults", entities={
  @EntityResult(entityClass=com.acme.Order.class),
  @EntityResult(entityClass=com.acme.Item.class)
})

Both o.id and i.id select items would get the column alias id, so AFAIU, according to the specification, this would require the introduction of an alias in the SQL and the use of @FieldResult to disambiguate which result set index should be for which attribute. Yet this TCK test expects this to work and I think that this might be just by accident, because in the test data(see https://github.com/eclipse-ee4j/jakartaee-tck/blob/master/src/com/sun/ts/tests/jpa/core/annotations/nativequery/Client.java#L69) the corresponding item for an order has the same id. So the test doesn't necessarily clarify or assert how this case should be supported.

Should JPA providers assume that a result set column can only be "consumed" for a single attribute mapping i.e. the first encounter of a column label id is consumed for the Order#id attribute and the second one for Item#id?

beikov commented 2 years ago

Or should a JPA provider try to match the results also based on table name through ResultSetMetaData#getTableName?

sebersole commented 2 years ago

Or should a JPA provider try to match the results also based on table name through ResultSetMetaData#getTableName?

For me, that is the most reasonable expectation. From 3.10.16.1 Returning Managed Entities from Native Queries:

When the column names of the SQL result do not correspond to those of the object/relational mapping metadata, more explicit SQL result mapping metadata must be provided to enable the persistence pro- vider runtime to map the JDBC results into the expected objects. This might arise, for example, when column aliases must be used in the SQL SELECT clause when the SQL result would otherwise contain multiple columns of the same name or when columns in the SQL result are the results of operators or functions. The FieldResult annotation element within the EntityResult annotation is used to specify the mapping of such columns to entity attributes.

Strictly following this documentation section, the expectation should be that explicit @FieldResult definitions would be needed for Order.id and Item.id since they are "multiple columns of the same name". However, the example you pasted does not supply them, so I think a natural conclusion there is that the expectation for "the same name" extend to {tableName}.{columnName} combo.

So that's how I would read it.

beikov commented 2 years ago

I implemented the matching with the table name now in Hibernate, but the Oracle and SQL Server JDBC drivers do not provide the table name for selection items, so the only possible interpretations I can imagine are:

The JPA provider must "consume" selection items i.e. the lookup of the selection index for Order#id is 1, and following lookups must skip previously consumed selection item indices, hence Item#id will use the index 4
The JPA provider should resolve for both, Order#id and Item#id the selection index 1
The JPA provider should do a best effort matching based on the data provided by the JDBC drivers and is allowed to fail on such a query

I'll have to test that, but I fear that EclipseLink might have chosen option 2. as interpretation accidentally, because the test com/sun/ts/tests/jpa/core/annotations/nativequery/Client.java#nativeQueryTest1 uses test data and assertions that would allow this interpretation. I think that option 1. and/or 3. would be better interpretations though.

Can @lukasj share some details here with respect to expectations or how EclipseLink interpreted this? We really need to clarify this in the specification and adapt the test data of the test to avoid this confusion.

beikov commented 2 years ago

I just verified that EclipseLink in fact chose to interpret this as described in option 2, which can't be a sensible thing to expect from a JPA provider. The code handling this is in org.eclipse.persistence.queries.EntityResult#getValueFromRecord which calls org.eclipse.persistence.internal.sessions.ArrayRecord#get which just returns the first selection that matches the column label.

In Hibernate we chose to throw an error in such an ambiguous case to signal that explicit aliases are required. It would be a pity if we'd have to put this safety net behind a flag for passing the TCK. Can we fix this in a 3.0.2 version of the TCK? @scottmarlow

scottmarlow commented 2 years ago

@beikov

Can we fix this in a 3.0.2 version of the TCK? @scottmarlow

Tests can be excluded but not updated in a service (micro) release such as 3.0.2.

I'll bring more attention to this issue on the Persistence mailing list so that we might hear from others on the https://github.com/eclipse-ee4j/jakartaee-tck/pull/843 change.

gavinking commented 2 years ago

I think it's pretty clear that this is simply an erroneous example, which directly contradicts the text of the specification that it accompanies.

Now, 3.10.16.1 of the current release of the spec is IMO rather unclear, and is urgently in need of a rewrite. But, in order to see that the example is simply wrong, we can go back to the very first release of the JPA spec, and see that the exact same example is given, but the text which accompanies it states quite explicitly that:

Note that column aliases must be used in the SQL SELECT clause where the SQL result would otherwise contain multiple columns of the same name.

(My emphasis.)

It's quite clear, then, that the example does not obey this rule, and is simply a mistake in the JPA 1.0 specification.

And, FTR, I'm quite certain that the we (the EJB 3.0 expert group) never had any intention that result set mappings could be inferred from anything beyond the column alias.

Now, that quoted text was removed somewhere along the way to JPA 2.1. Why, I have no idea. But, AFAICS, it wasn't replaced with any text which states what should happen if this case is allowed, and I read the following, rather imprecise text as implying that that restriction stated in the removed text is still implicitly assumed:

This might arise, for example, when column aliases must be used in the SQL SELECT clause when the SQL result would otherwise contain multiple columns of the same name

So I would say that this example and the TCK test derived from it are simply wrong.

beikov commented 2 years ago

And, FTR, I'm quite certain that the we (the EJB 3.0 expert group) never had any intention that result set mappings could be inferred from anything beyond the column alias.

This would make things a lot easier. Like I wrote, also EclipseLink relies only on the column alias, and since it doesn't do any kind of alias collision validation, it doesn't understand that the result set mapping for this query doesn't "work".

IMO the test should be excluded for now and for 3.0.2+ the spec text and TCK sample should be updated to use proper column aliases.

scottmarlow commented 2 years ago

As per https://jakarta.ee/committees/specification/tckprocess, approval is the default after 14 days (March 4 I think), unless the challenge is accepted/denied before then.

lukasj commented 2 years ago

Isn't it wrong match? As I understand the wording and the flow of the text in the spec, my understanding is that there is a mismatch of the wording and example in the initial description of this issue.

To the example given, following text applies: The following query and SqlResultSetMapping metadata illustrates the return of multiple entity types. It assumes default metadata and column name defaults.

...so the returned entry is supposed to be an array of form [[Order][Item]]

then the spec says the When the column names of the SQL result do not correspond to ... and: The following example combining multiple entity types includes aliases in the SQL statement. This requires that the column names be explicitly mapped to the entity fields corresponding to those columns. The FieldResult annotation is used for this purpose.

Query q = em.createNativeQuery(
    "SELECT o.id AS order_id, " +
        "o.quantity AS order_quantity, " +
        "o.item AS order_item, " +
        "i.id, i.name, i.description " +
        "FROM Order o, Item i " +
        "WHERE (order_quantity > 25) AND (order_item = i.id)",
    "OrderItemResults");

@SqlResultSetMapping(name="OrderItemResults", entities={
    @EntityResult(entityClass=com.acme.Order.class, fields={
        @FieldResult(name="id", column="order_id"),
        @FieldResult(name="quantity", column="order_quantity"),
        @FieldResult(name="item", column="order_item")}),
    @EntityResult(entityClass=com.acme.Item.class)
})

Or am I missing something?

beikov commented 2 years ago

Ok, so in the spec, the first example

Query q = em.createNativeQuery(
    "SELECT o.id, o.quantity, o.item, i.id, i.name, i.description " +
        "FROM Order o, Item i " +
        "WHERE (o.quantity > 25) AND (o.item = i.id)",
    "OrderItemResults");

@SqlResultSetMapping(name="OrderItemResults", entities={
    @EntityResult(entityClass=com.acme.Order.class),
    @EntityResult(entityClass=com.acme.Item.class)
})

is just there to show this is wrong/problematic and should be replaced with explicit aliases?

beikov commented 2 years ago

Is my understanding correct, that the example in the specification document is for illustrating a problematic case where aliases are required? @lukasj

lukasj commented 2 years ago

yes, that is my understanding. First it shows what can be problematic and then it shows the solution for it. I think it was wrong to just copy the first snippet to tcks. There can be various solutions - from entirely dropping the test to updating snippet together with the test to follow expectations.

beikov commented 2 years ago

Ok, thanks for the prompt response. Do you think the specification language should be changed for this case to require the JPA provider to throw an error? If not, I would at least try to make it more clear in the specification that this mapping is invalid and either (a) remove the nativeQueryTest1 test from the TCK or (b) adapt the test so that it uses a mapping similar to nativeQueryTest3 which uses proper aliases.

scottmarlow commented 2 years ago

yes, that is my understanding. First it shows what can be problematic and then it shows the solution for it. I think it was wrong to just copy the first snippet to tcks. There can be various solutions - from entirely dropping the test to updating snippet together with the test to follow expectations.

Are we ready to mark this challenge as accepted so that we can generate a new Persistence 3.0 Standalone TCK with the com/sun/ts/tests/jpa/core/annotations/nativequery/Client.java#nativeQueryTest1 excluded?

lukasj commented 2 years ago

yes, com/sun/ts/tests/jpa/core/annotations/nativequery/Client.java#nativeQueryTest1 needs to be excluded from 3.0.x and removed from 3.1.x.

Does the text in the spec need to be changed as well? What I have in mind is to change:

When the column names of the SQL result do
not correspond to those of the object/relational mapping metadata, more
explicit SQL result mapping metadata must be provided...

to

When the column names of the SQL result do
not correspond to those of the object/relational mapping metadata or introduce a conflict
in mapping column defaults as in the example code above,
more explicit SQL result mapping metadata must be provided...

beikov commented 2 years ago

Looks good to me!

jakartaee / persistence

clarify SqlResultSetMapping with multiple EntityResult and conflicting aliases #350