The query should apply the filters in the form attribute1=value1 OR attribute1=value2 and yield three results
Actual behaviour
The query yields 12 results
The generated SQL query looks like this:
select
taskentity0_.task_id as task_id1_10_,
taskentity0_.assignee_id as assignee2_10_,
taskentity0_.business_key as business3_10_,
taskentity0_.date_created as date_cre4_10_,
taskentity0_.description as descript5_10_,
taskentity0_.date_due as date_due6_10_,
taskentity0_.date_follow_up as date_fol7_10_,
taskentity0_.form_key as form_key8_10_,
taskentity0_.name as name9_10_,
taskentity0_.owner_id as owner_i10_10_,
taskentity0_.payload as payload11_10_,
taskentity0_.priority as priorit12_10_,
taskentity0_.application_name as applica13_10_,
taskentity0_.source_def_id as source_14_10_,
taskentity0_.source_def_key as source_15_10_,
taskentity0_.source_execution_id as source_16_10_,
taskentity0_.source_instance_id as source_17_10_,
taskentity0_.source_name as source_18_10_,
taskentity0_.source_type as source_19_10_,
taskentity0_.source_tenant_id as source_20_10_,
taskentity0_.task_def_key as task_de21_10_
from
plf_task taskentity0_
inner join
plf_task_payload_attributes payloadatt1_
on taskentity0_.task_id=payloadatt1_.task_id
inner join
plf_task_payload_attributes payloadatt2_
on taskentity0_.task_id=payloadatt2_.task_id
where
(
payloadatt1_.path=?
and payloadatt1_.value=?
or payloadatt2_.path=?
and payloadatt2_.value=?
)
and (
? in (
select
authorized3_.authorized_principal
from
plf_task_authorizations authorized3_
where
taskentity0_.task_id=authorized3_.task_id
)
or ? in (
select
authorized4_.authorized_principal
from
plf_task_authorizations authorized4_
where
taskentity0_.task_id=authorized4_.task_id
)
or taskentity0_.assignee_id=?
)
order by
taskentity0_.date_created desc limit ?
Every task in the result is duplicated 4 times in this setup and they all match the condition:
where
(
payloadatt1_.path=?
and payloadatt1_.value=?
or payloadatt2_.path=?
and payloadatt2_.value=?
)
The multiple joins (one for each payload attribute matcher) cause an explosion of results. Before the OR matching this was not clear in the result because the AND matchers all operated on a specific join (for example payloadatt1)
payloadatt1_.path=?
and payloadatt1_.value=?
so duplicates were not possible
I did find a fix that works for now:
fun hasTaskPayloadAttribute(name: String, value: String): Specification<TaskEntity> =
Specification { task, query, builder ->
query.distinct(true)
val join = task.join<TaskEntity, Set<PayloadAttribute>>(TaskEntity::payloadAttributes.name)
val pathEquals = builder.equal(
join.get<String>(PayloadAttribute::path.name),
name
)
val valueEquals = builder.equal(
join.get<String>(PayloadAttribute::value.name),
value
)
builder.and(pathEquals, valueEquals)
}
Using the query parameter in the Specification creation and setting distinct to true there fixes the issue. The distinct is applied to all of these Specifications resulting in the same query as before but with select distinct instead of select.
The same fix applied to the fun hasDataEntryPayloadAttribute(name: String, value: String): Specification<DataEntryEntity> function.
In general this feels a little more like a workaround and I need to understand the issue a little bit better before creating a pull request that fixes the underlying issue.
I implemented the fix with the distinct for now and opened a new issue for the performance issue.
The performance now is the same as before, where only AND filters existed.
Steps to reproduce
Expected behaviour
The query should apply the filters in the form attribute1=value1 OR attribute1=value2 and yield three results
Actual behaviour
The query yields 12 results
The generated SQL query looks like this:
Every task in the result is duplicated 4 times in this setup and they all match the condition:
The multiple joins (one for each payload attribute matcher) cause an explosion of results. Before the OR matching this was not clear in the result because the AND matchers all operated on a specific join (for example payloadatt1)
so duplicates were not possible
I did find a fix that works for now:
Using the query parameter in the Specification creation and setting distinct to true there fixes the issue. The distinct is applied to all of these Specifications resulting in the same query as before but with select distinct instead of select. The same fix applied to the
fun hasDataEntryPayloadAttribute(name: String, value: String): Specification<DataEntryEntity>
function. In general this feels a little more like a workaround and I need to understand the issue a little bit better before creating a pull request that fixes the underlying issue.