holunda-io / camunda-bpm-taskpool

Library for pooling user tasks and process related business objects.
https://www.holunda.io/camunda-bpm-taskpool/
Apache License 2.0
67 stars 26 forks source link

Queries for JpaView with multiple filters for the same attribute return too many results #845

Closed S-Tim closed 1 year ago

S-Tim commented 1 year ago

Steps to reproduce

Available Tasks:
Task 1:
- payload
 - customer="ABC"
 - "key"="value"
 - "key-int"=1
 - "engine"="engine"

Task 2:
- payload
 - customer="ABC"
 - "key"="value"
 - "key-int"=1
 - "engine"="engine"

Task 3:
- payload
 - "customer"="DEF"
 - "key"="value"
 - "key-int"=1
 - "engine"="engine"
 TasksForUserQuery(
        user = User("bar", setOf("foo")),
        assignedToMeOnly = false,
        filters = listOf("customer=ABC", "customer=DEF")
      )

Expected behaviour

The query should apply the filters in the form attribute1=value1 OR attribute1=value2 and yield three results

Actual behaviour

The query yields 12 results

The generated SQL query looks like this:

 select
        taskentity0_.task_id as task_id1_10_,
        taskentity0_.assignee_id as assignee2_10_,
        taskentity0_.business_key as business3_10_,
        taskentity0_.date_created as date_cre4_10_,
        taskentity0_.description as descript5_10_,
        taskentity0_.date_due as date_due6_10_,
        taskentity0_.date_follow_up as date_fol7_10_,
        taskentity0_.form_key as form_key8_10_,
        taskentity0_.name as name9_10_,
        taskentity0_.owner_id as owner_i10_10_,
        taskentity0_.payload as payload11_10_,
        taskentity0_.priority as priorit12_10_,
        taskentity0_.application_name as applica13_10_,
        taskentity0_.source_def_id as source_14_10_,
        taskentity0_.source_def_key as source_15_10_,
        taskentity0_.source_execution_id as source_16_10_,
        taskentity0_.source_instance_id as source_17_10_,
        taskentity0_.source_name as source_18_10_,
        taskentity0_.source_type as source_19_10_,
        taskentity0_.source_tenant_id as source_20_10_,
        taskentity0_.task_def_key as task_de21_10_ 
    from
        plf_task taskentity0_ 
    inner join
        plf_task_payload_attributes payloadatt1_ 
            on taskentity0_.task_id=payloadatt1_.task_id 
    inner join
        plf_task_payload_attributes payloadatt2_ 
            on taskentity0_.task_id=payloadatt2_.task_id 
    where
        (
            payloadatt1_.path=? 
            and payloadatt1_.value=? 
            or payloadatt2_.path=? 
            and payloadatt2_.value=?
        ) 
        and (
            ? in (
                select
                    authorized3_.authorized_principal 
                from
                    plf_task_authorizations authorized3_ 
                where
                    taskentity0_.task_id=authorized3_.task_id
            ) 
            or ? in (
                select
                    authorized4_.authorized_principal 
                from
                    plf_task_authorizations authorized4_ 
                where
                    taskentity0_.task_id=authorized4_.task_id
            ) 
            or taskentity0_.assignee_id=?
        ) 
    order by
        taskentity0_.date_created desc limit ?

Every task in the result is duplicated 4 times in this setup and they all match the condition:

 where
        (
            payloadatt1_.path=? 
            and payloadatt1_.value=? 
            or payloadatt2_.path=? 
            and payloadatt2_.value=?
        ) 

The multiple joins (one for each payload attribute matcher) cause an explosion of results. Before the OR matching this was not clear in the result because the AND matchers all operated on a specific join (for example payloadatt1)

payloadatt1_.path=? 
and payloadatt1_.value=? 

so duplicates were not possible

I did find a fix that works for now:

    fun hasTaskPayloadAttribute(name: String, value: String): Specification<TaskEntity> =
      Specification { task, query, builder ->
        query.distinct(true)
        val join = task.join<TaskEntity, Set<PayloadAttribute>>(TaskEntity::payloadAttributes.name)
        val pathEquals = builder.equal(
          join.get<String>(PayloadAttribute::path.name),
          name
        )
        val valueEquals = builder.equal(
          join.get<String>(PayloadAttribute::value.name),
          value
        )
        builder.and(pathEquals, valueEquals)
      }

Using the query parameter in the Specification creation and setting distinct to true there fixes the issue. The distinct is applied to all of these Specifications resulting in the same query as before but with select distinct instead of select. The same fix applied to the fun hasDataEntryPayloadAttribute(name: String, value: String): Specification<DataEntryEntity> function. In general this feels a little more like a workaround and I need to understand the issue a little bit better before creating a pull request that fixes the underlying issue.

S-Tim commented 1 year ago

I implemented the fix with the distinct for now and opened a new issue for the performance issue. The performance now is the same as before, where only AND filters existed.