mapaction / mapactionpy_controller


Duplicate JIRA tasks are created on subsequent runs #115

Open andrewphilipsmith opened 3 years ago

andrewphilipsmith commented 3 years ago

How to recreate

1) Run MapChef on a non-trivial cookbook with incomplete data and with JIRA integration enabled (so that JIRA tasks are created).
2) Do not alter the data.
3) Re-run MapChef.

Expected behaviour

When data errors are encountered, MapChef should add comments to the relevant existing tasks and not create new tasks.

Actual behaviour

In the majority of cases, MapChef comments on existing tasks. In a minority of cases, new duplicate tasks are created. This occurs inconsistently and without a discernible pattern.

andrewphilipsmith commented 3 years ago

All JIRA tasks are logged here https://mapaction.atlassian.net/jira/servicedesk/projects/PIPET

andrewphilipsmith commented 2 years ago

There are two possible (non-exclusive) ideas to tackle this. Both ideas are based on the fact that neither the mapactionpy_controller, nor MapChef more generally, maintains state in between runs.

A) Improve the unique identifier for the task

When the controller encounters a scenario that requires creating or updating a JIRA task, it will generate a title for that task. The title is deterministic, often including the relevant filename (or similar) to make it distinct.

JIRA itself, through its JQL query language, only allows fuzzy searches on text fields. Because specific files can legitimately have filenames that are only slightly different, it is possible that task titles won't match exactly as expected in a fuzzy search.

1) Add a custom field task_hash_id to the relevant JIRA task type.
2) Generate the "task title" as before.
3) Generate a hash of the "task title" and the "operation_id", and store this value in the task_hash_id field.
4) In all queries made through the JIRA API and JQL, search on task_hash_id rather than on the "task title".

This approach assumes that the "task_hash_id" values will be more lexicographically distinct within JIRA's fuzzy search than the task titles themselves.
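Steps 2 to 4 of idea A could be sketched roughly as below. This is only an illustration, not existing controller code: the function names, the choice of SHA-256, the separator character and the JQL clause are all assumptions. In particular, whether the custom field can be addressed in JQL by the name task_hash_id, and which operator is appropriate for it, depends on how the field is configured in JIRA.

```python
import hashlib


def make_task_hash_id(task_title: str, operation_id: str) -> str:
    """Return a deterministic hex digest for a (title, operation) pair.

    Hypothetical helper: SHA-256 is an assumed choice of hash.
    """
    # The unit-separator character keeps ("ab", "c") and ("a", "bc")
    # from producing the same payload.
    payload = f"{operation_id}\x1f{task_title}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_hash_jql(task_hash_id: str, field_name: str = "task_hash_id") -> str:
    """Build a JQL clause that searches on the hash instead of the title.

    Assumption: the custom field is a text field reachable by name.
    """
    return f'"{field_name}" ~ "{task_hash_id}"'
```

Because the digest contains only hexadecimal characters, two titles that differ by a single character yield unrelated values, which is the property the assumption above relies on.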

B) Query all JIRA tasks with the relevant operation_id and match locally.

At present, mapactionpy_controller only queries JIRA for individual tasks when it needs to create or update a task. Alternatively:

1) At initiation, the JiraClient object could query all of the tasks with the relevant operation_id.
2) Instead of querying the individual tasks using JQL, matching individual tasks can be done locally by string comparisons.

This approach has the advantage that it would be possible for the controller to actively identify tasks that are no longer relevant (and can be closed), rather than just passively identifying current ones.
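A minimal sketch of the local matching in idea B, assuming the tasks for one operation_id have already been fetched once and reduced to a key and a title. The RemoteTask type, the plan_actions function and the example titles are hypothetical and do not exist in mapactionpy_controller; the fetch itself is deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class RemoteTask:
    """Hypothetical local copy of a JIRA task: issue key and title."""
    key: str
    title: str


def plan_actions(
    existing: Iterable[RemoteTask], current_titles: List[str]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare this run's task titles with the tasks already in JIRA.

    Returns (keys to comment on, titles to create, keys that look stale).
    Matching is exact string equality, done locally rather than via JQL.
    """
    existing = list(existing)
    # If duplicates already exist, the first task seen for a title wins.
    index: Dict[str, RemoteTask] = {}
    for task in existing:
        index.setdefault(task.title, task)

    to_comment = [index[t].key for t in current_titles if t in index]
    to_create = [t for t in current_titles if t not in index]

    # Tasks whose title was not produced in this run may no longer be relevant.
    wanted = set(current_titles)
    to_close = [task.key for task in existing if task.title not in wanted]
    return to_comment, to_create, to_close
```

With exact comparison, near-identical filenames such as roads_a.shp and roads_ab.shp cannot be confused, and the third list is what would let the controller flag tasks as candidates for closing.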