sartography / spiff-arena

SpiffWorkflow is a software development platform for building, running, and monitoring executable diagrams
https://www.spiffworkflow.org/
GNU Lesser General Public License v2.1

process instance events lose task info when tasks are deleted #1941

Status: Open. jasquat opened this issue 1 month ago.

jasquat commented 1 month ago

Originally found in https://github.com/sartography/spiff-arena/issues/1909#issuecomment-2232493789

When we delete tasks from the database, such as during process instance migration or reset to task, the task information can no longer be displayed with its corresponding event. As a result, the events table on the process instance show page only shows "task_completed" without indicating which task completed.
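To make the failure mode concrete, here is a minimal sketch in Python with an in-memory SQLite database. It assumes a simplified, hypothetical schema in which each process_instance_event row stores only a reference to a task (the `guid`, `task_guid`, and `bpmn_name` columns are illustrative, not the actual spiff-arena schema). Once the task row is deleted, the join has nothing to match, so the event can only be shown by its type.

```python
import sqlite3

# Hypothetical, simplified schema. The real spiff-arena tables have more
# columns, and the column names used here are assumptions.
SCHEMA = """
CREATE TABLE task (guid TEXT PRIMARY KEY, bpmn_name TEXT);
CREATE TABLE process_instance_event (
    id INTEGER PRIMARY KEY,
    event_type TEXT,
    task_guid TEXT  -- only a reference; the task's details live in `task`
);
"""


def event_rows(conn):
    """Return (event_type, task_name) pairs the way an events table might render them."""
    return conn.execute(
        """
        SELECT e.event_type, t.bpmn_name
        FROM process_instance_event e
        LEFT JOIN task t ON t.guid = e.task_guid
        ORDER BY e.id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO task VALUES ('t1', 'Approve Request')")
conn.execute(
    "INSERT INTO process_instance_event (event_type, task_guid) "
    "VALUES ('task_completed', 't1')"
)
before = event_rows(conn)

# Simulate migration or reset to task: the task row is deleted,
# but the event that points at it is kept.
conn.execute("DELETE FROM task WHERE guid = 't1'")
after = event_rows(conn)

print(before)  # [('task_completed', 'Approve Request')]
print(after)   # [('task_completed', None)]
```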

A couple of potential fixes include:

essweine commented 1 month ago

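If we consider revamping how tasks and events are stored, we should consider relying on the event stream that @jbirddog and I are working on. If we considered this as more of a data warehouse problem than a running process problem, we could get them out of the db entirely:

* they'll be duplicated there for historical reasons anyway

* application db can focus solely on holding the current state, which would ultimately make things simpler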

It would require more stuff being redirected to the event store than our initial effort, but we've talked about potentially doing that anyway.
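A rough sketch of that split, assuming plain in-memory Python objects stand in for the application db and the event store (all class and function names here are hypothetical). The key point is that the event copies the task details it needs at write time, so deleting the task from the current state does not erase what the history says.

```python
from dataclasses import dataclass, field


@dataclass
class CurrentStateDb:
    """Hypothetical stand-in for the application db: only tasks that exist right now."""
    tasks: dict = field(default_factory=dict)  # task guid -> task name


@dataclass
class EventStore:
    """Hypothetical stand-in for the event stream: append-only, never updated or deleted."""
    events: list = field(default_factory=list)

    def append(self, event: dict) -> None:
        # Store a copy so later changes to the caller's dict can't alter history.
        self.events.append(dict(event))


def complete_task(db: CurrentStateDb, store: EventStore, guid: str) -> None:
    # The event carries the task details itself instead of just a reference,
    # so history survives even if the task is deleted later.
    store.append({"event_type": "task_completed", "task_guid": guid, "task_name": db.tasks[guid]})


def delete_task(db: CurrentStateDb, guid: str) -> None:
    # E.g. during migration or reset to task: only the current state changes.
    del db.tasks[guid]


db = CurrentStateDb(tasks={"t1": "Approve Request"})
store = EventStore()
complete_task(db, store, "t1")
delete_task(db, "t1")
print(db.tasks)       # {}
print(store.events)   # [{'event_type': 'task_completed', 'task_guid': 't1', 'task_name': 'Approve Request'}]
```

The application db ends up holding nothing about the deleted task, while the event store still knows which task completed.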

burnettk commented 1 month ago

Another thing we were brainstorming today: instead of getting process_instance_event out of the app, we could kill the task table and rely on a persisted version of the event stream (always insert, never update) for transactional purposes. It's definitely convenient to have one record per task with a task guid, but it seems possible to write a query that puts the events back together into the current task state to hand to the spiff lib.

I like your idea of killing the existing process_instance_event stuff better, but it's good to have options. :D
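One way the "put things together" query could look, sketched with SQLite and a hypothetical insert-only `task_event` table (the table, its columns, and the READY/COMPLETED state names are assumptions for illustration). Each state change is a new row, and the current view of each task is rebuilt by picking the newest row per task guid. Converting those rows into whatever the spiff lib expects is not shown.

```python
import sqlite3

# Hypothetical insert-only table standing in for `task`; names and states are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_event ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " task_guid TEXT NOT NULL,"
    " task_name TEXT NOT NULL,"
    " state TEXT NOT NULL)"
)


def record(guid: str, name: str, state: str) -> None:
    # Always insert, never update: each state change becomes a new row.
    conn.execute(
        "INSERT INTO task_event (task_guid, task_name, state) VALUES (?, ?, ?)",
        (guid, name, state),
    )


def current_tasks():
    # Rebuild "one record per task" by taking the newest event for each guid.
    return conn.execute(
        """
        SELECT e.task_guid, e.task_name, e.state
        FROM task_event e
        JOIN (
            SELECT task_guid, MAX(id) AS latest_id
            FROM task_event
            GROUP BY task_guid
        ) latest ON latest.latest_id = e.id
        ORDER BY e.task_guid
        """
    ).fetchall()


record("t1", "Approve Request", "READY")
record("t2", "Notify Requester", "READY")
record("t1", "Approve Request", "COMPLETED")
print(current_tasks())
# [('t1', 'Approve Request', 'COMPLETED'), ('t2', 'Notify Requester', 'READY')]
```

Taking `MAX(id)` per guid relies on ids increasing with insertion order; a real event stream might order by a timestamp or sequence number instead.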

jbirddog commented 1 month ago

> If we consider revamping how tasks and events are stored, we should consider relying on the event stream that @jbirddog and I are working on. If we considered this as more of a data warehouse problem than a running process problem, we could get them out of the db entirely:
>
> * they'll be duplicated there for historical reasons anyway
>
> * application db can focus solely on holding the current state, which would ultimately make things simpler
>
> It would require more stuff being redirected to the event store than our initial effort, but we've talked about potentially doing that anyway.

This sounds like a nice approach and a good first use case for querying the event stream server. If the backend can't find the task data, it could ask the event stream server for the historical data. If the data can't be found there either, we can fall back to what we do today. Over time, more of these "non-current-state" queries could shift to using the historical data from the event stream server.
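That lookup order can be sketched as a chain of sources tried in turn. This is a minimal Python sketch under stated assumptions: the dictionaries stand in for the backend's db and the event stream server, and none of the names are spiff-arena APIs. A `None` from every source corresponds to today's behavior of showing only the event type.

```python
from typing import Callable, Optional

# Each source takes a task guid and returns task info, or None if it has nothing.
# The sources below are hypothetical stand-ins, not spiff-arena APIs.
TaskLookup = Callable[[str], Optional[dict]]


def find_task_info(guid: str, sources: list[TaskLookup]) -> Optional[dict]:
    """Try each source in order and return the first hit, or None if all miss."""
    for source in sources:
        info = source(guid)
        if info is not None:
            return info
    return None


app_db = {"t2": {"name": "Notify Requester"}}       # current state only
event_stream = {"t1": {"name": "Approve Request"}}  # historical data

sources = [
    app_db.get,          # 1. the backend's own db
    event_stream.get,    # 2. the event stream server
    lambda guid: None,   # 3. today's behavior: no task info, just the event type
]

print(find_task_info("t1", sources))  # {'name': 'Approve Request'}
print(find_task_info("t2", sources))  # {'name': 'Notify Requester'}
print(find_task_info("t9", sources))  # None
```

Keeping the sources as an ordered list makes it straightforward to move more "non-current-state" queries over to the event stream later by reordering or adding entries.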