it-at-m / digiwf-core

central workflow automation and integration platform based on the free process framework Camunda.
MIT License

Cleanup Task Difference #1392

Closed simonhir closed 4 months ago

simonhir commented 6 months ago

Clean up the existing task difference between the engine and the tasklist. The difference is visible through the new monitoring. To better monitor newly occurring differences for #1348, the current difference needs to be cleaned up on all environments. This can be done by comparing the task entries in the engine DB with the task entries in the tasklist DB and removing all entries from the tasklist that are not present in the engine. If this does not completely fix the difference, we also need to check whether there are tasks in the engine that do not exist in the tasklist.
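The comparison described above is a set difference in both directions. A minimal sketch, assuming the task IDs of both databases have already been exported into plain collections of strings; `TaskDifference` and `diff_tasks` are hypothetical names for illustration and not part of digiwf-core:

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class TaskDifference:
    # Task IDs present in the tasklist but not in the engine: removal candidates.
    orphaned_in_tasklist: FrozenSet[str]
    # Task IDs present in the engine but not in the tasklist: need to be recreated.
    missing_in_tasklist: FrozenSet[str]


def diff_tasks(engine_task_ids: Iterable[str],
               tasklist_task_ids: Iterable[str]) -> TaskDifference:
    """Compare the two ID collections in both directions (hypothetical helper)."""
    engine = frozenset(engine_task_ids)
    tasklist = frozenset(tasklist_task_ids)
    return TaskDifference(
        orphaned_in_tasklist=tasklist - engine,
        missing_in_tasklist=engine - tasklist,
    )
```

Running both directions in one pass covers the fallback case as well: entries in `missing_in_tasklist` are exactly the engine tasks that would still be out of sync after the tasklist-side removal.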

After the cleanup, we need an alert to notify us if this problem occurs again.

Acceptance criteria

darenegade commented 5 months ago

One could use a clever SQL statement or a script to correct the order of the events in domain_event_entry and then reset the token to December '23, so that all incorrect events get corrected again.

simonhir commented 4 months ago

The following user tasks are missing from the tasklist on Prod.

Task-ID                                 Type                Instance-ID
6e9df7c1-c198-11ee-876a-0a580a8a338b    Zurückziehen        6e262f24-c198-11ee-876a-0a580a8a338b
7c5153f5-c190-11ee-876a-0a580a8a338b    Zurückziehen        7bc7d817-c190-11ee-876a-0a580a8a338b
2e2dbec3-c178-11ee-876a-0a580a8a338b    Zurückziehen        2d5119d6-c178-11ee-876a-0a580a8a338b
b1f2537c-bc14-11ee-8ffa-0a580a8a32ae    Zurückziehen        b17c117e-bc14-11ee-8ffa-0a580a8a32ae

The Zurückziehen (withdraw) tasks were deliberately not recreated, and their number has dropped significantly since the last cleanup.

Updated: 29.04.2024

simonhir commented 4 months ago

Prod tasks (see above) cancelled.

Processes-Test tasks from December 2023 with a different application name deleted. Presumably caused by a configuration error.

simonhir commented 4 months ago

Deleted the following three tasks from the tasklist:

Restarted the following two tasks via Engine Modify:

simonhir commented 4 months ago

Processes-Demo task 480c2b60-0869-11ef-8770-0a580a8a2e6e missing in tasklist. Following error in digiwf-engine-service log:

Log

```
EventListener [RoutingKafkaEventPublisher] failed to handle event [576e8609-ea8b-4751-84a6-e6998786c8d0] (io.holunda.camunda.taskpool.api.task.TaskCreatedEngineEvent). Continuing processing with next listener
org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests intended only for the leader, this error indicates that the broker is not the current leader. For requests intended for any replica, this error indicates that the broker is not a replica of the topic partition.
Wrapped by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests intended only for the leader, this error indicates that the broker is not the current leader. For requests intended for any replica, this error indicates that the broker is not a replica of the topic partition.
    at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:97)
    at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:79)
    at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:30)
    at org.axonframework.extensions.kafka.eventhandling.producer.KafkaPublisher.waitForPublishAck(KafkaPublisher.java:194)
    ... 26 common frames omitted
Wrapped by: org.axonframework.messaging.EventPublicationFailedException: Event publication failed, exception occurred while waiting for event publication.
    at org.axonframework.extensions.kafka.eventhandling.producer.KafkaPublisher.waitForPublishAck(KafkaPublisher.java:203)
    at org.axonframework.extensions.kafka.eventhandling.producer.KafkaPublisher.send(KafkaPublisher.java:162)
    at org.axonframework.extensions.kafka.eventhandling.producer.KafkaEventPublisher.handle(KafkaEventPublisher.java:80)
    at de.muenchen.oss.digiwf.task.polyflow.kafka.RoutingKafkaEventPublisher.handle(RoutingKafkaEventPublisher.java:31)
    at org.axonframework.eventhandling.SimpleEventHandlerInvoker.invokeHandlers(SimpleEventHandlerInvoker.java:128)
    at org.axonframework.eventhandling.SimpleEventHandlerInvoker.handle(SimpleEventHandlerInvoker.java:114)
    at org.axonframework.eventhandling.MultiEventHandlerInvoker.handle(MultiEventHandlerInvoker.java:91)
    at org.axonframework.eventhandling.AbstractEventProcessor.processMessageInUnitOfWork(AbstractEventProcessor.java:195)
    at org.axonframework.eventhandling.AbstractEventProcessor.lambda$null$1(AbstractEventProcessor.java:173)
    at org.axonframework.messaging.DefaultInterceptorChain.proceed(DefaultInterceptorChain.java:57)
    at org.axonframework.messaging.interceptors.CorrelationDataInterceptor.handle(CorrelationDataInterceptor.java:67)
    at org.axonframework.messaging.DefaultInterceptorChain.proceed(DefaultInterceptorChain.java:55)
    at org.axonframework.eventhandling.TrackingEventProcessor.lambda$new$1(TrackingEventProcessor.java:181)
    at org.axonframework.messaging.DefaultInterceptorChain.proceed(DefaultInterceptorChain.java:55)
    at org.axonframework.eventhandling.AbstractEventProcessor.lambda$null$2(AbstractEventProcessor.java:174)
    at org.axonframework.tracing.Span.runCallable(Span.java:132)
    at org.axonframework.eventhandling.AbstractEventProcessor.lambda$null$3(AbstractEventProcessor.java:170)
    at org.axonframework.messaging.unitofwork.BatchingUnitOfWork.executeWithResult(BatchingUnitOfWork.java:92)
    at org.axonframework.eventhandling.AbstractEventProcessor.lambda$processInUnitOfWork$4(AbstractEventProcessor.java:166)
    at org.axonframework.tracing.Span.runCallable(Span.java:132)
    at org.axonframework.eventhandling.AbstractEventProcessor.processInUnitOfWork(AbstractEventProcessor.java:165)
    at org.axonframework.eventhandling.TrackingEventProcessor.processBatch(TrackingEventProcessor.java:491)
    at org.axonframework.eventhandling.TrackingEventProcessor.processingLoop(TrackingEventProcessor.java:316)
    at org.axonframework.eventhandling.TrackingEventProcessor$TrackingSegmentWorker.run(TrackingEventProcessor.java:1200)
    at org.axonframework.eventhandling.TrackingEventProcessor$WorkerLauncher.cleanUp(TrackingEventProcessor.java:1402)
    at org.axonframework.eventhandling.TrackingEventProcessor$WorkerLauncher.run(TrackingEventProcessor.java:1379)
    at java.base/java.lang.Thread.run(Thread.java:840)
```

Processes-Test task 2f402ec0-0861-11ef-8959-0a580a8a36ac not deleted from tasklist. Following error in digiwf-engine-service log:

Log

```
EventListener [RoutingKafkaEventPublisher] failed to handle event [a79cafcd-4201-4194-af1c-e9337fc0e16c] (io.holunda.camunda.taskpool.api.task.TaskCompletedEngineEvent). Continuing processing with next listener
org.apache.kafka.common.errors.NetworkException: Disconnected from node 2
Wrapped by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.NetworkException: Disconnected from node 2
    at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:97)
    at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:79)
    at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:30)
    at org.axonframework.extensions.kafka.eventhandling.producer.KafkaPublisher.waitForPublishAck(KafkaPublisher.java:194)
    ... 26 common frames omitted
Wrapped by: org.axonframework.messaging.EventPublicationFailedException: Event publication failed, exception occurred while waiting for event publication.
    at org.axonframework.extensions.kafka.eventhandling.producer.KafkaPublisher.waitForPublishAck(KafkaPublisher.java:203)
    at org.axonframework.extensions.kafka.eventhandling.producer.KafkaPublisher.send(KafkaPublisher.java:162)
    at org.axonframework.extensions.kafka.eventhandling.producer.KafkaEventPublisher.handle(KafkaEventPublisher.java:80)
    at de.muenchen.oss.digiwf.task.polyflow.kafka.RoutingKafkaEventPublisher.handle(RoutingKafkaEventPublisher.java:31)
    at org.axonframework.eventhandling.SimpleEventHandlerInvoker.invokeHandlers(SimpleEventHandlerInvoker.java:128)
    at org.axonframework.eventhandling.SimpleEventHandlerInvoker.handle(SimpleEventHandlerInvoker.java:114)
    at org.axonframework.eventhandling.MultiEventHandlerInvoker.handle(MultiEventHandlerInvoker.java:91)
    at org.axonframework.eventhandling.AbstractEventProcessor.processMessageInUnitOfWork(AbstractEventProcessor.java:195)
    at org.axonframework.eventhandling.AbstractEventProcessor.lambda$null$1(AbstractEventProcessor.java:173)
    at org.axonframework.messaging.DefaultInterceptorChain.proceed(DefaultInterceptorChain.java:57)
    at org.axonframework.messaging.interceptors.CorrelationDataInterceptor.handle(CorrelationDataInterceptor.java:67)
    at org.axonframework.messaging.DefaultInterceptorChain.proceed(DefaultInterceptorChain.java:55)
    at org.axonframework.eventhandling.TrackingEventProcessor.lambda$new$1(TrackingEventProcessor.java:181)
    at org.axonframework.messaging.DefaultInterceptorChain.proceed(DefaultInterceptorChain.java:55)
    at org.axonframework.eventhandling.AbstractEventProcessor.lambda$null$2(AbstractEventProcessor.java:174)
    at org.axonframework.tracing.Span.runCallable(Span.java:132)
    at org.axonframework.eventhandling.AbstractEventProcessor.lambda$null$3(AbstractEventProcessor.java:170)
    at org.axonframework.messaging.unitofwork.BatchingUnitOfWork.executeWithResult(BatchingUnitOfWork.java:92)
    at org.axonframework.eventhandling.AbstractEventProcessor.lambda$processInUnitOfWork$4(AbstractEventProcessor.java:166)
    at org.axonframework.tracing.Span.runCallable(Span.java:132)
    at org.axonframework.eventhandling.AbstractEventProcessor.processInUnitOfWork(AbstractEventProcessor.java:165)
    at org.axonframework.eventhandling.TrackingEventProcessor.processBatch(TrackingEventProcessor.java:491)
    at org.axonframework.eventhandling.TrackingEventProcessor.processingLoop(TrackingEventProcessor.java:316)
    at org.axonframework.eventhandling.TrackingEventProcessor$TrackingSegmentWorker.run(TrackingEventProcessor.java:1200)
    at org.axonframework.eventhandling.TrackingEventProcessor$WorkerLauncher.cleanUp(TrackingEventProcessor.java:1402)
    at org.axonframework.eventhandling.TrackingEventProcessor$WorkerLauncher.run(TrackingEventProcessor.java:1379)
    at java.base/java.lang.Thread.run(Thread.java:840)
```

Both were probably caused by the hotfix rollout.

simonhir commented 4 months ago

Tasks remain synchronised. Main problem fixed. Differences can still occur when the Kafka cluster is unavailable, because Axon-Kafka does not retry failed publications. Such differences are detected by the existing monitoring and can then be analysed separately if necessary.
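For illustration only, the kind of retry that is currently missing could look like the sketch below. It is a generic wrapper, not the Axon Kafka extension API: `publish_with_retry` and its parameters are hypothetical, and `ConnectionError` stands in for transient broker errors such as the `NotLeaderOrFollowerException` and `NetworkException` in the logs above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def publish_with_retry(publish: Callable[[], T],
                       max_attempts: int = 3,
                       base_delay_s: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `publish`, retrying transient failures with exponential backoff.

    Hypothetical helper: ConnectionError is an assumed stand-in for
    transient Kafka errors; any other exception propagates immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return publish()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up; the monitoring would then surface the difference
            # Wait 0.5 s, 1 s, 2 s, ... before the next attempt.
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

A retry like this can deliver the same event more than once, so the consuming side would need to tolerate duplicates.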