it-at-m / digiwf-core

Central workflow automation and integration platform based on the free process framework Camunda.
MIT License

Data inconsistencies prevent Optimize synchronization on digiwf-demo #1348

Closed: markostreich closed this issue 8 months ago

markostreich commented 9 months ago

Describe the bug

Data inconsistencies prevent Optimize from synchronizing on digiwf-demo: the sync never completes. The problem is most likely caused by inconsistent data. On digiwf-demo, the old engine-rest-service logs errors like these in rapid succession:

{"timestamp":"2024-02-22T17:21:01.585","appName":"digitalwf-restapi","X-B3-TraceId":"cd17cb0da5a9451f","X-B3-SpanId":"cd17cb0da5a9451f","X-Span-Export":"true","thread":"http-nio-8080-exec-8","level":"ERROR","logger":"org.camunda.bpm.engine.context","location":{"fileName":"BaseLogger.java","line":"215"},"message":"ENGINE-16004 Exception while closing command context: no deployed process definition found with id 'ParkausweisBeantragenV1:9:0ae89c4f-b9ad-11ec-8230-0a580a8a09e7': processDefinition is null"}
{"timestamp":"2024-02-22T17:21:01.675","appName":"digitalwf-restapi","X-B3-TraceId":"07c983cd8632530c","X-B3-SpanId":"07c983cd8632530c","X-Span-Export":"true","thread":"http-nio-8080-exec-10","level":"ERROR","logger":"org.camunda.bpm.engine.context","location":{"fileName":"BaseLogger.java","line":"215"},"message":"ENGINE-16004 Exception while closing command context: no deployed process definition found with id 'cc295f9b-040e-11ed-bb0c-0a580a8a060e': processDefinition is null"}

In the new implementation the log output is not informative, but it is probably the same problem.
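To cross-check the reported IDs against ACT_RE_PROCDEF, they can be pulled out of the JSON log lines. A minimal Python sketch, assuming one JSON object per line and the ENGINE-16004 message wording shown above; the function name is hypothetical:

```python
import json
import re

# Wording of the ENGINE-16004 message in the log lines above; group 1 is the definition id.
MISSING_DEFINITION = re.compile(r"no deployed process definition found with id '([^']+)'")


def missing_definition_ids(lines):
    """Return the distinct missing process definition ids in order of first appearance."""
    ids = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON log line (blank line, plain-text stack trace, ...)
        if not isinstance(entry, dict):
            continue  # valid JSON, but not a log object
        message = entry.get("message")
        if not isinstance(message, str):
            continue
        match = MISSING_DEFINITION.search(message)
        if match and match.group(1) not in ids:
            ids.append(match.group(1))
    return ids
```

Passing an open log file works as well, since iterating a file yields its lines. For the two log lines above it returns ParkausweisBeantragenV1:9:0ae89c4f-b9ad-11ec-8230-0a580a8a09e7 and cc295f9b-040e-11ed-bb0c-0a580a8a060e, which can then be looked up in ACT_RE_PROCDEF.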

To Reproduce

Steps to reproduce the behavior:

  1. Go to https://digiwf-optimize-demo.muenchen.de/#/
  2. Log in.
  3. Wait for the 'camunda-bpm' synchronization shown in the bottom left corner.

Screenshots

grafik.png

zambrovski commented 9 months ago

Maybe we should simply clear Elasticsearch...

dominikhorn93 commented 8 months ago

I agree with Simon. We changed a lot in the databases of the test and demo systems during all the migrations... The simplest solution would be to clean Elasticsearch. I believe the Optimize pod even contains a script for this, which you can run after opening a terminal in the pod.

dominikhorn93 commented 8 months ago

08:36:36.643 [main] ERROR o.c.o.r.p.ReimportPreparation - Failed preparing Optimize for reimport.
org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=index_not_found_exception, reason=no such index [optimize-import-index_v3]]
    at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:178)
    at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:2484)
    at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:2461)
    at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:2184)
    at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:2137)
    at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:2105)
    at org.elasticsearch.client.IndicesClient.delete(IndicesClient.java:109)
    at org.camunda.optimize.service.es.OptimizeElasticsearchClient.lambda$deleteIndexByRawIndexNames$1(OptimizeElasticsearchClient.java:357)
    at net.jodah.failsafe.Functions.lambda$get$0(Functions.java:48)
    at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
    at net.jodah.failsafe.Execution.executeSync(Execution.java:128)
    at net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:379)
    at net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:68)
    at org.camunda.optimize.service.es.OptimizeElasticsearchClient.deleteIndexByRawIndexNames(OptimizeElasticsearchClient.java:357)
    at org.camunda.optimize.service.es.OptimizeElasticsearchClient.deleteIndex(OptimizeElasticsearchClient.java:277)
    at java.base/java.lang.Iterable.forEach(Iterable.java:75)
    at org.camunda.optimize.reimport.preparation.ReimportPreparation.deleteImportAndEngineDataIndices(ReimportPreparation.java:115)
    at org.camunda.optimize.reimport.preparation.ReimportPreparation.performReimport(ReimportPreparation.java:102)
    at org.camunda.optimize.reimport.preparation.ReimportPreparation.main(ReimportPreparation.java:84)
    Suppressed: org.elasticsearch.client.ResponseException: method [DELETE], host [https://xxx], URI [/optimize-import-index_v3?master_timeout=30s&timeout=30s], status line [HTTP/1.1 404 Not Found]
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [optimize-import-index_v3]","resource.type":"index_or_alias","resource.id":"optimize-import-index_v3","index_uuid":"na","index":"optimize-import-index_v3"}],"type":"index_not_found_exception","reason":"no such index [optimize-import-index_v3]","resource.type":"index_or_alias","resource.id":"optimize-import-index_v3","index_uuid":"na","index":"optimize-import-index_v3"},"status":404}
        at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:346)
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:312)
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:287)
        at org.elasticsearch.client.RestHighLevelClient.performClientRequest(RestHighLevelClient.java:2699)
        at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:2171)
        ... 15 common frames omitted

dominikhorn93 commented 8 months ago

curl -u username:password -kv -X DELETE 'https://digitalwfesk001.srv.muenchen.de:9200/_all'

markostreich commented 8 months ago

Optimize now logs: org.elasticsearch.client.ResponseException: method [GET], host [https://[...]:[port]], URI [/_cluster/health?master_timeout=30s&level=cluster&timeout=30s], status line [HTTP/1.1 401 Unauthorized]

markostreich commented 8 months ago

Where can I find the required credentials, and where do they need to be configured?

markostreich commented 8 months ago

The passwords are set again. However, errors still occur after deleting the Elasticsearch data:

{"timestamp":"2024-03-18T08:45:30.394","appName":"digitalwf-restapi","X-B3-TraceId":"c262b84a3a6191ca","X-B3-SpanId":"c262b84a3a6191ca","X-Span-Export":"true","thread":"http-nio-8080-exec-10","level":"ERROR","logger":"org.camunda.bpm.engine.context","location":{"fileName":"BaseLogger.java","line":"215"},"message":"ENGINE-16004 Exception while closing command context: no deployed process definition found with id 'FeatureMailTemplate:11:f72d8805-5233-11ee-bbf0-0a580a8a570e': processDefinition is null"}

markostreich commented 8 months ago

Possible solution:

https://gist.github.com/howkymike/e6678117083e6021a112ba4b9897da73

markostreich commented 8 months ago

Also tried: started a reimport.

https://docs.camunda.io/optimize/self-managed/optimize-deployment/reimport/

markostreich commented 8 months ago

This resulted in the following error:

17:58:28.039 [EngineImportScheduler-1] ERROR o.c.o.s.i.e.m.CompletedActivityInstanceEngineImportMediator - Was not able to import next page, retrying after sleeping for 5063ms. org.camunda.optimize.service.exceptions.OptimizeProcessDefinitionNotFoundException: Wasn't able to retrieve process definition with id [FeatureMailTemplate:11:f72d8805-5233-11ee-bbf0-0a580a8a570e] from the engine. It's likely that the definition has been deleted but the historic data for it is still available. Please make sure that there are no remnants of historic process instances for that definition left! Response from the engine: {"type":"InvalidRequestException","message":"No matching definition with id FeatureMailTemplate:11:f72d8805-5233-11ee-bbf0-0a580a8a570e","code":0}
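Optimize's message names the cause: historic process instances that still reference a deleted process definition. Such instances can be found with an anti-join (LEFT JOIN the deployed definitions, keep the rows without a match). A minimal Python/SQLite sketch on a toy schema that keeps only the two columns involved; the real Camunda tables ACT_HI_PROCINST and ACT_RE_PROCDEF have many more columns, SQLite only stands in for the actual database, and the function name is hypothetical:

```python
import sqlite3


def orphaned_instances(conn):
    """Ids of historic process instances whose process definition no longer exists.

    Anti-join: LEFT JOIN the deployed definitions and keep only the rows
    for which no definition matched (arpd.ID_ IS NULL).
    """
    rows = conn.execute(
        """
        SELECT ahpi.PROC_INST_ID_
        FROM ACT_HI_PROCINST ahpi
        LEFT JOIN ACT_RE_PROCDEF arpd ON arpd.ID_ = ahpi.PROC_DEF_ID_
        WHERE arpd.ID_ IS NULL
        ORDER BY ahpi.PROC_INST_ID_
        """
    ).fetchall()
    return [row[0] for row in rows]


# Toy schema with only the columns the query touches.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ACT_RE_PROCDEF (ID_ TEXT PRIMARY KEY);
    CREATE TABLE ACT_HI_PROCINST (PROC_INST_ID_ TEXT PRIMARY KEY, PROC_DEF_ID_ TEXT);
    INSERT INTO ACT_RE_PROCDEF VALUES ('StillDeployed:1:aaaa');
    INSERT INTO ACT_HI_PROCINST VALUES
        ('pi-1', 'StillDeployed:1:aaaa'),
        ('pi-2', 'FeatureMailTemplate:11:f72d8805-5233-11ee-bbf0-0a580a8a570e');
    """
)
print(orphaned_instances(conn))  # ['pi-2']
```

The IS NULL filter is what turns the outer join into "instances without a definition"; an inner join would silently drop exactly the rows of interest.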

markostreich commented 8 months ago

Clean-up script that removes historic process instances whose process definitions have already been deleted.

-- get historic process instances which have no deployed process definition
select * from ACT_HI_PROCINST ahpi
left join act_re_procdef arpd on arpd.id_ = ahpi.proc_def_id_ 
where arpd.id_ is null;

-- get entries to clean in camunda tables
select * from ACT_HI_ACTINST where PROC_INST_ID_ in (select PROC_INST_ID_ from ACT_HI_PROCINST ahpi left join act_re_procdef arpd on arpd.id_ = ahpi.proc_def_id_  where arpd.id_ is null);
select * from ACT_HI_ATTACHMENT where PROC_INST_ID_ in (select PROC_INST_ID_ from ACT_HI_PROCINST ahpi left join act_re_procdef arpd on arpd.id_ = ahpi.proc_def_id_  where arpd.id_ is null);
select * from ACT_HI_COMMENT where PROC_INST_ID_ in (select PROC_INST_ID_ from ACT_HI_PROCINST ahpi left join act_re_procdef arpd on arpd.id_ = ahpi.proc_def_id_  where arpd.id_ is null);
select * from ACT_HI_DETAIL where PROC_INST_ID_ in (select PROC_INST_ID_ from ACT_HI_PROCINST ahpi left join act_re_procdef arpd on arpd.id_ = ahpi.proc_def_id_  where arpd.id_ is null);
select * from ACT_HI_IDENTITYLINK where ROOT_PROC_INST_ID_ in (select PROC_INST_ID_ from ACT_HI_PROCINST ahpi left join act_re_procdef arpd on arpd.id_ = ahpi.proc_def_id_  where arpd.id_ is null);
select * from ACT_HI_VARINST where PROC_INST_ID_ in (select PROC_INST_ID_ from ACT_HI_PROCINST ahpi left join act_re_procdef arpd on arpd.id_ = ahpi.proc_def_id_  where arpd.id_ is null);
select * from ACT_HI_TASKINST where PROC_INST_ID_ in (select PROC_INST_ID_ from ACT_HI_PROCINST ahpi left join act_re_procdef arpd on arpd.id_ = ahpi.proc_def_id_  where arpd.id_ is null);
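select * from ACT_HI_OP_LOG where PROC_INST_ID_ in (select PROC_INST_ID_ from ACT_HI_PROCINST ahpi left join act_re_procdef arpd on arpd.id_ = ahpi.proc_def_id_  where arpd.id_ is null);
select * from ACT_HI_PROCINST where PROC_INST_ID_ in (select PROC_INST_ID_ from ACT_HI_PROCINST ahpi left join act_re_procdef arpd on arpd.id_ = ahpi.proc_def_id_  where arpd.id_ is null);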
-- get entries to clean in digiwf polyflow tables
select * from dwf_task_info where instanceid_ in (select PROC_INST_ID_ from ACT_HI_PROCINST ahpi left join act_re_procdef arpd on arpd.id_ = ahpi.proc_def_id_  where arpd.id_ is null);
select * from dwf_process_instance_auth where processinstanceid_ in (select PROC_INST_ID_ from ACT_HI_PROCINST ahpi left join act_re_procdef arpd on arpd.id_ = ahpi.proc_def_id_  where arpd.id_ is null);
select * from dwf_process_instance_info where processinstanceid_ in (select PROC_INST_ID_ from ACT_HI_PROCINST ahpi left join act_re_procdef arpd on arpd.id_ = ahpi.proc_def_id_  where arpd.id_ is null);

/*
-- clean camunda tables
-- every statement finds the orphaned instances through ACT_HI_PROCINST, so ACT_HI_PROCINST is deleted last
DELETE from ACT_HI_ACTINST where PROC_INST_ID_ in (select PROC_INST_ID_ from ACT_HI_PROCINST ahpi left join act_re_procdef arpd on arpd.id_ = ahpi.proc_def_id_ where arpd.id_ is null);
DELETE from ACT_HI_ATTACHMENT where PROC_INST_ID_ in (select PROC_INST_ID_ from ACT_HI_PROCINST ahpi left join act_re_procdef arpd on arpd.id_ = ahpi.proc_def_id_ where arpd.id_ is null);
DELETE from ACT_HI_COMMENT where PROC_INST_ID_ in (select PROC_INST_ID_ from ACT_HI_PROCINST ahpi left join act_re_procdef arpd on arpd.id_ = ahpi.proc_def_id_ where arpd.id_ is null);
DELETE from ACT_HI_DETAIL where PROC_INST_ID_ in (select PROC_INST_ID_ from ACT_HI_PROCINST ahpi left join act_re_procdef arpd on arpd.id_ = ahpi.proc_def_id_ where arpd.id_ is null);
DELETE from ACT_HI_IDENTITYLINK where ROOT_PROC_INST_ID_ in (select PROC_INST_ID_ from ACT_HI_PROCINST ahpi left join act_re_procdef arpd on arpd.id_ = ahpi.proc_def_id_ where arpd.id_ is null);
DELETE from ACT_HI_VARINST where PROC_INST_ID_ in (select PROC_INST_ID_ from ACT_HI_PROCINST ahpi left join act_re_procdef arpd on arpd.id_ = ahpi.proc_def_id_ where arpd.id_ is null);
DELETE from ACT_HI_TASKINST where PROC_INST_ID_ in (select PROC_INST_ID_ from ACT_HI_PROCINST ahpi left join act_re_procdef arpd on arpd.id_ = ahpi.proc_def_id_ where arpd.id_ is null);
DELETE from ACT_HI_OP_LOG where PROC_INST_ID_ in (select PROC_INST_ID_ from ACT_HI_PROCINST ahpi left join act_re_procdef arpd on arpd.id_ = ahpi.proc_def_id_ where arpd.id_ is null);
-- clean digiwf polyflow tables
delete from dwf_task_info where instanceid_ in (select PROC_INST_ID_ from ACT_HI_PROCINST ahpi left join act_re_procdef arpd on arpd.id_ = ahpi.proc_def_id_ where arpd.id_ is null);
delete from dwf_process_instance_auth where processinstanceid_ in (select PROC_INST_ID_ from ACT_HI_PROCINST ahpi left join act_re_procdef arpd on arpd.id_ = ahpi.proc_def_id_ where arpd.id_ is null);
delete from dwf_process_instance_info where processinstanceid_ in (select PROC_INST_ID_ from ACT_HI_PROCINST ahpi left join act_re_procdef arpd on arpd.id_ = ahpi.proc_def_id_ where arpd.id_ is null);
-- finally remove the orphaned historic process instances themselves
DELETE from ACT_HI_PROCINST where PROC_INST_ID_ in (select PROC_INST_ID_ from ACT_HI_PROCINST ahpi left join act_re_procdef arpd on arpd.id_ = ahpi.proc_def_id_ where arpd.id_ is null);
commit;
*/

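Every DELETE in this script locates its rows through the orphan subquery on ACT_HI_PROCINST, so the ACT_HI_PROCINST rows have to be removed last: once they are gone, the subquery returns nothing and any later statement deletes nothing. A minimal Python/SQLite sketch of that effect, assuming a toy schema with only three of the history tables and only the columns the statements use; SQLite stands in for the actual database and the helper names are hypothetical:

```python
import sqlite3

# Same orphan subquery as in the script: historic instances without a deployed definition.
ORPHANS = (
    "select PROC_INST_ID_ from ACT_HI_PROCINST ahpi "
    "left join ACT_RE_PROCDEF arpd on arpd.ID_ = ahpi.PROC_DEF_ID_ "
    "where arpd.ID_ is null"
)

# Dependent history tables first, ACT_HI_PROCINST last.
SAFE_ORDER = [
    ("ACT_HI_VARINST", "PROC_INST_ID_"),
    ("ACT_HI_OP_LOG", "PROC_INST_ID_"),
    ("ACT_HI_PROCINST", "PROC_INST_ID_"),
]


def clean_orphans(conn, order):
    """Run the deletes in the given order inside one transaction."""
    with conn:  # commits if every statement succeeds, rolls back otherwise
        for table, column in order:
            conn.execute(f"delete from {table} where {column} in ({ORPHANS})")


def make_db():
    """Toy database: pi-1 has a deployed definition, pi-2 does not."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        create table ACT_RE_PROCDEF (ID_ text primary key);
        create table ACT_HI_PROCINST (PROC_INST_ID_ text, PROC_DEF_ID_ text);
        create table ACT_HI_VARINST (PROC_INST_ID_ text);
        create table ACT_HI_OP_LOG (PROC_INST_ID_ text);
        insert into ACT_RE_PROCDEF values ('kept:1:a');
        insert into ACT_HI_PROCINST values ('pi-1', 'kept:1:a'), ('pi-2', 'deleted:1:b');
        insert into ACT_HI_VARINST values ('pi-1'), ('pi-2');
        insert into ACT_HI_OP_LOG values ('pi-1'), ('pi-2');
        """
    )
    return conn


def remaining(conn, table):
    """Process instance ids still present in the given table."""
    return [row[0] for row in conn.execute(f"select PROC_INST_ID_ from {table} order by 1")]


good = make_db()
clean_orphans(good, SAFE_ORDER)
print(remaining(good, "ACT_HI_OP_LOG"))  # ['pi-1']

bad = make_db()
# ACT_HI_PROCINST first: the subquery is empty afterwards, so the op-log row of pi-2 survives.
clean_orphans(bad, [SAFE_ORDER[2], SAFE_ORDER[1]])
print(remaining(bad, "ACT_HI_OP_LOG"))  # ['pi-1', 'pi-2']
```

Running all statements in one transaction also means that a failing statement leaves the history untouched rather than half-cleaned.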
markostreich commented 8 months ago

@simonhir Could you review the script above and check whether it can be run like this?

markostreich commented 8 months ago

The script has made the data consistent again. Closing the ticket.