pucardotorg / dristi

MIT License

[BUG]: Data in ES indexes are faulty and not consistent with RDBMS #1717

Open subhashini-egov opened 1 month ago

subhashini-egov commented 1 month ago

Describe the bug

ElasticSearch indices go out of sync with the RDBMS because multiple Kafka topics are in use and the Indexer does not consume all of them.

  1. Reindexing behavior has to be implemented properly. Reindexing has to be time bound and implemented with a base ES backup in place. Design document is here.
  2. There are three pipelines that write data to ES indices:
    • Service->Indexer->ES
    • Service->Transformer->Indexer->ES
    • Service->Analytics->Indexer->ES

All three are used in some combination, and they all overwrite the same index. This is problematic because changes to the DPG indices might break the solutions layer. Indices used for dashboarding need to be separated from the raw indices emitted by the services.

  3. A listing of all topics, publishers, consumers and indices is available here. Some topics are ignored and are not used to update the indices. This can lead to inconsistent data and, therefore, unreliable reports and dashboards.
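A minimal sketch of how such a listing could be audited for ignored topics, assuming it can be exported as rows of topic, publishers, consumers and indices. The `TopicEntry` type, the `find_ignored_topics` helper, and every topic, service and index name below are hypothetical and are not taken from the actual listing.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TopicEntry:
    """One row of a hypothetical topic listing."""
    topic: str
    publishers: list[str]
    consumers: list[str] = field(default_factory=list)
    indices: list[str] = field(default_factory=list)


def find_ignored_topics(entries: list[TopicEntry]) -> list[str]:
    """Return topics that are published but never end up in an ES index."""
    return [
        e.topic
        for e in entries
        if e.publishers and (not e.consumers or not e.indices)
    ]


# Hypothetical listing; all names are made up for illustration.
listing = [
    TopicEntry("save-case", ["case-service"], ["indexer"], ["case-index"]),
    TopicEntry("update-case", ["case-service"], ["indexer"], ["case-index"]),
    TopicEntry("update-case-status", ["case-service"]),  # no consumer, no index
    TopicEntry("save-hearing", ["hearing-service"], ["transformer"], []),  # consumed, never indexed
]

ignored = find_ignored_topics(listing)
print(ignored)
```

Running a check like this against the real listing would give a concrete list of topics to wire into the Indexer.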

Expected behavior

Whenever RDBMS data changes, the change should be emitted to a relevant topic. That topic should then be used to update all the relevant ES indices, wherever the service data is being used.
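The expected behavior can be sketched as a fan-out from a topic to every index registered for it. This is an in-memory illustration only, not the Indexer's actual configuration or API: the `TOPIC_TO_INDICES` table, the `on_rdbms_change` function, the topic and index names, and the dictionary standing in for ES are all assumptions.

```python
from collections import defaultdict

# Hypothetical routing table: topic -> every ES index that uses that service's data.
# Raw and dashboard indices are kept separate so one pipeline cannot overwrite the other.
TOPIC_TO_INDICES = {
    "update-case": ["case-raw", "case-dashboard"],
    "update-hearing": ["hearing-raw", "hearing-dashboard"],
}

# In-memory stand-in for ES: index name -> {document id -> document}.
es = defaultdict(dict)


def on_rdbms_change(topic, record):
    """Apply one change event to every index registered for its topic."""
    targets = TOPIC_TO_INDICES.get(topic)
    if not targets:
        # Fail loudly so an unregistered (ignored) topic is noticed instead of silently dropped.
        raise KeyError(f"no index registered for topic {topic!r}")
    for index in targets:
        es[index][record["id"]] = dict(record)  # upsert by document id
    return targets


# Two successive changes to the same (made-up) case record.
on_rdbms_change("update-case", {"id": "CASE-1", "status": "FILED"})
on_rdbms_change("update-case", {"id": "CASE-1", "status": "ADMITTED"})
```

After both events, every index registered for `update-case` holds the latest state of `CASE-1`, while a topic missing from the table raises an error rather than leaving an index stale.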

CC: @atulgupta2024 @devarajd94 @suresh12

shashikesh12 commented 3 days ago

@subhashini-egov @manimaarans as part of this fix,

Beehyv-Vinod commented 10 hours ago

Hi @Ramu-kandimalla @rajeshcherukumalli, verified the case transactions; all are working fine in the dpg-dev environment. The transaction logs were verified using the command below:

    kubectl logs svc/transformer -n egov | grep KL-000294-2024

The verified transactions are:

  1. E-Filing
  2. Join a case
  3. Hearings
  4. Orders
  5. Submissions
  6. Tasks