opendatadiscovery / odd-platform

First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
https://opendatadiscovery.org
Apache License 2.0
1.16k stars 96 forks

Sync in the Data Entities is not happening after the cross-namespace recovery #1682

Open mavenzer opened 1 month ago

mavenzer commented 1 month ago

To test the resilience of our deployment patterns, we deployed ODD in several namespaces of a Kubernetes cluster. To validate recovery from one namespace to another (say, from Dev to Production), we use a CronJob as the backup mechanism. Since we use Bitnami Postgres as the database, it is easy to write a CronJob that dumps the database and to test it:

PGPASSWORD=$POSTGRES_PASSWORD pg_dumpall -U $POSTGRES_USER -h $POSTGRES_HOST -p $POSTGRES_PORT > /backups/all-backups-$TIMESTAMP.sql
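The command above depends on `$TIMESTAMP` and the connection variables being set by the CronJob. A minimal sketch of a wrapper script is shown here, assuming the `POSTGRES_*` variables are injected into the container and `/backups` is a mounted volume. The `BACKUP_DIR` variable, the `backup_file_name` and `run_backup` helpers, and the timestamp format are hypothetical and not taken from our actual job.

```shell
#!/bin/sh
# Hypothetical wrapper around the pg_dumpall command shown above.
# Everything except the POSTGRES_* variables is an assumption.
set -eu

# Directory the dumps are written to (assumed to be a mounted volume).
BACKUP_DIR="${BACKUP_DIR:-/backups}"

# Build the dump path for a given timestamp, e.g. 20240101-030000.
backup_file_name() {
    printf '%s/all-backups-%s.sql\n' "$BACKUP_DIR" "$1"
}

# Dump every database of the instance into a timestamped file.
run_backup() {
    TIMESTAMP="$(date +%Y%m%d-%H%M%S)"
    PGPASSWORD="$POSTGRES_PASSWORD" pg_dumpall \
        -U "$POSTGRES_USER" -h "$POSTGRES_HOST" -p "$POSTGRES_PORT" \
        > "$(backup_file_name "$TIMESTAMP")"
}

# Run the dump only when asked, so the helpers can be sourced and inspected.
if [ "${1:-}" = "run" ]; then
    run_backup
fi
```

Invoking the script with the single argument `run` performs the dump; without it, only the helpers are defined.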

We can easily do the recovery within the same namespace, but the problem starts with a different namespace (from DEV to PROD), i.e. when we recover the data into the prod namespace using the .sql file from the dev namespace.

Steps we followed for the recovery:

What we found is exactly 2x the number of data entities in the ODD platform, because there are two copies of each data entity: one for the DEV namespace and one for the PROD namespace. The data is there for the dev namespace, not for the prod namespace.

[Screenshots: data-prod, data-dev]

Vladysl commented 1 month ago

Hi, ODDRN is a unique identifier for data_entity, data_source, and dataset_field. So, in that case, updating the data_source table won't be enough; you also need to update these tables:

During the next collector ingest, the vectors for these data entities will be updated in public.search_entrypoint. (P.S. If some entities are not part of the ingest, facet search may not work for them.)

NOTE: You need to perform all these changes before the Collector launches.
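One possible way to apply this kind of update is to rewrite the namespace-specific ODDRN prefix in each affected table before starting the Collector. The sketch below only prints SQL so it can be reviewed first. The `oddrn` column name, the example prefixes, and the `oddrn_rewrite_sql` helper are assumptions for illustration, not taken from the ODD schema, so check them against your database before running anything.

```shell
#!/bin/sh
# Print an UPDATE statement that swaps one ODDRN prefix for another.
# Arguments: table, column, old prefix, new prefix.
# Values are interpolated as-is: pass only trusted input without single quotes.
oddrn_rewrite_sql() {
    table="$1"
    column="$2"
    old="$3"
    new="$4"
    # left()/substr() are used instead of LIKE/replace() so that only a
    # leading match is rewritten and '%' or '_' in the prefix stay literal.
    printf "UPDATE %s SET %s = '%s' || substr(%s, length('%s') + 1) WHERE left(%s, length('%s')) = '%s';\n" \
        "$table" "$column" "$new" "$column" "$old" "$column" "$old" "$old"
}

# Example with hypothetical prefixes: wrap the statement in a transaction.
# Repeat the call for every table that stores ODDRNs in your schema.
echo "BEGIN;"
oddrn_rewrite_sql data_source oddrn "//postgresql/host/dev-db" "//postgresql/host/prod-db"
echo "COMMIT;"
```

The printed statements can be saved to a file and run with `psql -f` against the restored database once they have been checked.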