keeleinstituut / tv-tolkevarav

Tõlkevärav (Translation Hub)
Caching data objects to increase data consistency between services #220

Open MariusJulius opened 1 year ago

MariusJulius commented 1 year ago

Let's name two services: the data source service, which owns a resource, and the consumer service, which reads it.

The idea is that the consumer service caches the resource from the data source service in a materialized view in the consumer service's database.

The materialized view for the cached resource can follow the same schema as in the data source service, but it doesn't have to. Cached objects are used as read-only in business logic, so the view can use a different schema if that makes the resource easier to work with on the consumer service side. In that sense there's no need for a strict schema for the cached resource: reading from the cache is equivalent to making HTTP requests from the consumer service to the data source service to retrieve the resource.
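To illustrate the schema point, here is a minimal sketch of mapping a data source payload into a simpler read-only shape on the consumer side. The field names (`first_name`, `last_name`, `internal_notes`, `display_name`) are made up for the example and are not taken from any real service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: the cached object is read-only in business logic
class CachedResource:
    id: int
    display_name: str


def to_cached(payload: dict) -> CachedResource:
    """Keep only what the consumer needs; the source schema is not mirrored 1:1."""
    return CachedResource(
        id=payload["id"],
        display_name=f'{payload["first_name"]} {payload["last_name"]}',
    )


# Hypothetical payload as the data source service might return it.
source_payload = {
    "id": 7,
    "first_name": "Mari",
    "last_name": "Tamm",
    "internal_notes": "not needed by the consumer",
}
cached = to_cached(source_payload)
```

The consumer only depends on `CachedResource`, so the data source can add fields without the cached shape changing.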

The consumer service holds the logic to synchronize the materialized view with the resource in the data source service. It consists of two things: querying the resource from the data source service and updating the materialized view with the new data. The querying logic should support full synchronization from the start, so that a complete resync with the data source service can be triggered on demand. This is useful when the materialized view contains malformed or out-of-date data and it's easier to sync it as a whole. Later we can optimize this to support gradual (incremental) synchronization.
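A rough sketch of what full synchronization could look like, using an in-memory SQLite table as a stand-in for the consumer service db. The table name `cached_resource`, its columns, and the `fetch_all` callable (which would wrap the request to the data source service) are assumptions for illustration only.

```python
import sqlite3
from typing import Callable, Iterable

Row = tuple[int, str]  # (id, name): hypothetical shape of the cached resource


def full_sync(db: sqlite3.Connection, fetch_all: Callable[[], Iterable[Row]]) -> int:
    """Replace the whole cached table with the current state of the data source."""
    # Query first, so a failed fetch leaves the existing cache untouched.
    rows = list(fetch_all())
    with db:  # one transaction: commits on success, rolls back on error
        db.execute("DELETE FROM cached_resource")
        db.executemany("INSERT INTO cached_resource (id, name) VALUES (?, ?)", rows)
    return len(rows)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cached_resource (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
db.execute("INSERT INTO cached_resource VALUES (99, 'stale row')")
db.commit()

# The lambda stands in for the call to the data source service.
synced = full_sync(db, lambda: [(1, "alpha"), (2, "beta")])
cached_rows = db.execute("SELECT id, name FROM cached_resource ORDER BY id").fetchall()
```

Doing the delete and the inserts in one transaction means readers never see a half-populated cache, and stale rows such as id 99 disappear without any per-row diffing.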

When the consumer service first starts, it needs to populate the materialized view, which is basically a full sync. So there needs to be a check at service startup that triggers the data synchronization.
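One possible shape for the startup check, again sketched with SQLite. It records a marker row after the first successful sync instead of testing whether the cache table is empty, since the resource could legitimately have zero rows. The `sync_state` table and the `ensure_populated` helper are hypothetical names, not something that exists in the codebase.

```python
import sqlite3
from typing import Callable


def ensure_populated(db: sqlite3.Connection, run_full_sync: Callable[[], None]) -> bool:
    """Run a full sync if none has completed yet. Returns True if one was triggered."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS sync_state ("
        "resource TEXT PRIMARY KEY, synced_at TEXT NOT NULL)"
    )
    already_synced = db.execute(
        "SELECT 1 FROM sync_state WHERE resource = ?", ("cached_resource",)
    ).fetchone()
    if already_synced is not None:
        return False
    run_full_sync()
    with db:  # record the marker only after the sync succeeded
        db.execute(
            "INSERT INTO sync_state VALUES (?, datetime('now'))", ("cached_resource",)
        )
    return True


db = sqlite3.connect(":memory:")
sync_calls = []
first_start = ensure_populated(db, lambda: sync_calls.append("full sync"))
second_start = ensure_populated(db, lambda: sync_calls.append("full sync"))
```

If the sync raises, no marker is written, so the next startup tries again.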

When the resource is modified in the data source service (rows added, deleted or updated), the data source service publishes a message to an AMQP exchange of type fanout. A fanout exchange delivers a copy of the message to every queue bound to it and ignores the routing key (RabbitMQ's default and direct exchanges deliver only to queues whose binding matches the routing key). This ensures that every service relying on the resource in the data source service is aware of the data changes. The message is consumed on the consumer service side, which triggers the data synchronization logic inside the consumer service.
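The fan-out behaviour can be shown with a toy in-memory model. This is not RabbitMQ client code; the `FanoutExchange` class, the queue names and the message fields are invented for the example. With a real broker the equivalent would be declaring an exchange of type fanout and binding one queue per consumer service.

```python
from collections import deque


class FanoutExchange:
    """Toy stand-in for a fanout exchange: every bound queue gets every message."""

    def __init__(self) -> None:
        self.queues: dict[str, deque] = {}

    def bind(self, queue_name: str) -> deque:
        return self.queues.setdefault(queue_name, deque())

    def publish(self, message: dict) -> None:
        for queue in self.queues.values():
            queue.append(dict(message))  # each queue receives its own copy


exchange = FanoutExchange()
service_a_queue = exchange.bind("service-a.resource-changed")
service_b_queue = exchange.bind("service-b.resource-changed")

# Data source service side: announce that the resource changed.
exchange.publish({"resource": "cached_resource", "event": "updated"})

triggered_syncs = []


def consume(queue: deque, service: str) -> None:
    """Consumer side: every message triggers that service's sync logic."""
    while queue:
        message = queue.popleft()
        triggered_syncs.append((service, message["resource"]))


consume(service_a_queue, "service-a")
consume(service_b_queue, "service-b")
```

Both services end up triggering their own sync from a single published message, which is the property the fanout type is chosen for.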

thenouan commented 1 year ago

@kadmit I moved this into in progress since I understood you started working on it