Problem:
When a sync is triggered, data is fetched from the start for each provider, which is time-consuming and re-fetches data that we already have.
In addition, several providers require pagination to pull all the data from the source into the Panora db.
Possible Solution:
Currently, we have a connection associated with every provider. So, we could preserve the sync state of every object for each provider connection by storing it in a separate table.
Each provider uses a different pagination method. To handle this and track the state of the last sync, I’ve created the following table. It can be improved, but it conveys the general idea.
-- ************************************** vertical_objects_sync_track_data
CREATE TABLE vertical_objects_sync_track_data
(
id_vertical_objects_sync_track_data uuid NOT NULL,
vertical text NOT NULL,
provider_slug text NOT NULL,
object text NOT NULL,
pagination_type text NOT NULL,
id_connection uuid NOT NULL,
data json,
CONSTRAINT PK_vertical_objects_sync_track_data PRIMARY KEY ( id_vertical_objects_sync_track_data ),
CONSTRAINT FK_connection FOREIGN KEY ( id_connection ) REFERENCES connections ( id_connection )
);
With this logic, we could preserve the sync state for each object. Storing the pagination data as generic JSON in the data column, alongside its pagination_type, could help us handle any provider's pagination method.
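To make the idea concrete, here is a minimal sketch of how a sync job might resume from a stored row. It is only an illustration, not the actual implementation: the pagination_type values ("offset", "cursor"), the keys inside data, the query parameter names, and the helper names next_request_params and record_page are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class SyncTrack:
    """In-memory view of one vertical_objects_sync_track_data row (subset of columns)."""
    provider_slug: str
    object_name: str          # maps to the "object" column
    pagination_type: str      # assumed values: "offset" or "cursor"
    data: Dict[str, Any] = field(default_factory=dict)  # maps to the json "data" column


def next_request_params(track: SyncTrack, page_size: int = 100) -> Dict[str, Any]:
    """Build query params that resume from the saved state instead of from the start."""
    if track.pagination_type == "offset":
        return {"limit": page_size, "offset": track.data.get("offset", 0)}
    if track.pagination_type == "cursor":
        params: Dict[str, Any] = {"limit": page_size}
        if track.data.get("cursor"):
            params["cursor"] = track.data["cursor"]
        return params
    raise ValueError(f"unsupported pagination_type: {track.pagination_type}")


def record_page(track: SyncTrack, fetched_count: int, next_cursor: Optional[str] = None) -> None:
    """Update the saved state after one page has been fetched and stored."""
    if track.pagination_type == "offset":
        track.data["offset"] = track.data.get("offset", 0) + fetched_count
    elif track.pagination_type == "cursor":
        track.data["cursor"] = next_cursor
    else:
        raise ValueError(f"unsupported pagination_type: {track.pagination_type}")


# Hypothetical provider and object names, for illustration only.
track = SyncTrack("some_provider", "company", "offset")
first = next_request_params(track)    # starts at offset 0
record_page(track, fetched_count=100)
second = next_request_params(track)   # resumes at offset 100
```

After each page, the updated data value would be written back to the row for that connection and object, so the next sync picks up where the last one stopped.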
Note:
I have implemented this logic for Attio as an example to illustrate the idea.