Open nickchomey opened 2 months ago
There's a variety of ways to create the relations upon snapshot
INSERT RELATION
like we did for the initial bulk insert (doc here), and this doc)
LET $person = (SELECT VALUE id FROM person);
LET $product = (SELECT VALUE id FROM product);
RELATE $person->order->$product TIMEOUT 3s;
~~Seems like the 2nd bulk insert option might work for the snapshot- especially since there's no RPC method for bulk insert_relation. Defining events before would surely cause a huge drag by creating relations one-by-one while bulk inserting. Doing the select then relate would only really work if was done at the very end of the snapshot, but then has a huge amount of rows to deal with.~~
But can then define the events after the initial bulk snapshot, to serve essentially as automatic Foreign Keys in an sql table during CDC. When doing realtime sync, relationships will get created automatically - its surely the only reasonable way to do it.
Yet, the question remains: how do you specify those relations?
The conduit folks suggested that the relationship schema be defined/embedded in each opencdc record by a pipeline processor, and this connector simply have a handler to insert relations if the record has particular relationship fields. (The Neo4j connector simply requires records to have some [particular fields](https://github.com/conduitio-labs/conduit-connector-neo4j?tab=readme-ov-file#relationship-creation-handling) so that they can create relations)
I suppose it could work for the post-snapshot batch relation thing...
Or we could have a JSON/YAML file read at runtime that defines the relations. And would avoid the need to do anything with modifying records with a pipeline processor. But it wouldnt be useful if there's multiple destination connectors in the pipeline - better to have a single processor do it all?
Could define it all via an embedded NATS client, allowing updates etc. But that's not something that other users could take advantage of.
In lieu of a batch mechanism in the RPC insert_relation command command, surely I can just unmarshal the batch relations struct/map into a string and submit it with the RPC query command and pass a query, vars pair that contains the actual SurrealQL INSERT RELATION INTO
batch relation query.
Though, will need to be careful with Sql injection. The vars parameter is apparently sanitized automatically. But not sure how easy/possible it would be to use this with a large batch query.
https://surrealdb.com/docs/surrealdb/reference-guide/security-best-practices#query-safety
Feature description
Right now this connector has no knowledge of relationships - it simply receives openCDC records and does CRUD on SurrealDB.
It would be good to be able to automatically
post
table and has a fieldpost.user_id = 123
, it should automatically change123
touser:123
so that the user record gets embedded in the post record.DEFINE FIELD user ON TABLE post TYPE record<user>;
RELATE user:123 -> wrote -> post:5678
Update from later
Should almost never use embedded records
I wrote more here: https://github.com/surrealdb/docs.surrealdb.com/issues/920
To do this, we will need to be able to define and import relationship schemas at startup, and ideally be able to update them while the server is running.
It would surely be best to do this outside of the Conduit configuration files and/or HTTP API.
Ideas:
Schema Settings:
DEFINE EVENT
for all relationships, such that surreal automatically handles their creation each time a record comes in?DEFINE INDEX
- there's various types - unique, composite, fulltext, vector etc...