nickchomey / conduit-connector-surrealdb

2 stars 2 forks source link

Feature: Add support for relationship schema to be passed in at runtime #2

Open nickchomey opened 2 months ago

nickchomey commented 2 months ago

Feature description

Right now this connector has no knowledge of relationships - it simply receives openCDC records and does CRUD on SurrealDB.

It would be good to be able to automatically

  1. Embed records from one table in another
    • e.g. if a record comes in from the post table and has a field post.user_id = 123, it should automatically change 123 to user:123 so that the user record gets embedded in the post record.
    • Likewise when the table and fields are created, if a relationship schema is available then it should define the TYPE for the field DEFINE FIELD user ON TABLE post TYPE record<user>;
  2. Create bidirectional relationships between records
    • Eg run RELATE user:123 -> wrote -> post:5678

Update from later

Should almost never use embedded records

  1. not bidirectional
  2. dont clean up after themselves - if you embed a record then delete it, the record you embedded it into still thinks its there. Relate table will remove the relation when one side is deleted.

I wrote more here: https://github.com/surrealdb/docs.surrealdb.com/issues/920


To do this, we will need to be able to define and import relationship schemas at startup, and ideally be able to update them while the server is running.

It would surely be best to do this outside of the Conduit configuration files and/or HTTP API.

Ideas:

Schema Settings:

nickchomey commented 1 month ago

There's a variety of ways to create the relations upon snapshot

RELATE $person->order->$product TIMEOUT 3s;



~~Seems like the 2nd bulk insert option might work for the snapshot- especially since there's no RPC method for bulk insert_relation. Defining events before would surely cause a huge drag by creating relations one-by-one while bulk inserting. Doing the select then relate would only really work if was done at the very end of the snapshot, but then has a huge amount of rows to deal with.~~

But can then define the events after the initial bulk snapshot, to serve essentially as automatic Foreign Keys in an sql table during CDC. When doing realtime sync, relationships will get created automatically - its surely the only reasonable way to do it. 

Yet, the question remains: how do you specify those relations? 

The conduit folks suggested that the relationship schema be defined/embedded in each opencdc record by a pipeline processor, and this connector simply have a handler to insert relations if the record has particular relationship fields. (The Neo4j connector simply requires records to have some [particular fields](https://github.com/conduitio-labs/conduit-connector-neo4j?tab=readme-ov-file#relationship-creation-handling) so that they can create relations)

I suppose it could work for the post-snapshot batch relation thing... 

Or we could have a JSON/YAML file read at runtime that defines the relations. And would avoid the need to do anything with modifying records with a pipeline processor. But it wouldnt be useful if there's multiple destination connectors in the pipeline - better to have a single processor do it all?

Could define it all via an embedded NATS client, allowing updates etc. But that's not something that other users could take advantage of.
nickchomey commented 1 month ago

In lieu of a batch mechanism in the RPC insert_relation command command, surely I can just unmarshal the batch relations struct/map into a string and submit it with the RPC query command and pass a query, vars pair that contains the actual SurrealQL INSERT RELATION INTO batch relation query.

Though, will need to be careful with Sql injection. The vars parameter is apparently sanitized automatically. But not sure how easy/possible it would be to use this with a large batch query.

https://surrealdb.com/docs/surrealdb/reference-guide/security-best-practices#query-safety