netspective-labs / sql-aide

SQL Aide (SQLa) Typescript template literal text supplier optimized for emitting SQL
https://www.sql-aide.com/
MIT License
17 stars 17 forks source link

EmailEngine with Data Vault for SQL access #153

Open arunqa opened 1 year ago

arunqa commented 1 year ago

Now, we need to do the following:

  1. Use SQLa to create the Data Vault infrastructure as I’ve shown below – we will build views in SQL to get access to the email JSON data
  2. Create nodeJS or other EE SDK-based app which reads all emails via IMAP and stores the content in the DataVault
  3. Setup webhooks in EE so that incoming emails push to the Data Vault.

Setting up the Data Vault tables:

-- Index for the hub

CREATE INDEX idx_hub_email_hash ON hub_email(email_hash);

-- Indexes for the satellite

CREATE INDEX idx_sat_email_detail_hash ON sat_email_detail(email_hash);
CREATE INDEX idx_sat_email_detail_load_datetime ON sat_email_detail(load_datetime);
CREATE INDEX idx_sat_email_detail_data_gin ON sat_email_detail USING GIN(email_data);

Create as many views as we need for accessing the emails in properly structured format. Talk with Raphael and team for how they do FHIR mapping for complex JSON from unstructured text.

Example for how to insert data – this should be a stored procedure

Assuming we have variables:

-- First, compute the email_hash

DO $$DECLARE email_hash_val BYTEA;
BEGIN
    email_hash_val := digest($1::TEXT, 'sha256');

    -- Try to insert into hub
    INSERT INTO hub_email(email_hash, load_datetime, record_source)
    VALUES (email_hash_val, $3, $2)
    ON CONFLICT (email_hash) DO NOTHING; -- prevent duplicates

    -- Insert into satellite
    INSERT INTO sat_email_detail(email_hash, load_datetime, email_data, record_source)
    VALUES (email_hash_val, $3, $1, $2);
END$$;
alan-francis commented 1 year ago

@shah : In the current structure of SQLaide data vaults, we are following the below format:

const syntheticHub1 = dvg.hubTable("synthethic1", {
    hub_synthethic1_id: primaryKey(),
    h1_bkey_int: integer(),
    h1_bkey_text: text(),
    h1_bkey_date: date(),
    h1_bkey_int_nullable: integerNullable(),
    h1_bkey_text_nullable: textNullable(),
    ...dvg.housekeeping.columns,
  });

Here we are using the table_name_id as primary key and other required columns with the housekeeping as well. As per the reference given above, we need to create hub table with a different type of primary key with a different name (email_hash BYTEA PRIMARY KEY). Can you confirm?