Many threat intel sources are not static: existing entries are modified or updated over time. If we poll for data on a time basis, updated entries are ingested again and introduce duplicates.
Goal
Add the ability to deduplicate data ingested from enrichment sources.
Notes
Can be implemented with Iceberg MERGE INTO, which is available in Athena engine version 3.
For each enrichment table, keep a temp table named <table>_temp (this table needs to be created statically).
When new data is pulled, overwrite the temp table with it (the puller writes to the temp table).
Inside the metadata writer, execute an Athena query that merges the new data from the temp table into the main table. A query like:
MERGE INTO enrichment_table main USING enrichment_table_temp new
-- primary key
ON (main.event.id = new.event.id)
WHEN MATCHED
-- all top level cols
THEN UPDATE SET event = new.event, threat = new.threat
WHEN NOT MATCHED
-- all top level cols
THEN INSERT (event, threat) VALUES(new.event, new.threat)
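
The query above could be generated per enrichment table rather than written by hand. The following is a minimal sketch, assuming the metadata writer can run Python and knows each table's primary key and top-level columns. The function names build_merge_query and run_merge are hypothetical, as are the database and output_location parameters. Table and column names are interpolated directly, so they are assumed to come from trusted configuration, not user input.

```python
from typing import Sequence


def build_merge_query(table: str, key: str, columns: Sequence[str]) -> str:
    """Build a MERGE INTO statement that upserts <table>_temp into <table>.

    `key` is the primary key path (e.g. "event.id") and `columns` are all
    top-level columns of the table. Names are assumed to be trusted.
    """
    if not columns:
        raise ValueError("at least one top-level column is required")
    set_clause = ", ".join(f"{c} = new.{c}" for c in columns)
    insert_cols = ", ".join(columns)
    insert_vals = ", ".join(f"new.{c}" for c in columns)
    return (
        f"MERGE INTO {table} main USING {table}_temp new\n"
        f"ON (main.{key} = new.{key})\n"
        "WHEN MATCHED\n"
        f"THEN UPDATE SET {set_clause}\n"
        "WHEN NOT MATCHED\n"
        f"THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )


def run_merge(athena_client, query: str, database: str, output_location: str) -> str:
    """Submit the merge to Athena and return the query execution id.

    `athena_client` is expected to be a boto3 Athena client, e.g.
    boto3.client("athena"). The call only starts the query; it does not
    wait for it to finish.
    """
    response = athena_client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

For example, build_merge_query("enrichment_table", "event.id", ["event", "threat"]) produces a statement equivalent to the one above. The caller would still need to poll the returned execution id to confirm the merge succeeded before the temp table is overwritten again.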