Add bitemporal data to each Guac Node

guacsec / guac

GUAC aggregates software security metadata into a high fidelity graph database.

https://guac.sh

Apache License 2.0

Add bitemporal data to each Guac Node #171

Open pxp928 opened 2 years ago

pxp928 commented 2 years ago

Add bi-temporal data to each node within the assembler. This should be done via the objectMetadata that is already part of the nodes.

Note: NewObjectMetadata needs to be refactored out of the parser, because unit testing code that generates timestamps inside the parser will be difficult.
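One way to make the timestamps testable is to pass the clock in, rather than reading the current time inside the constructor. The sketch below is illustrative only: the `ObjectMetadata` fields, `NewObjectMetadataAt`, `Clock`, and `fixedClock` are hypothetical names chosen for this example and are not taken from GUAC's code.

```go
package main

import (
	"fmt"
	"time"
)

// ObjectMetadata is a hypothetical stand-in for the metadata attached to each
// node. The field names are assumptions, not GUAC's actual definition.
type ObjectMetadata struct {
	Source      string
	CollectedAt time.Time
}

// Clock returns the current time. Production code would pass time.Now;
// tests pass a function that returns a fixed instant.
type Clock func() time.Time

// NewObjectMetadataAt (hypothetical) builds metadata using the supplied clock,
// so the timestamp is controlled by the caller instead of the parser.
func NewObjectMetadataAt(source string, now Clock) ObjectMetadata {
	return ObjectMetadata{Source: source, CollectedAt: now().UTC()}
}

// fixedClock returns a Clock that always reports the given Unix time.
func fixedClock(unix int64) Clock {
	t := time.Unix(unix, 0).UTC()
	return func() time.Time { return t }
}

func main() {
	// In production the caller would pass time.Now.
	live := NewObjectMetadataAt("example-collector", time.Now)
	fmt.Println("live timestamp set:", !live.CollectedAt.IsZero())

	// In a unit test the timestamp is deterministic.
	test := NewObjectMetadataAt("example-collector", fixedClock(1000))
	fmt.Println("test timestamp:", test.CollectedAt.Unix())
}
```

With the clock injected, parser tests can compare whole metadata values without masking or tolerating timestamp drift.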

jchestershopify commented 2 years ago

I don't think this captures enough temporal history, as any modification past the first means previous moments are lost (whether like tears in rain is TBD).

I talked about this in my UAG article.

[note for future readers: this comment was responding to the initial version of the issue, which envisaged using created-at and modified-at flags for temporality]

pxp928 commented 2 years ago

That's a great point @jchestershopify. We had a brief discussion on this a while back, but it definitely warrants further discussion, including how it should be implemented. I changed the title and issue to capture this.

mihaimaruseac commented 2 years ago

We could store multiple date fields in nodes, as well as multiple edges between the same pair of nodes, so that earlier data is never overwritten. This needs profiling to make sure we don't make things too slow, though.
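A minimal sketch of the "multiple edges between the same pair of nodes" idea, assuming an in-memory, append-only store. `Edge`, `Graph`, `Assert`, `History`, and `hour` are hypothetical names for illustration; they do not reflect GUAC's assembler or any graph database API.

```go
package main

import (
	"fmt"
	"time"
)

// Edge is a hypothetical relationship record. Several Edge values may share
// the same From and To; each captures what was recorded at a given time.
type Edge struct {
	From, To   string
	Label      string
	RecordedAt time.Time
	Superseded time.Time // zero while this edge is the current record
}

// Graph stores edges append-only: existing records are never removed.
type Graph struct {
	edges []Edge
}

// Assert records a new edge between from and to. Any current edge between the
// same pair is marked as superseded instead of being deleted or rewritten.
func (g *Graph) Assert(from, to, label string, at time.Time) {
	for i := range g.edges {
		e := &g.edges[i]
		if e.From == from && e.To == to && e.Superseded.IsZero() {
			e.Superseded = at
		}
	}
	g.edges = append(g.edges, Edge{From: from, To: to, Label: label, RecordedAt: at})
}

// History returns every edge recorded between from and to, in insertion order.
func (g *Graph) History(from, to string) []Edge {
	var out []Edge
	for _, e := range g.edges {
		if e.From == from && e.To == to {
			out = append(out, e)
		}
	}
	return out
}

// hour builds a timestamp on an arbitrary example day.
func hour(h int) time.Time {
	return time.Date(2022, time.January, 1, h, 0, 0, 0, time.UTC)
}

func main() {
	g := &Graph{}
	g.Assert("pkg:a", "pkg:b", "depends-on v1.0", hour(1))
	g.Assert("pkg:a", "pkg:b", "depends-on v1.1", hour(2))
	for _, e := range g.History("pkg:a", "pkg:b") {
		fmt.Println(e.Label, "current:", e.Superseded.IsZero())
	}
}
```

The linear scan in `Assert` is the kind of cost the profiling concern refers to; a real store would need an index on the node pair.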

lumjjb commented 2 years ago

Yeah, and I believe the data model should capture this as well. I think there will be a split between immutable nodes, which will have fixed temporal information, and mutable structures, which would encode more complex temporal information. Alternatively, the mutable structures could be normalized so that each node is one event, or we could have a separate way to manage that. I think it will depend heavily on the query and policy patterns.

jchestershopify commented 2 years ago

The tricky part is that you need a fast way to cut across the temporal axes, particularly transaction time ("What did the database think was true at point X?", where X is usually "now").

It might be worth consulting with Neo4j folks to see if this has come up before and whether they have best practices or guidance on how to model it.

In SQL you used to pay the cost at query time in the form of extra WHERE clauses, because you needed to specify each of the two axes. From SQL:2011 onwards there was built-in support that made it less painful. Without built-in support we might be looking at a pre-2011 scenario, where queries are made harder (assuming you don't build a proxy that rewrites for the most common case, which is "now").
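To make the two axes concrete, here is a minimal in-memory sketch of a bitemporal lookup, together with a convenience wrapper for the common "now" case. All names (`Version`, `AsOf`, `Current`, `day`) and the half-open interval convention are assumptions made for this example; they are not drawn from GUAC, Neo4j, or the SQL standard.

```go
package main

import (
	"fmt"
	"time"
)

// Version is one bitemporal record of a fact. Intervals are half-open,
// [From, To), and a zero To means the interval is still open.
type Version struct {
	Value     string
	ValidFrom time.Time // valid time: when the fact holds in the world
	ValidTo   time.Time
	TxFrom    time.Time // transaction time: when the store believed it
	TxTo      time.Time
}

// contains reports whether t falls inside [from, to), treating zero to as open.
func contains(from, to, t time.Time) bool {
	return !t.Before(from) && (to.IsZero() || t.Before(to))
}

// AsOf returns the version valid at validAt according to what the store
// believed at txAt. Both axes must be specified, as in pre-2011 SQL.
func AsOf(history []Version, validAt, txAt time.Time) (Version, bool) {
	for _, v := range history {
		if contains(v.ValidFrom, v.ValidTo, validAt) && contains(v.TxFrom, v.TxTo, txAt) {
			return v, true
		}
	}
	return Version{}, false
}

// Current covers the most common query by fixing both axes to now.
func Current(history []Version, now time.Time) (Version, bool) {
	return AsOf(history, now, now)
}

// day builds a timestamp in an arbitrary example month.
func day(d int) time.Time {
	return time.Date(2022, time.January, d, 0, 0, 0, 0, time.UTC)
}

func main() {
	// A license recorded as MIT on day 5, then corrected on day 10.
	history := []Version{
		{Value: "MIT", ValidFrom: day(1), TxFrom: day(5), TxTo: day(10)},
		{Value: "Apache-2.0", ValidFrom: day(1), TxFrom: day(10)},
	}
	before, _ := AsOf(history, day(3), day(7))
	after, _ := Current(history, day(12))
	fmt.Println(before.Value, after.Value)
}
```

The `Current` wrapper plays the role of the rewriting proxy: callers who only care about the present never have to spell out either axis.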

mlieberman85 commented 2 years ago

Yeah, from my understanding some of the other databases that natively support bitemporal data pretty much have a way of capturing snapshots (or diffs, unclear from a lot of the docs) as stuff changes. Those databases are also built with it in mind, creating bitemporal indexes.