octue / octue-sdk-python

The python SDK for @Octue services and digital twins.
https://octue.com

Optimise clustering of event store #642

Open cortadocodes opened 3 months ago

cortadocodes commented 3 months ago

Feature request

Use Case

We need to decide which fields to cluster on in the BigQuery event store and whether to pull the event kind out as a column.

Current state

The event kind is stored inside the event JSON field, so it is queryable but can't be ordered by (I don't think we need to order by it). We're currently clustering on ["sender", "question_uuid"], in that order. Clustering is order-dependent: a query only takes advantage of a clustered field if it also filters on every higher-priority field (those to its left in the clustering list).
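To make the ordering rule concrete, here is a minimal sketch in plain Python. It is a simplified model of the rule as described above, not BigQuery's actual query planner, and the helper name `usable_clustering_prefix` is made up for illustration.

```python
def usable_clustering_prefix(clustering_fields, filtered_fields):
    """Return the leading clustered fields that a query's filters can make use of.

    Simplified model: a clustered field only helps if every field to its left
    in the clustering order is also filtered on.
    """
    filtered = set(filtered_fields)
    prefix = []
    for field in clustering_fields:
        if field not in filtered:
            break  # Every field after the first unfiltered one gives no benefit.
        prefix.append(field)
    return prefix


# The clustering order currently used by the event store (from this issue).
CURRENT_CLUSTERING = ["sender", "question_uuid"]

# Filtering on sender alone still uses the first clustered field.
sender_only = usable_clustering_prefix(CURRENT_CLUSTERING, ["sender"])

# Filtering on both fields uses the full clustering.
both = usable_clustering_prefix(CURRENT_CLUSTERING, ["question_uuid", "sender"])

# Filtering on question_uuid alone skips the leading field, so nothing is used.
question_only = usable_clustering_prefix(CURRENT_CLUSTERING, ["question_uuid"])
```

Under this model, wherever an event kind column goes in the clustering order determines which queries benefit from it, which is the crux of the decision.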

@thclark says: "We’d need to cluster on event_kind otherwise you’d have to process (for example) all the log rows every time you want to query for input or output values (remember it’s column based storage so the filters aren’t like conventional SQL, it’ll process all rows in order to apply a filter). Also, regardless of clustering I think (??) it may be more efficient to filter directly on a column than on a JSONField."
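For comparison, the two filtering styles might look like the sketch below. The table name, the `$.kind` JSON path and the `event_kind` column name are all assumptions for illustration; only the `sender` column and the `event` JSON field come from this issue. `JSON_VALUE` and `@name` query parameters are standard BigQuery SQL.

```python
# Hypothetical fully-qualified table name, for illustration only.
TABLE = "my-project.my_dataset.service-events"

# Today: the kind lives inside the event JSON field, so the filter has to
# extract it from the JSON for each row it considers.
# Assumes the kind is stored under a top-level "kind" key.
JSON_FILTER_QUERY = f"""
SELECT event
FROM `{TABLE}`
WHERE sender = @sender
  AND JSON_VALUE(event, '$.kind') = @kind
"""

# Proposed: the kind is pulled out into its own column (hypothetically named
# event_kind), which could then also be included in the clustering fields.
COLUMN_FILTER_QUERY = f"""
SELECT event
FROM `{TABLE}`
WHERE sender = @sender
  AND event_kind = @kind
"""
```

Running both with a dry run and comparing the bytes processed would be one way to check the efficiency question raised above before committing to a schema change.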

Proposed Solution

Discuss and choose: