Currently the performance of the DELETE operation is a significant bottleneck in the processing pipeline. The cause of this is that, in order to perform the DELETE, every partition that matches the WHERE clause has to be rewritten without the deleted rows.
We need this DELETE operation so that we can clear older observations from the relevant database tables before regenerating them.
Since the WHERE filter can potentially match a large number of rows spread across several big partitions, we need a lot of extra headroom in terms of storage.
I did some research into this, and one approach could be to define a custom partition key and then either drop the whole partition (which is probably not feasible, as we would have to create far more partitions than necessary) or at least narrow the modifications down to a single partition.
As part of this change I am also planning some important schema changes to the observation tables, to set us up to process in hourly batches instead of daily ones.
My suggestion is therefore to do the following:
- Change the `bucket_date` column to a `DateTime`, which will allow us to use a bucket timestamp with hourly granularity.
- Change the partition key of `obs_web` (and the other relevant tables) to `toYYYYMM(bucket_ts)`, so that a heavyweight DELETE only has to touch a single, hopefully smaller, partition.
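As a sketch, the proposed change could look something like the following DDL. The column list, engine, and `ORDER BY` are placeholders, not the real `obs_web` schema, and I'm assuming `bucket_ts` is the renamed `DateTime` version of `bucket_date`:

```sql
-- Sketch only: columns, engine and ORDER BY are illustrative assumptions.
CREATE TABLE obs_web_new
(
    bucket_ts       DateTime,  -- was a Date; DateTime enables hourly buckets
    measurement_uid String
    -- ... remaining obs_web columns ...
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(bucket_ts)   -- one partition per month
ORDER BY (bucket_ts, measurement_uid);

-- Regenerating one hourly batch then rewrites at most one monthly partition,
-- and IN PARTITION makes the narrowing explicit:
ALTER TABLE obs_web_new
    DELETE IN PARTITION '202401'
    WHERE bucket_ts >= toDateTime('2024-01-15 00:00:00')
      AND bucket_ts <  toDateTime('2024-01-15 01:00:00');
```

If we ever delete a whole month, `ALTER TABLE obs_web_new DROP PARTITION '202401'` avoids the rewrite entirely, which is the cheap path the custom partition key would unlock.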
Before making these changes, it's probably worth spending some time to better understand whether this will indeed lead to better performance.
Another approach could be to look into whether we can use one of the other engines in the MergeTree family, or structure our data in such a way that it becomes eventually consistent. This might however be tricky, as it would require significant changes to the downstream data-consuming queries to make them aware of this eventually consistent pattern (e.g. looking at the sign column in any aggregation).
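For reference, the sign-column pattern mentioned above would look roughly like this with a `CollapsingMergeTree` (table and column names here are illustrative, not our actual schema):

```sql
-- Sketch of the eventually-consistent alternative: instead of DELETE,
-- a regeneration inserts sign = -1 rows to cancel the old versions.
CREATE TABLE obs_web_collapsing
(
    bucket_ts DateTime,
    probe_cc  String,
    cnt       UInt64,
    sign      Int8      -- +1 for a live row, -1 to cancel a previous row
)
ENGINE = CollapsingMergeTree(sign)
ORDER BY (bucket_ts, probe_cc);

-- Every downstream aggregation must become sign-aware, e.g.:
SELECT probe_cc, sum(cnt * sign) AS cnt
FROM obs_web_collapsing
GROUP BY probe_cc
HAVING sum(sign) > 0;
```

This is exactly the downstream-query burden described above: until background merges collapse the +1/-1 pairs, every consumer has to multiply by `sign` to get correct counts.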