Open · YarekTyshchenko opened this issue 1 year ago
After a chat in Slack, I was given a working function that runs the aggregations manually by looping over a period. As expected, the performance was maintained, as seen in this graph (on the right):
On the left is TimescaleDB's attempt to update the aggregate via a policy; on the right is the manual day-by-day refresh. After a small bump I switched it to 4-week periods, which increased throughput and dropped the IOPS.
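Roughly, the approach is a windowed refresh loop along these lines. This is only a sketch, not the exact function shared in Slack; the cagg name `sensor_data_hourly`, the date range, and the 4-week step are placeholders:

```sql
-- Sketch of a windowed manual refresh; adjust names, dates, and step.
-- Run this on its own (not inside BEGIN/COMMIT): refresh_continuous_aggregate
-- cannot be executed inside an explicit transaction block.
DO $$
DECLARE
    win_start timestamptz := TIMESTAMPTZ '2022-01-01';
    stop_at   timestamptz := TIMESTAMPTZ '2023-01-01';
    step      interval    := INTERVAL '4 weeks';
BEGIN
    WHILE win_start < stop_at LOOP
        -- Refresh one bounded window of the cagg at a time.
        CALL refresh_continuous_aggregate('sensor_data_hourly',
                                          win_start, win_start + step);
        COMMIT;  -- finish this window before moving on to the next one
        win_start := win_start + step;
    END LOOP;
END
$$;
```

Committing after each window keeps one refresh from piling its work onto the next, which is what appears to keep the write speed flat in the graph above.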
It's strange that the aggregation policies don't run in a similar fashion. I would have expected them to re-aggregate each chunk separately, especially since that would be a natural place to add support for updating compressed aggregate chunks (decompress, re-aggregate, then re-compress).
Hi @YarekTyshchenko, thanks for the update.
Please let me know if you have any additional questions on this.
@pdipesh02 I do not; I think the problem is well documented here, and it also seems to be well known, judging from the conversation I had on Slack. Is there anything more I can provide to help you fix this?
@YarekTyshchenko
> I was given a working function that runs the aggregations manually by looping over a period
Would you be able to share this function? Thanks
What type of bug is this?
Performance issue
What subsystems and features are affected?
Continuous aggregate
What happened?
Creating the aggregate from a large amount of data takes a very long time (it will never complete). I'm attaching a screenshot of metrics that should make the problem plain to see: the aggregation writes get slower as they run.
The test case SQL is attached to the ticket.
You may have to adjust the number of sensors for a smaller test dataset.
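The attached test case isn't reproduced inline here; for orientation only, a generic sketch of the kind of setup under discussion (table and view names, columns, and intervals are placeholders):

```sql
-- Not the attached test case: a simplified sketch of a hypertable with a
-- continuous aggregate and a refresh policy covering the whole history.
CREATE TABLE sensor_data (
    time      timestamptz      NOT NULL,
    sensor_id integer          NOT NULL,
    value     double precision
);
SELECT create_hypertable('sensor_data', 'time');

CREATE MATERIALIZED VIEW sensor_data_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket,
       sensor_id,
       avg(value) AS avg_value
FROM sensor_data
GROUP BY bucket, sensor_id
WITH NO DATA;

-- With a large historical backlog, this policy's refresh is the part that
-- gets slower as it runs.
SELECT add_continuous_aggregate_policy('sensor_data_hourly',
    start_offset      => NULL,               -- no lower bound: whole history
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');
```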
It's possible to manually remove and re-add a policy with a continuously increasing start interval, and in that case the write speed is maintained (as long as the refresh period covers roughly 10 chunks), but that's too manual for the amount of data I'm working with.
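For reference, that manual policy dance is along these lines (a sketch only; the cagg name and offsets are placeholders, with `start_offset` advanced on each pass):

```sql
-- Re-create the refresh policy with a start_offset moved forward each pass.
SELECT remove_continuous_aggregate_policy('sensor_data_hourly');
SELECT add_continuous_aggregate_policy('sensor_data_hourly',
    start_offset      => INTERVAL '16 weeks',   -- advance this on each pass
    end_offset        => INTERVAL '12 weeks',
    schedule_interval => INTERVAL '15 minutes');
```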
I also tried to trigger the cagg refresh manually via `refresh_continuous_aggregate`; however, when used in a loop, it complains about a missing relation.

Deployment details:
Deployed in Kubernetes via the official helm chart, with an Azure Managed
TimescaleDB version affected
2.9.1, 2.10.3
PostgreSQL version used
14.8
What operating system did you use?
Helm chart, docker image version: timescale/timescaledb-ha:pg14.8-ts2.10.3
What installation method did you use?
Docker, Other
What platform did you run on?
Microsoft Azure Cloud
Relevant log output and stack trace
Nothing interesting in the logs for the time period, except maybe this:
If there's any more detailed logging that I can enable, please let me know.
How can we reproduce the bug?