igor2x closed this issue 2 months ago
Additional info:
CLUSTER my_chunk USING index_on_time_column;
Conclusions:
Hello @igor2x and thanks for the bug report.
It looks like, to my surprise, the data in my chunks is heavily "fragmented" (non-clustered), and I need to cluster the chunks to improve performance.
Chunks are not automatically clustered in any way just because they are chunks. You have to run CLUSTER on them explicitly to get the correlation back. If you do a lot of updates on the table, correlation goes down because new row versions are not written in place; they land wherever there is free space, often at the end of the data file, so the physical row order drifts away from the time order.
Running CLUSTER and then ANALYZE on a chunk should restore its correlation statistics.
Statistics are tracked at the chunk level and do not propagate to the hypertable. The stats present on the hypertable are most likely from before it was turned into a hypertable, since the hypertable relation itself stores no tuples and would have no stats of its own. This is similar to how tables in a PostgreSQL inheritance tree work.
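As a rough sketch of that sequence for a single chunk: the chunk name, index name, and the column name `time` below are placeholders, not taken from the report, so substitute your own. Keep in mind that CLUSTER rewrites the table and holds an ACCESS EXCLUSIVE lock on it while it runs.

```sql
-- Hypothetical chunk and index names; replace with your own.
-- The index name is not schema-qualified: it lives in the table's schema.
CLUSTER _timescaledb_internal._hyper_1_1_chunk
    USING _hyper_1_1_chunk_index_on_time_column;

-- Refresh planner statistics so pg_stats reflects the new physical order.
ANALYZE _timescaledb_internal._hyper_1_1_chunk;

-- Check the result; "time" is an assumed column name.
SELECT schemaname, tablename, attname, correlation
FROM pg_stats
WHERE schemaname = '_timescaledb_internal'
  AND tablename = '_hyper_1_1_chunk'
  AND attname = 'time';
```

The correlation only stays high as long as later updates do not scatter the rows again, so this may need to be repeated on chunks that keep receiving updates.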
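One way to check where the hypertable-level numbers come from is to look at the `inherited` flag in `pg_stats` and at the last analyze time of the parent relation. This is only a sketch: `my_hypertable` and `time` are placeholder names.

```sql
-- "my_hypertable" and "time" are placeholder names.
-- inherited = false: statistics for the parent relation alone.
-- inherited = true: statistics that also cover its child tables.
SELECT schemaname, tablename, attname, inherited, correlation
FROM pg_stats
WHERE tablename = 'my_hypertable'
  AND attname = 'time';

-- When the parent relation was last analyzed, manually or by autovacuum.
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'my_hypertable';
```

If the analyze timestamps predate the conversion to a hypertable, that would be consistent with the explanation above.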
What type of bug is this?
Incorrect result
What subsystems and features are affected?
Other
What happened?
I expected our biggest hypertable to receive mainly inserts. However, using pg_stat_activity I have seen plenty of updates on the table, and now I am wondering how well the data is clustered.
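As a complement to watching pg_stat_activity, the cumulative counters in `pg_stat_user_tables` give a per-chunk picture of how many updates have happened. A minimal sketch, assuming the chunks live in the default `_timescaledb_internal` schema:

```sql
-- Cumulative per-chunk counters since the last statistics reset.
-- n_tup_hot_upd counts updates whose new row version stayed on the same page.
SELECT relname, n_tup_ins, n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE schemaname = '_timescaledb_internal'
ORDER BY n_tup_upd DESC
LIMIT 20;
```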
Correlation:
I expect the correlation for my time column to be near 1; in my case it is 0.9991473.
Now I want to see whether all of the chunks have a similar correlation value.
I expected to get a similar value near 1, but to my surprise I got 0.030021765771206122. I know the two values can't be exactly the same, because chunks have different numbers of rows, but I expected both values to be near 1.
To look into the details per chunk, sorted by the correlation column:
then a few hundred chunks, and at the end:
I think this is unexpected: if the hypertable has a value of 0.9991473, I expect the individual chunk correlations to be near this value (or, more correctly, the average of the chunk correlations should be near the hypertable's correlation value).
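The original queries are not preserved in this thread. As an illustration only, per-chunk correlations could be read by joining the `timescaledb_information.chunks` view to `pg_stats`; `my_hypertable` and `time` are placeholder names.

```sql
-- Per-chunk correlation for the time column, lowest first.
SELECT c.chunk_schema, c.chunk_name, c.range_start, s.correlation
FROM timescaledb_information.chunks AS c
JOIN pg_stats AS s
  ON s.schemaname = c.chunk_schema
 AND s.tablename = c.chunk_name
WHERE c.hypertable_name = 'my_hypertable'
  AND s.attname = 'time'
ORDER BY s.correlation;

-- Unweighted average across chunks; every chunk counts equally,
-- regardless of how many rows it holds.
SELECT avg(s.correlation) AS avg_chunk_correlation
FROM timescaledb_information.chunks AS c
JOIN pg_stats AS s
  ON s.schemaname = c.chunk_schema
 AND s.tablename = c.chunk_name
WHERE c.hypertable_name = 'my_hypertable'
  AND s.attname = 'time';
```

Chunks that have never been analyzed have no row in `pg_stats` and drop out of both results.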
TimescaleDB version affected
2.14.2
PostgreSQL version used
15.6
What operating system did you use?
Red Hat 9.3
What installation method did you use?
RPM
What platform did you run on?
On prem/Self-hosted
Relevant log output and stack trace
No response
How can we reproduce the bug?