ydb-platform / ydb

YDB is an open source Distributed SQL Database that combines high availability and scalability with strong consistency and ACID transactions
https://ydb.tech
Apache License 2.0
3.87k stars 536 forks

Gather CountMinSketch statistics from Column tables #5867

Closed (ildar-khisambeev closed 5 days ago)

ildar-khisambeev commented 3 months ago

Steps:
1) DONE: use the same TEvStatisticsRequest / TEvStatisticsResponse interface for datashard and columnshard https://github.com/ydb-platform/ydb/pull/5820
2) implement Count-Min Sketch on top of the existing statistics interface https://github.com/ydb-platform/ydb/blob/main/ydb/core/tx/columnshard/engines/scheme/statistics/abstract
3) implement statistics merge logic at the granule level
4) fill in the precomputed statistics when handling TEvStatisticsRequest
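For readers unfamiliar with the data structure behind steps 2 and 3, here is a minimal Python sketch of a Count-Min Sketch with a merge operation. It is only an illustration, not the YDB implementation: the class and method names (`CountMinSketch`, `add`, `estimate`, `merge`, `build_sketch`), the default dimensions, and the salted BLAKE2b hashing are all assumptions made for this example. The property it demonstrates is that two sketches with identical dimensions merge by element-wise addition of their counters, which is what would make per-portion statistics combinable at a coarser level such as a granule.

```python
import hashlib
from typing import Iterable, List


class CountMinSketch:
    """Fixed-size frequency sketch; estimates can overcount but never undercount."""

    def __init__(self, width: int = 256, depth: int = 4) -> None:
        # width and depth are illustrative defaults, not values taken from YDB.
        self.width = width
        self.depth = depth
        self.rows: List[List[int]] = [[0] * width for _ in range(depth)]

    def _bucket(self, row: int, value: bytes) -> int:
        # One hash function per row, derived by salting BLAKE2b with the row index.
        salt = row.to_bytes(8, "little")
        digest = hashlib.blake2b(value, digest_size=8, salt=salt).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, value: bytes, count: int = 1) -> None:
        # Increment exactly one counter in every row.
        for row in range(self.depth):
            self.rows[row][self._bucket(row, value)] += count

    def estimate(self, value: bytes) -> int:
        # The minimum over rows is the counter least inflated by collisions.
        return min(
            self.rows[row][self._bucket(row, value)] for row in range(self.depth)
        )

    def merge(self, other: "CountMinSketch") -> None:
        # Merging is element-wise addition; it requires identical dimensions
        # (and identical hash functions, which holds here by construction).
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketches must have identical dimensions")
        for mine, theirs in zip(self.rows, other.rows):
            for i, counter in enumerate(theirs):
                mine[i] += counter


def build_sketch(values: Iterable[bytes]) -> CountMinSketch:
    """Hypothetical helper: build a sketch over one chunk of column values."""
    sketch = CountMinSketch()
    for value in values:
        sketch.add(value)
    return sketch


# Two chunks of the same column, sketched independently and then merged.
portion_a = build_sketch([b"x", b"y", b"x"])
portion_b = build_sketch([b"x", b"z"])
granule = CountMinSketch()
granule.merge(portion_a)
granule.merge(portion_b)
```

Because merging is plain addition, the merged sketch is identical to one built over all values in a single pass, so statistics counted ahead of time per chunk could be summed on request rather than recomputed.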

ildar-khisambeev commented 3 months ago

https://wiki.yandex-team.ru/users/nsofya/ydb/statistics/
https://wiki.yandex-team.ru/kikimr/design/datashard/statistics/columnshard-traversal/

ildar-khisambeev commented 3 months ago

Describe the case where statistics gathering is set up for a column that already contains data

ildar-khisambeev commented 5 days ago

Done in https://github.com/ydb-platform/ydb/pull/7978