Open sanikolaev opened 7 months ago
It seems this is by design and is mentioned in the documentation, COUNT(DISTINCT-field):
COUNT(DISTINCT) against a distributed table or a real-time table consisting of multiple disk chunks may return inaccurate results, but the result should be accurate for a distributed table consisting of local plain or real-time tables with the same schema
COUNT(DISTINCT) groupers have two modes:
I am sure there is no easy fix for merging multiple groupers/sorters with different schemas.
I tried a new sorter implementation for just the select count(distinct g) from idx1, idx2 case.
That might work if the grouper/sorter does not keep any attributes from the index match; however, it has multiple constraints:
count(distinct )
and this does not seem practical.
Maybe the better option is to extract the internal distinct-values structure into the result set, so that it can be reused when result sets are merged, or transferred from the agents to the master and used there to merge the agents' result sets while still getting an accurate COUNT(DISTINCT) for the merged result set. However, that also seems like a large change.
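To illustrate the idea, here is a minimal Python sketch. It is not Manticore code: `PartialResult`, `merge_counts_only` and `merge_with_values` are hypothetical names, and a plain `set` stands in for the internal distinct-values structure. It only shows why a merge step that sees per-table counts cannot be exact, while one that receives the distinct values themselves can.

```python
from dataclasses import dataclass, field


@dataclass
class PartialResult:
    """Hypothetical per-table (or per-agent) result that carries its distinct values."""
    source: str
    distinct_values: set = field(default_factory=set)

    @property
    def count_distinct(self) -> int:
        # What a single table would report for COUNT(DISTINCT g) on its own.
        return len(self.distinct_values)


def merge_counts_only(parts):
    # With only the final counts available, the merge can at best add them up,
    # so a value present in several tables is counted once per table.
    return sum(p.count_distinct for p in parts)


def merge_with_values(parts):
    # With the distinct values shipped alongside each result set, the merge
    # can take their union and count each value exactly once.
    merged = set()
    for p in parts:
        merged |= p.distinct_values
    return len(merged)


# Two tables that share the value 3 in attribute g.
parts = [
    PartialResult("idx1", {1, 2, 3}),
    PartialResult("idx2", {3, 4}),
]

print(merge_counts_only(parts))  # 5: the shared value 3 is counted twice
print(merge_with_values(parts))  # 4: the union counts 3 once
```

The cost of this approach is that the value sets have to be kept and transferred with every result set, which is part of why it looks like a large change.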
MRE:
Version:
on
dev2
Checklist
To be completed by the assignee. Check off tasks that have been completed or are not applicable.