metatron-app / metatron-discovery

Powerful & Easy way for big data discovery
https://metatron.app
Apache License 2.0
442 stars 112 forks

Supports 2 types of CountD Aggregator #3288

Open ufoscw opened 4 years ago

ufoscw commented 4 years ago

Is your feature request related to a problem? Please describe.

Support two types of countd aggregator (including ifcountd). The existing aggregator type should be changed so that it supports two types, "thetaSketch" and "cardinality", selected according to the parameters.

Describe the solution you'd like

  1. cardinality expression: counts([fieldName])

     "aggregations": [
       {
         "type": "cardinality",
         "name": "aggregationfunc_000",
         "fieldNames": [
           "hashed_user_id"
         ],
         "byRow": true
       }
     ],
     "postAggregations": [
       {
         "type": "math",
         "name": "MEASURE_2",
         "expression": "ROUND(aggregationfunc_000)",
         "finalize": true
       }
     ],

  2. thetaSketch expression: countd([fieldName], size)

     "aggregations": [
       {
         "type": "thetaSketch",
         "name": "MEASURE_2",
         "fieldName": "hashed_user_id",
         "size": 20000,
         "shouldFinalize": true
       }
     ],
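
To make the proposed mapping concrete, here is a minimal sketch in Python of how an expression could be translated into one of the two query fragments above. The helper name `countd_to_druid` and the rule "no size means cardinality, a size means thetaSketch" are assumptions made for illustration; they are not the actual metatron-discovery implementation.

```python
from typing import Any, Dict, Optional


def countd_to_druid(
    field_name: str,
    size: Optional[int] = None,
    measure_name: str = "MEASURE_2",
) -> Dict[str, Any]:
    """Hypothetical helper: build the Druid query fragment for a distinct count.

    Assumption: when no size is given, the "cardinality" aggregator is used;
    when a size is given, the "thetaSketch" aggregator is used.
    """
    if size is None:
        # Cardinality aggregation plus a math post-aggregation that rounds it,
        # mirroring fragment 1 in the issue.
        inner_name = "aggregationfunc_000"
        return {
            "aggregations": [
                {
                    "type": "cardinality",
                    "name": inner_name,
                    "fieldNames": [field_name],
                    "byRow": True,
                }
            ],
            "postAggregations": [
                {
                    "type": "math",
                    "name": measure_name,
                    "expression": f"ROUND({inner_name})",
                    "finalize": True,
                }
            ],
        }

    # thetaSketch aggregation with an explicit sketch size,
    # mirroring fragment 2 in the issue.
    return {
        "aggregations": [
            {
                "type": "thetaSketch",
                "name": measure_name,
                "fieldName": field_name,
                "size": size,
                "shouldFinalize": True,
            }
        ]
    }
```

Calling `countd_to_druid("hashed_user_id")` would yield the cardinality fragment, and `countd_to_druid("hashed_user_id", 20000)` the thetaSketch fragment. Dispatching on the presence of `size` keeps a single entry point while letting the caller opt into the sketch-based aggregator.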


kyungtaak commented 4 years ago

@ufoscw Originally, the "countd" function was converted to a cardinality aggregation. However, due to performance and accuracy issues, I changed it to use the thetaSketch type (#967). So I am wondering why you need two types.

i1befree commented 4 years ago

In Druid, thetaSketch has an optional parameter (size) that controls accuracy. The "countd" function currently has no optional parameter, so we need to change the type of countd. We also learned from @navis that the cardinality aggregation is faster than thetaSketch.