winedepot / pinot

Apache Pinot (Incubating) - A realtime distributed OLAP datastore
https://pinot.apache.org
Apache License 2.0

Adding support to ingest Map and MapValueTransform Factory #23

Closed: kishoreg closed this 5 years ago

kishoreg commented 5 years ago

Supporting Map as a DataType was a bit invasive and would require us to first introduce the concept of a CompositeColumnDatasource.

While this solution is not elegant, it is a good first step towards supporting Map and Struct data types.

This leverages Pinot's existing multi-value column feature. A Map is represented as two multi-valued columns, KEYS and VALUES (for example, myMap__KEYS and myMap__VALUES). This convention is used only at ingestion time: in real-time ingestion and in segment generation. When Pinot sees a FieldSpec whose name ends with KEYS or VALUES, it checks for the corresponding field in the input record. If that object turns out to be a map, it is converted into two columns: keyArray and valueArray.
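The ingestion-time conversion described above can be sketched in plain Java. This is a minimal illustration, not the code in this PR: the class `MapColumnSplitter`, its `toRow` method, and the use of a `Map<String, Object>` as the input record are hypothetical, and the `__KEYS`/`__VALUES` suffixes are assumed from the sample queries. Keys and values are taken from a single pass over each map, so index `i` of the KEYS column always pairs with index `i` of the VALUES column.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not Pinot's actual ingestion code): splits map-typed fields
// of an input record into two multi-value columns, <field>__KEYS and <field>__VALUES.
class MapColumnSplitter {
  static final String KEYS_SUFFIX = "__KEYS";     // assumed suffix, from the sample queries
  static final String VALUES_SUFFIX = "__VALUES"; // assumed suffix, from the sample queries

  /** Builds an output row with one entry per schema column name. */
  static Map<String, Object> toRow(List<String> columns, Map<String, Object> record) {
    Map<String, Object> row = new LinkedHashMap<>();
    // Split each map field once, so KEYS[i] and VALUES[i] come from the same pass.
    Map<String, Object[][]> splits = new HashMap<>();
    for (String column : columns) {
      if (column.endsWith(KEYS_SUFFIX)) {
        String field = stripSuffix(column, KEYS_SUFFIX);
        row.put(column, splits.computeIfAbsent(field, f -> split(record.get(f)))[0]);
      } else if (column.endsWith(VALUES_SUFFIX)) {
        String field = stripSuffix(column, VALUES_SUFFIX);
        row.put(column, splits.computeIfAbsent(field, f -> split(record.get(f)))[1]);
      } else {
        row.put(column, record.get(column)); // ordinary column: copied as-is
      }
    }
    return row;
  }

  private static String stripSuffix(String column, String suffix) {
    return column.substring(0, column.length() - suffix.length());
  }

  /** Returns {keyArray, valueArray}, or {null, null} if the value is not a map. */
  private static Object[][] split(Object value) {
    if (!(value instanceof Map)) {
      return new Object[][] {null, null};
    }
    Map<?, ?> map = (Map<?, ?>) value;
    Object[] keyArray = new Object[map.size()];
    Object[] valueArray = new Object[map.size()];
    int i = 0;
    for (Map.Entry<?, ?> entry : map.entrySet()) {
      keyArray[i] = entry.getKey();
      valueArray[i] = entry.getValue();
      i++;
    }
    return new Object[][] {keyArray, valueArray};
  }
}
```

For a record `{id=7, myMap={k1=1, k2=2}}` (built with `LinkedHashMap`) and columns `id, myMap__KEYS, myMap__VALUES`, the sketch yields `id=7`, `myMap__KEYS=[k1, k2]` and `myMap__VALUES=[1, 2]`.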

Once we have this, we only need to support a transform UDF, map_value(keyColumnName, 'keyName', valueColumnName), which is equivalent to map['keyName']. We can support the simpler map['keyName'] syntax in another PR.
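The semantics of map_value can be illustrated with a small Java sketch: find the position of `keyName` in the row's key array and return the value at the same position in the value array. This is not the PR's `MapValueTransformFunction`; the class `MapValueSketch`, its methods, and the choice to return `null` for a missing key are assumptions made for illustration.

```java
import java.util.Objects;

// Hypothetical sketch of map_value(keyColumn, 'keyName', valueColumn) semantics;
// it is not Pinot's MapValueTransformFunction.
class MapValueSketch {
  /** Returns values[i] for the first i where keys[i] equals keyName, else null. */
  static Object mapValue(Object[] keys, String keyName, Object[] values) {
    if (keys == null || values == null) {
      return null;
    }
    for (int i = 0; i < keys.length && i < values.length; i++) {
      if (Objects.equals(keys[i], keyName)) {
        return values[i];
      }
    }
    return null; // assumption: a missing key yields null
  }

  /** Applies mapValue to every document in a block of multi-value rows. */
  static Object[] mapValueForBlock(Object[][] keysPerDoc, String keyName, Object[][] valuesPerDoc) {
    Object[] result = new Object[keysPerDoc.length];
    for (int doc = 0; doc < keysPerDoc.length; doc++) {
      result[doc] = mapValue(keysPerDoc[doc], keyName, valuesPerDoc[doc]);
    }
    return result;
  }
}
```

The lookup is a linear scan, which is reasonable when each row's map is small; the key and value arrays must have been written in the same order at ingestion time for the positional pairing to hold.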

Sample queries

select map_value(myMap__KEYS, 'k1', myMap__VALUES) from myTable

select count(*) from myTable group by map_value(myMap__KEYS, 'k1', myMap__VALUES)
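As a rough mental model of the second query, the sketch below evaluates map_value per row and counts rows per resulting value. It is an in-memory illustration only, not how Pinot's query engine executes the query; the class `MapValueGroupBySketch` is hypothetical, it assumes both arrays in every row are non-null, and it places rows without the key into a `null` group as an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical, in-memory illustration of
//   select count(*) from myTable group by map_value(myMap__KEYS, 'k1', myMap__VALUES)
// over rows that already hold the two multi-value columns. Not Pinot's query engine.
class MapValueGroupBySketch {
  /** Positional lookup; assumes keys and values are non-null and written in the same order. */
  static Object lookup(Object[] keys, String keyName, Object[] values) {
    for (int i = 0; i < Math.min(keys.length, values.length); i++) {
      if (Objects.equals(keys[i], keyName)) {
        return values[i];
      }
    }
    return null; // assumption: rows without the key fall into a null group
  }

  /** Returns a count of rows per map_value(keys, keyName, values) result. */
  static Map<Object, Long> countByMapValue(Object[][] keysPerRow, String keyName, Object[][] valuesPerRow) {
    Map<Object, Long> counts = new HashMap<>(); // HashMap permits the null group key
    for (int row = 0; row < keysPerRow.length; row++) {
      Object group = lookup(keysPerRow[row], keyName, valuesPerRow[row]);
      counts.merge(group, 1L, Long::sum);
    }
    return counts;
  }
}
```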

codecov-io commented 5 years ago

Codecov Report

Merging #23 into develop will increase coverage by 0.08%. The diff coverage is 61.33%.


@@              Coverage Diff              @@
##             develop      #23      +/-   ##
=============================================
+ Coverage      65.14%   65.23%   +0.08%     
  Complexity         4        4              
=============================================
  Files           1049     1050       +1     
  Lines          54420    54491      +71     
  Branches        7761     7776      +15     
=============================================
+ Hits           35453    35548      +95     
+ Misses         16403    16372      -31     
- Partials        2564     2571       +7
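The headline percentages can be reproduced from the table, assuming Codecov's documented ratio of hits over all tracked lines (partials count as not covered). In both columns hits + misses + partials equals the line count. The displayed figures appear consistent with values truncated, not rounded, to two decimals, which matches Codecov's default rounding of "down".

```latex
\text{coverage} = \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}}
\qquad
\text{develop: } \frac{35453}{35453 + 16403 + 2564} = \frac{35453}{54420} \approx 65.147\%
\qquad
\#23\text{: } \frac{35548}{35548 + 16372 + 2571} = \frac{35548}{54491} \approx 65.236\%
```

The difference, about 0.089 percentage points, is reported as +0.08%.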

Impacted Files | Coverage Δ | Complexity Δ
...segment/creator/impl/SegmentDictionaryCreator.java | 87.75% <ø> (-0.09%) | 0 <0> (ø)
.../org/apache/pinot/core/plan/SelectionPlanNode.java | 63.15% <100%> (ø) | 0 <0> (ø) ↓
...r/transform/function/TransformFunctionFactory.java | 61.22% <100%> (+0.8%) | 0 <0> (ø) ↓
...time/impl/kafka/AvroRecordToPinotRowGenerator.java | 40% <4%> (-60%) | 0 <0> (ø)
.../transform/function/MapValueTransformFunction.java | 86.95% <86.95%> (ø) | 0 <0> (?)
...ain/java/org/apache/pinot/core/util/AvroUtils.java | 59.31% <92%> (+6.41%) | 0 <0> (ø) ↓
...he/pinot/core/query/pruner/ValidSegmentPruner.java | 57.14% <0%> (-28.58%) | 0% <0%> (ø)
...a/manager/realtime/RealtimeSegmentDataManager.java | 75% <0%> (-25%) | 0% <0%> (ø)
...er/validation/BrokerResourceValidationManager.java | 25% <0%> (-25%) | 0% <0%> (ø)
...pinot/core/operator/docidsets/OrBlockDocIdSet.java | 84.9% <0%> (-13.21%) | 0% <0%> (ø)
... and 32 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 30fe389...218782d.