Open sdruzkin opened 3 months ago
Calculating the exact number of elements before merging to create the block builders of the exact expected size is also expensive due to complex data types and different block encodings.
Can you help me understand why this is? Isn't this just peeking at the metadata of the underlying blocks? What makes it expensive?
There are two factors that contribute to the complexity:
Another use case for chunked encoded blocks is applying deltas: we have a base block and a set of delete-row or update-cell deltas that get merged with the base block after loading.
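To make the delta use case concrete, here is a toy sketch (hypothetical types and method names, not Presto's actual Block API) of merging a base block of values with a set of delete-row deltas, keeping only the surviving rows:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DeltaMergeSketch {
    // Drop every position listed in deletedRows from the base values,
    // producing the merged result block (here just a long[]).
    static long[] applyDeleteDeltas(long[] base, Set<Integer> deletedRows) {
        List<Long> surviving = new ArrayList<>();
        for (int position = 0; position < base.length; position++) {
            if (!deletedRows.contains(position)) {
                surviving.add(base[position]);
            }
        }
        long[] merged = new long[surviving.size()];
        for (int i = 0; i < merged.length; i++) {
            merged[i] = surviving.get(i);
        }
        return merged;
    }

    public static void main(String[] args) {
        long[] base = {10, 20, 30, 40, 50};
        long[] merged = applyDeleteDeltas(base, new HashSet<>(Arrays.asList(1, 3)));
        System.out.println(Arrays.toString(merged)); // [10, 30, 50]
    }
}
```

The real merge works on blocks rather than raw arrays, but the shape of the problem is the same: the output size is only known after scanning the deltas, which is part of why presizing the builders is awkward.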
There are also a few more alternatives to the chunked encoding that I can think of:
In presto-orc and nimble we have a use case where we need to merge/flatten a bunch of blocks into a single map, i.e. construct a flat map. A flat map is a map encoding that stores a separate value stream for each key, instead of one big value stream for all keys. Flat maps usually have a complex value type like `array<int>` or `array<long>`. Here is the code doing the flat map merging: https://github.com/prestodb/presto/blob/master/presto-orc/src/main/java/com/facebook/presto/orc/reader/MapFlatSelectiveStreamReader.java#L445
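As a rough illustration of the flat map idea (this is a standalone sketch, not the presto-orc implementation), pivoting row-oriented maps into one value stream per distinct key could look like this, with `null` marking rows where a key is absent:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlatMapSketch {
    // Pivot row-oriented maps into one value stream per distinct key.
    // A null entry means "this row does not contain the key".
    static Map<String, List<int[]>> toFlatMap(List<Map<String, int[]>> rows) {
        Map<String, List<int[]>> flat = new LinkedHashMap<>();
        // First pass: discover every key so all streams get the same length.
        for (Map<String, int[]> row : rows) {
            for (String key : row.keySet()) {
                flat.computeIfAbsent(key, k -> new ArrayList<>());
            }
        }
        // Second pass: append one entry per row to every key's stream.
        for (Map<String, int[]> row : rows) {
            for (Map.Entry<String, List<int[]>> entry : flat.entrySet()) {
                entry.getValue().add(row.get(entry.getKey()));
            }
        }
        return flat;
    }

    public static void main(String[] args) {
        List<Map<String, int[]>> rows = List.of(
                Map.of("a", new int[]{1, 2}),
                Map.of("a", new int[]{3}, "b", new int[]{4, 5}));
        Map<String, List<int[]>> flat = toFlatMap(rows);
        System.out.println(flat.keySet() + " b[0]=" + flat.get("b").get(0));
    }
}
```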
We use a regular map block builder for merging. One of the problems is that the leaf-level block builders have to copy the leaf value arrays multiple times as they grow, which is quite an expensive operation once they reach 100k-1M+ elements. Ideally, to avoid the merging we could have a dedicated block and type for flat maps, or present the flat map block as a struct, but that would require quite a lot of changes. Calculating the exact number of elements before merging, so that the block builders can be created at the exact expected size, is also expensive due to complex data types and different block encodings.
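The resizing cost is easy to see with a minimal grow-by-doubling builder (a toy stand-in, not Presto's `BlockBuilder`) that counts how many elements get copied on resize: appending ~1M values copies roughly another 1M elements just to keep growing the backing array.

```java
import java.util.Arrays;

public class GrowthCostDemo {
    static final class LongBuilder {
        long[] values = new long[16];
        int size = 0;
        long copiedElements = 0;

        void append(long v) {
            if (size == values.length) {
                copiedElements += size;                 // every grow copies all current elements
                values = Arrays.copyOf(values, size * 2);
            }
            values[size++] = v;
        }
    }

    public static void main(String[] args) {
        LongBuilder builder = new LongBuilder();
        int n = 1 << 20;                                // ~1M appends
        for (int i = 0; i < n; i++) {
            builder.append(i);
        }
        // Total copied elements are 16 + 32 + ... + n/2 = n - 16,
        // i.e. nearly one extra full copy of the data.
        System.out.println("appended=" + builder.size + " copied=" + builder.copiedElements);
    }
}
```

Nested builders make this worse, since each level of a complex type pays its own resizing cost.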
To mitigate the expensive resizing in the block builders, I'm thinking of creating chunked implementations of the Int and Long blocks and their builders.
Something like this, which is pretty similar to Apache Arrow chunked array:
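A minimal sketch of the idea (all names here are hypothetical, not Presto's actual Block API): values are appended into fixed-size chunks, so growth never copies existing data, at the cost of one extra indirection on reads.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedLongBlockSketch {
    static final class ChunkedLongBlock {
        private static final int CHUNK_SIZE = 1024;
        private final List<long[]> chunks = new ArrayList<>();
        private int size = 0;

        void append(long value) {
            int offsetInChunk = size % CHUNK_SIZE;
            if (offsetInChunk == 0) {
                chunks.add(new long[CHUNK_SIZE]);   // new chunk; existing data is never copied
            }
            chunks.get(size / CHUNK_SIZE)[offsetInChunk] = value;
            size++;
        }

        long getLong(int position) {
            // one extra indirection compared to a single flat array
            return chunks.get(position / CHUNK_SIZE)[position % CHUNK_SIZE];
        }

        int getPositionCount() {
            return size;
        }
    }

    public static void main(String[] args) {
        ChunkedLongBlock block = new ChunkedLongBlock();
        for (int i = 0; i < 5000; i++) {
            block.append(i);
        }
        System.out.println(block.getPositionCount() + " " + block.getLong(4999)); // 5000 4999
    }
}
```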
The construction will be cheaper, but read access will be a bit more expensive. Behavior-wise it would be identical to the current implementations and should be compatible with the existing block encoders. Unfortunately, the Block interface is a bit complicated, so each type will need at least a few days of work.
Do you guys have any opinion on this approach, or on creating new block types?