Closed wenleix closed 2 years ago
For array_agg
, one idea is store one huge flattened ArrayBlockBuilder
for all groups, and we don't have per group ArrayBlockBuilder
. We need to have a linked list for each group to construct the aggregated array per group. This is similar to the histogram aggregation improvement in https://github.com/prestodb/presto/pull/8918.
This idea should also be applicable to the following aggregation functions: map_agg
, map_unoin
and multimap_agg
Similarly, for ARBITRARY
, one idea is to store a huge flattened ArrayBlockBuilder
for all groups. Another integer array needs to be maintained for the mapping from group id to the position in the ArrayBlockBuilder
. Since for ARBITRARY
, element only doesn't need to be updated per group, we don't need to handle update case.
While MIN_BY
and MAX_BY
can also use this method to store the flattened array for all groups, they need to handle updates for group. One way would be to always append to the huge flattened ArrayBlockBuilder
and update the mapping from group id to position. However, this requires periodic compaction when the ArrayBlockBuilder
wastes too much memory.
Per discussion with @highker and @haozhun . There is an easier solution for ARBITRARY
, MIN_BY
, MAX_BY
.
The issue is currently we use BlockState
to store the aggregation state, which stores a Block
for a structural type. The SingleMapBlock
are crated via type.getObject(block, position)
Instead of slice the large block into SingleMapBlock
, we should store the parent block and the position into BlockState
. The same applies to Slice
as well.
In a long term, we might want to be able to perform automatic compaction.
Update: We have been able to reproduce the issue by a query contains a lot of array_agg
and billions of groups. The symptom is we see excessive native memory usage (up to 20G). We find this is due to Remember Set with Native Memory Tracking enabled:
[0x00007f6340c1ea05] ArrayAllocator<unsigned long, (MemoryType)7>::allocate(unsigned long)+0x175
[0x00007f6340c1d2eb] BitMap::resize(unsigned long, bool)+0x6b
[0x00007f6340f183aa] OtherRegionsTable::add_reference(void*, int)+0x3ea
[0x00007f63411f1416] ObjArrayKlass::oop_oop_iterate_nv_m(oopDesc*, FilterOutOfRegionClosure*, MemRegion)+0x186
(malloc=7045856KB +7045856KB #880732 +880732)
[0x00007f6340c1ea05] ArrayAllocator<unsigned long, (MemoryType)7>::allocate(unsigned long)+0x175
[0x00007f6340c1d2eb] BitMap::resize(unsigned long, bool)+0x6b
[0x00007f6340f183aa] OtherRegionsTable::add_reference(void*, int)+0x3ea
[0x00007f6340f357ff] InstanceKlass::oop_oop_iterate_nv(oopDesc*, FilterOutOfRegionClosure*)+0x12f
(malloc=10433488KB +10433488KB #1304186 +1304186)
@nezihyigitbasi and @highker helped to find mitigations by tuning G1RSetSparseRegionEntries
, as illustrated in https://medium.com/@milan.mimica/everybody-leaks-f210631f13ef . It can lower the native memory usage down to around 10G.
This issue has been automatically marked as stale because it has not had any activity in the last 2 years. If you feel that this issue is important, just comment and the stale tag will be removed; otherwise it will be closed in 7 days. This is an attempt to ensure that our open issues remain valuable and relevant so that we can keep track of what needs to be done and prioritize the right things.
This issue has been automatically marked as stale because it has not had any activity in the last 2 years. If you feel that this issue is important, just comment and the stale tag will be removed; otherwise it will be closed in 7 days. This is an attempt to ensure that our open issues remain valuable and relevant so that we can keep track of what needs to be done and prioritize the right things.
We have seen the following two issues in production when performing aggregations with structural type with huge number of groups. There are two separate cases:
We see a lot of
SingleMapBlock
when the aggregation function is likeMAX_BY(map, x)
orARBITRARY(map)
, since anSingleMapBlock
will be used for each group state.This issue can also happen for
SingleRowBlock
(for row type) orXArrayBlock
(for array type). We have also seen a massive number ofSlice
in heap forMAX(varchar)
.We see a lot of
MapBlockBuilder
forARRAY_AGG(map)
.In general, this doesn't restricts to map type, since we could see a lot of
BlockBuilder
forARRAY_AGG(x)
on any type, as long as the aggregation produces many groups.