lmatz opened 9 months ago
Need to investigate the reason. Please help to reproduce and dump a heap ❤️
I can reproduce on a 4C 8GB machine with the following query:
CREATE TABLE s(
dummy int, -- to control the number of rows generated in total
i int,
c varchar
) with (
connector = 'datagen',
fields.dummy.kind = 'sequence',
fields.dummy.start = '1',
fields.dummy.end = '1000000',
fields.i.kind = 'random',
fields.i.min = '1',
fields.i.max = '10',
fields.i.seed = '1',
fields.c.kind = 'random',
fields.c.length = '10000',
fields.c.seed = '1',
datagen.rows.per.second = '1000000',
datagen.split.num = '4'
);
CREATE MATERIALIZED VIEW len as
select array_length(array_agg(dummy)) as dummy_len,
array_length(array_agg(c)) as c_len
from s;
But I haven't found the heap dump yet in the Cloud environment.
Dashboard URL removed, please ping me if you need it.
1000000 rows in total
* 10000 characters in each varchar (per fields.c.length above)
= 10GB of characters in total.
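The estimate can be checked directly from the datagen settings above (assuming every row carries a full-length varchar, per `fields.c.length`):

```sql
-- Rough payload estimate: row count × varchar length.
-- 1,000,000 rows × 10,000 chars ≈ 10^10 bytes of character data.
SELECT 1000000::bigint * 10000 AS total_chars;
```

This counts only the raw character payload; per-element array overhead in the aggregate state would make the in-memory footprint larger still.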
In theory, as long as the aggregated array is smaller than total memory, this query should not OOM.
Suspect that it is related to memtable spilling, but let's wait for the core dump.
This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.
From users' feedback, array_agg in batch queries works fine, but the same array_agg in streaming queries can OOM.
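The contrast users report can be sketched as follows (a minimal illustration, not a confirmed reproduction; the view name `len_stream` is made up here):

```sql
-- Batch: array_agg runs once over a finite scan of s and reportedly completes fine.
SELECT array_length(array_agg(c)) AS c_len FROM s;

-- Streaming: the same aggregate maintained incrementally in a materialized view
-- must keep the entire accumulated array as operator state, which is where the
-- OOM was observed.
CREATE MATERIALIZED VIEW len_stream AS
SELECT array_length(array_agg(c)) AS c_len FROM s;
```

The difference suggests the problem is in how the streaming aggregate's state is held in memory (or spilled), not in array_agg itself.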