NomAnor opened this issue 3 years ago
@NomAnor Thank you for the feature request. If the problem is that `refresh_continuous_aggregate` consumes too much memory, then a better approach would be to release memory on a regular basis inside `refresh_continuous_aggregate`. Would that solve your problem, or is there another reason you want to be able to run the function inside a procedure or transaction?
I think that would work. Would this split the work into multiple transactions internally? Maybe have an optional `batch_size` parameter that specifies the maximum number of chunks per transaction? That should also be available with `add_continuous_aggregate_policy`.
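To illustrate, a sketch of what the suggestion might look like; `batch_size` is hypothetical and does not exist in the current API, while the other parameters are real, and the cagg name 'blah' is borrowed from later in the thread:

```sql
-- Hypothetical sketch only: batch_size is NOT an existing parameter of
-- add_continuous_aggregate_policy; it illustrates the suggestion above.
SELECT add_continuous_aggregate_policy('blah',
    start_offset      => INTERVAL '1 month',
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour',
    batch_size        => 10  -- hypothetical: max chunks materialized per transaction
);
```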
I didn't try to run this inside a transaction (as far as I understand it, a procedure does not start one automatically, and I `CALL`ed it as the only command from psql). It should be possible to use it inside a procedure. It's just not possible to use it inside a for loop that loops over a query result, because of the open cursor (which might internally start a transaction?). Maybe add an additional error note in the documentation like
I'm going to jump on this one. My use case is simply to iterate backwards, backfilling caggs, and this is what I tried:
```sql
do $$
declare
    rec record;
begin
    -- walk backwards from 2021 to 2015 in 12-hour steps
    for rec in
        select generate_series('2021-06-01', '2015-01-01', '-12 hour'::interval) as foo
    loop
        raise notice '%', rec.foo;
        CALL refresh_continuous_aggregate('blah', rec.foo, rec.foo + '12 hour'::interval);
    end loop;
end $$;
```
which also results in `ERROR: cannot commit while a portal is pinned`.
In this context, the memory usage is too high to backfill more than a day at a time, and the whole operation is expected to take considerable time (days for this table, weeks for the next table I'm tackling), so I want to work backwards and get the more useful (recent) data to my users sooner.
My workaround is to abuse shell, screen and a passwordless login to do it:
```bash
# each psql invocation is its own session, so a failure loses at most
# one 3-hour slice instead of the whole backfill
for x in `seq 1 16000`; do
    psql dbname -qt \
        -c "select current_date - ($x * '3 hours'::interval)" \
        -c "CALL refresh_continuous_aggregate('blah', current_date - ($x * '3 hours'::interval), (current_date - ($x * '3 hours'::interval)) + '3 hours'::interval)"
done
```
Not the prettiest, but it works, and unlike a single `refresh_continuous_aggregate()` call, I can be pretty sure I'm not going to end up losing weeks' worth of crunching if something goes wrong towards the end.
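An earlier comment in the thread suggests the call should work inside a procedure as long as no query cursor is held open, so an in-database alternative might be to step a timestamp variable in a plain WHILE loop rather than iterating a query result. An untested sketch along those lines, reusing the cagg name 'blah' and the range from the DO block above:

```sql
-- Untested sketch: no FOR ... IN <query>, so no portal stays pinned
-- across the CALL and its internal commits.
CREATE OR REPLACE PROCEDURE backfill_blah()
LANGUAGE plpgsql
AS $$
DECLARE
    ts timestamptz := '2021-06-01';
BEGIN
    WHILE ts > '2015-01-01' LOOP
        RAISE NOTICE '%', ts;
        CALL refresh_continuous_aggregate('blah', ts, ts + INTERVAL '12 hours');
        ts := ts - INTERVAL '12 hours';
    END LOOP;
END;
$$;

CALL backfill_blah();
```

Whether the nested CALL can perform its internal commits here is exactly what this issue is about, so treat this as a hypothesis to test rather than a confirmed workaround.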
@mkindahl we are running into `ERROR: cannot commit while a portal is pinned` when running `refresh_continuous_aggregate` from a procedure too.
I've run into this using `CREATE INDEX ... WITH (timescaledb.transaction_per_chunk);`. It doesn't seem to be a memory issue (it happened on a basically empty table).
Relevant system information:
Describe the bug
Because `max_interval_per_job` is no longer supported, the materialization jobs might run over very large data ranges (and concurrently), using too much memory, especially on freshly imported backups. I tried to write a procedure to call `refresh_continuous_aggregate` repeatedly over smaller ranges, but this throws the following error:

```
ERROR:  cannot commit while a portal is pinned
```

To Reproduce
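The original reproduction snippet is not preserved here; the following is a reconstruction sketch of the shape of procedure that triggers the error, modeled on the DO block quoted earlier in the thread (the cagg name 'blah' and the date range are placeholders):

```sql
-- Reconstruction sketch: looping over a query result pins a portal,
-- and the internal commits of refresh_continuous_aggregate then fail.
CREATE OR REPLACE PROCEDURE refresh_in_steps()
LANGUAGE plpgsql
AS $$
DECLARE
    rec record;
BEGIN
    FOR rec IN
        SELECT generate_series('2021-01-01'::timestamptz,
                               '2021-02-01'::timestamptz,
                               '1 day'::interval) AS start_ts
    LOOP
        CALL refresh_continuous_aggregate('blah', rec.start_ts,
                                          rec.start_ts + INTERVAL '1 day');
    END LOOP;
END;
$$;

CALL refresh_in_steps();
-- ERROR:  cannot commit while a portal is pinned
```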
The query in the for loop seems to hold a cursor open (see the mention of portals in the PostgreSQL documentation), which prevents the materialization transactions from committing. Can this restriction be lifted? This would make it possible to write custom materialization functions that loop over query results.
This could be related to #2876.