prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0

Make Partial aggregation memory tracking safer during flushes #5670

Open cberner opened 8 years ago

cberner commented 8 years ago

When HashAggregationOperator.getOutput() is called in a partial aggregation, the aggregationBuilder is reset. However, aggregationBuilder is responsible for the memory tracking, so the memory retained by outputIterator is only tracked implicitly, by the fact that aggregationBuilder has not yet been re-initialized. This should be fixed by including the memory used by outputIterator in the tracking done by operatorContext.

pnowojski commented 8 years ago

Correct me if I'm wrong, but I don't see this bug. First of all, memory is tracked/updated only in the aggregationBuilder.isFull() method. Relevant code from HashAggregationOperator:


    @Override
    public Page getOutput()
    {
        if (outputIterator == null || !outputIterator.hasNext()) {
            // current output iterator is done
            outputIterator = null;

            // no data
            if (aggregationBuilder == null) {
                return null;
            }

            // only flush if we are finishing or the aggregation builder is full
            if (!finishing && !aggregationBuilder.isFull()) {
                return null;
            }

            outputIterator = aggregationBuilder.build();
            aggregationBuilder = null;

            if (!outputIterator.hasNext()) {
                // current output iterator is done
                outputIterator = null;
                return null;
            }
        }

        return outputIterator.next();
    }

Once isFull() is called, the memory reservation is set to the correct value. Then outputIterator is assigned and aggregationBuilder is reset, but the memory reservation is still correctly set, and it won't be updated/changed until the next call to aggregationBuilder.isFull(), which can happen only AFTER outputIterator has been reset.
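The accounting described above can be reduced to a small sketch. The class and field names here are hypothetical simplifications, not the real Presto types; the point is only that the reservation is refreshed in one place (isFull()), so dropping the builder leaves the last reservation covering the bytes now retained by the iterator.

```java
// Hypothetical simplification of the flush-time accounting discussed above;
// the real HashAggregationOperator reports memory through its OperatorContext.
class PartialFlushSketch {
    long reservedBytes;   // operator-level memory reservation
    long builderBytes;    // bytes currently held by the aggregation builder
    boolean flushing;     // true while outputIterator drains the builder

    // isFull() is the only place the reservation is refreshed
    boolean isFull(long limit) {
        reservedBytes = builderBytes;
        return reservedBytes >= limit;
    }

    // start of getOutput()'s flush path: the builder is dropped, but the
    // stale reservation still covers the bytes retained by the iterator
    void startFlush() {
        flushing = true;
    }

    // once the iterator is exhausted, a fresh builder starts from zero and
    // the next isFull() call corrects the reservation
    void finishFlush() {
        flushing = false;
        builderBytes = 0;
    }
}
```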

cberner commented 8 years ago

Looking at the code more closely, you're correct. However, it looks like that's mostly by chance. needsInput() includes the condition && outputIterator == null, but that condition doesn't actually look required. In general, operators can receive input and produce output at the same time.
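As a rough sketch of the guard under discussion (the real needsInput() checks additional state, so treat this as an assumption-laden reduction), the key condition is the outputIterator == null check, which refuses new input for the whole duration of a flush:

```java
// Hypothetical reduction of needsInput(): while outputIterator is non-null,
// a flush is in progress and the operator declines new pages.
class NeedsInputSketch {
    boolean finishing;
    Object outputIterator;  // non-null while results are being drained

    boolean needsInput() {
        return !finishing && outputIterator == null;
    }
}
```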

pnowojski commented 8 years ago

In general, yes, but I think that in this particular case it was the author's intention to accept new pages only once all result pages have been returned. Otherwise we effectively double memory consumption, or we complicate the implementation.

After thinking a little more about it, it doesn't make sense to allow receiving input while producing output here. Such a situation can only happen in partial aggregations, and in that mode HashAggregationOperator produces output if and only if it has run out of memory, so it has to drop its content by building partial output. In other words, when this happens there is basically no free memory left to accept and process any more input pages.

wenleix commented 7 years ago

Hi @cberner, I also read the code, and if I understand correctly, the basic flow for partial aggregation is:

There would be an issue if we allowed more inputs while flushing the results. However, it looks to me that the current implementation of AggregationBuilder does not purge entries from GroupByHash after the corresponding aggregated rows are generated. Thus, I agree with @pnowojski that it doesn't make sense to accept more input pages at this moment.
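A tiny illustration of that last point, with a plain java.util.HashMap standing in for GroupByHash purely as an assumption for the sketch: the output iterator is a view over the hash, so every entry (and its memory) stays retained until draining finishes.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Stand-in for the flush path: building the result iterator does not remove
// entries from the underlying hash, so the hash's memory stays retained
// for the whole time results are being drained.
class GroupByHashSketch {
    static Iterator<Map.Entry<String, Long>> buildOutput(Map<String, Long> groupByHash) {
        // entrySet() returns a view backed by the map; nothing is purged
        return groupByHash.entrySet().iterator();
    }
}
```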