cberner opened 8 years ago
Correct me if I'm wrong, but I don't see this bug. First of all, memory is tracked/updated only in the aggregationBuilder.isFull() method. Relevant code from HashAggregationOperator:
@Override
public Page getOutput()
{
    if (outputIterator == null || !outputIterator.hasNext()) {
        // current output iterator is done
        outputIterator = null;
        // no data
        if (aggregationBuilder == null) {
            return null;
        }
        // only flush if we are finishing or the aggregation builder is full
        if (!finishing && !aggregationBuilder.isFull()) {
            return null;
        }
        outputIterator = aggregationBuilder.build();
        aggregationBuilder = null;
        if (!outputIterator.hasNext()) {
            // current output iterator is done
            outputIterator = null;
            return null;
        }
    }
    return outputIterator.next();
}
Once isFull() is called, the memory reservation is set to the correct value. Then outputIterator is assigned and aggregationBuilder is reset, but the memory reservation is still correctly set and it won't be updated/changed until the next call to aggregationBuilder.isFull(), which can happen only AFTER outputIterator has been reset.
Looking at the code more closely, you're correct. However, it looks like that's mostly by chance. needsInput() uses && outputIterator == null, but that doesn't actually look required. In general, operators can both receive input and produce output at the same time.
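The interplay between needsInput() and getOutput() discussed here can be modeled with a small, self-contained sketch. The names mirror the snippet above, but the types, the row representation, and the fixed "full" threshold are all illustrative stand-ins, not Presto's real classes:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for HashAggregationOperator. Rows are plain ints and
// "full" is a fixed row count; both are assumptions for illustration only.
class MiniAggregationOperator {
    private static final int FULL_THRESHOLD = 3; // builder "is full" at 3 rows

    private List<Integer> aggregationBuilder = new ArrayList<>();
    private Iterator<Integer> outputIterator = null;
    private boolean finishing = false;

    boolean isFull() {
        // in the real operator this is also where the memory reservation is
        // updated; here it is just a size check
        return aggregationBuilder != null && aggregationBuilder.size() >= FULL_THRESHOLD;
    }

    // the `outputIterator == null` guard under discussion: while result
    // pages are being flushed, the operator refuses new input
    boolean needsInput() {
        return !finishing && outputIterator == null && !isFull();
    }

    void addInput(int row) {
        if (aggregationBuilder == null) {
            aggregationBuilder = new ArrayList<>();
        }
        aggregationBuilder.add(row);
    }

    Integer getOutput() {
        if (outputIterator == null || !outputIterator.hasNext()) {
            outputIterator = null;                        // current iterator is done
            if (aggregationBuilder == null) {
                return null;                              // no data
            }
            if (!finishing && !isFull()) {
                return null;                              // only flush when finishing or full
            }
            outputIterator = aggregationBuilder.iterator();
            aggregationBuilder = null;                    // builder reset, as in the snippet
            if (!outputIterator.hasNext()) {
                outputIterator = null;
                return null;
            }
        }
        return outputIterator.next();
    }
}
```

Tracing this model shows the behavior described in the thread: once the builder is full, needsInput() stays false until the last result has been drained from outputIterator.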
In general yes, but I think that in this particular case it was the author's intention to accept new pages only once all result pages have been returned. Otherwise we would effectively double memory consumption or complicate the implementation.
After thinking a little bit more about it, it doesn't make sense to allow receiving input and producing output here. Such a situation can only happen in partial aggregations, and in this mode HashAggregationOperator produces output if and only if it has run out of memory: it has to drop its content by building partial output. In other words, when this happens there is basically no free memory left to accept and process more input pages.
Hi @cberner, I also read the code, and if I didn't misunderstand it, the basic flow for partial aggregation is:
Keep adding pages until AggregationBuilder is full (asserted by checking whether it needs more input: code). The memory is updated when a new page is added: code.
It only starts to build the results when the AggregationBuilder is full, and it won't request any more input before all results are consumed. Thus the code is correct in the current implementation.
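The memory-accounting half of that flow (reservation updated as each page is added; results built only once the builder is full) can be sketched as follows. The per-page cost, the limit, and all names here are made-up numbers for illustration, not Presto's real accounting:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a flat 40-bytes-per-page estimate and a 100-byte
// limit stand in for the real size estimation inside AggregationBuilder.
class PartialAggregationMemorySketch {
    private static final long MEMORY_LIMIT = 100;   // bytes (assumed)
    private static final long BYTES_PER_PAGE = 40;  // assumed flat cost per page

    private final List<int[]> builder = new ArrayList<>();
    long reservedBytes = 0;                         // what the operator context would see

    // the reservation is updated as each new page is added
    void addPage(int[] page) {
        builder.add(page);
        reservedBytes = builder.size() * BYTES_PER_PAGE;
    }

    // results are built only once the builder reports it is full
    boolean isFull() {
        return reservedBytes >= MEMORY_LIMIT;
    }
}
```

With these numbers the sketch becomes full on the third page, at which point the real operator would stop accepting input and start flushing.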
There would be an issue if we allowed more inputs while flushing the results. However, it looks to me like the current implementation of AggregationBuilder will not purge an entry from GroupByHash after the corresponding aggregated row is generated. Thus, I agree with @pnowojski that it doesn't make sense to accept more input pages at this moment.
When HashAggregationOperator.getOutput() is called in a partial aggregation, the aggregationBuilder is reset. However, aggregationBuilder is responsible for the memory tracking, and so the memory retained by outputIterator is only tracked implicitly by the fact that aggregationBuilder has not been re-initialized. This should be fixed by including the memory used by outputIterator in the tracking done by operatorContext.
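A hedged sketch of that fix, assuming a setMemoryReservation-style method on the operator context (all names here are illustrative, not Presto's actual API): at the point where getOutput() assigns outputIterator and nulls the builder, the builder's memory is transferred into an explicit iterator-side reservation, so the operator-level total stays accurate instead of depending on a stale builder reservation.

```java
// Illustrative stand-in for operatorContext's reservation bookkeeping.
class OperatorContextSketch {
    long reservedBytes = 0;

    void setMemoryReservation(long bytes) {
        reservedBytes = bytes;
    }
}

// Sketch of an operator that tracks builder and iterator memory explicitly.
class TrackedFlushSketch {
    final OperatorContextSketch operatorContext = new OperatorContextSketch();
    long builderBytes = 0;   // memory held by aggregationBuilder
    long iteratorBytes = 0;  // memory retained by outputIterator

    void addToBuilder(long bytes) {
        builderBytes += bytes;
        updateReservation();
    }

    // called where getOutput() assigns outputIterator and resets the builder:
    // the builder's memory moves into an explicit iterator reservation
    // instead of being tracked only implicitly
    void handOffToIterator() {
        iteratorBytes = builderBytes;
        builderBytes = 0;
        updateReservation();
    }

    // called once the output iterator has been fully consumed
    void iteratorDone() {
        iteratorBytes = 0;
        updateReservation();
    }

    private void updateReservation() {
        operatorContext.setMemoryReservation(builderBytes + iteratorBytes);
    }
}
```

With this scheme the reservation reported to the operator context never drops to zero while results are still buffered in the iterator, which is the gap the issue describes.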