Today we have a hardcoded hack that sets the memory allocation per operator
(joins, sorts, grouped-aggs, etc.) to a constant, irrespective of query
complexity - at least that's my understanding.
We should move to a model where the query as a whole stays within some overall
budget - e.g., if there are N (concurrent) operators in the query, each one
should get Budget / N as its sub-allocation of memory. This would still be
braindead, but it would not blow out memory the way we do now on really big
queries.
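A minimal sketch of the Budget / N idea, in Java since Hyracks is Java-based.
The class and method names here are hypothetical, invented for illustration;
nothing like this exists in the current code base:

    // Hypothetical sketch only -- names are made up, not actual
    // Hyracks/AsterixDB APIs.
    public final class QueryMemoryBudget {
        private final long queryBudgetBytes;   // overall budget for this query
        private final int concurrentOperators; // N memory-hungry operators

        public QueryMemoryBudget(long queryBudgetBytes, int concurrentOperators) {
            if (concurrentOperators <= 0) {
                throw new IllegalArgumentException("need at least one operator");
            }
            this.queryBudgetBytes = queryBudgetBytes;
            this.concurrentOperators = concurrentOperators;
        }

        // Each join/sort/grouped-agg gets Budget / N instead of a constant.
        public long perOperatorBytes() {
            return queryBudgetBytes / concurrentOperators;
        }
    }

So a query handed a 512 MB budget with 8 concurrent operators would give each
operator 64 MB, rather than the same fixed constant regardless of the query.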
Open issue: How much should any given query get? I would suggest we add an
MPL parameter to the NCs, and a given query should get TotalMem / MPL as its
"fair share". This is also braindead - not the final answer - but it is a step
in the right direction and probably acceptable for a first release. MPL can be
a configurable parameter, and NCs should figure out (or know) what the total
memory is. They should also leave some aside for buffers and LSM in-memory
components before they view the remainder as being available for queries.
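A second hypothetical sketch of the NC-side arithmetic, assuming MPL means the
multiprogramming level (the max number of concurrent queries an NC will admit);
again, every name and field here is an assumption made for illustration:

    // Hypothetical NC-side sketch: carve out the fixed reservations first,
    // then split what remains across MPL concurrent queries.
    public final class NodeMemoryPlanner {
        private final long totalMemBytes;    // total memory the NC knows about
        private final long bufferCacheBytes; // set aside for buffers
        private final long lsmMemBytes;      // set aside for LSM in-memory components
        private final int mpl;               // configurable multiprogramming level

        public NodeMemoryPlanner(long totalMemBytes, long bufferCacheBytes,
                                 long lsmMemBytes, int mpl) {
            this.totalMemBytes = totalMemBytes;
            this.bufferCacheBytes = bufferCacheBytes;
            this.lsmMemBytes = lsmMemBytes;
            this.mpl = mpl;
        }

        // Memory actually available for query processing after reservations.
        public long queryPoolBytes() {
            return totalMemBytes - bufferCacheBytes - lsmMemBytes;
        }

        // A query's "fair share": (TotalMem - reservations) / MPL.
        public long perQueryBudgetBytes() {
            return queryPoolBytes() / mpl;
        }
    }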
OWNER: This should be owned by Vinayak since doing the non-braindead thing
will be an ASTERIX/Algebricks/Hyracks QP responsibility/issue of the kind he'll
address in the final component of his PhD work in 2020 or thereabouts when we
think we might let him graduate. :-)
Original issue reported on code.google.com by dtab...@gmail.com on 6 Mar 2012 at 7:51