think about ways to let the user bias towards memory-efficient impls/algos

dhalperi commented 9 years ago

In general, we should offer slower techniques that we believe will use much less memory.

One simple idea would be to replace every in-memory Java stateful operator (GroupBy, Join) with a Store followed by a QueryScan. This way all stateful operations [1,2] happen inside the backend DBMS.
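A minimal sketch of that rewrite, assuming a toy plan representation. The `Op` class, the `push_state_to_dbms` function, and the `tmp_N` table names are hypothetical and not Myria's actual API; the `arg` of each generated QueryScan is a placeholder description, not real SQL. Modeling the Stores as children of the QueryScan is a simplification to show the dependency.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List


@dataclass
class Op:
    """Toy plan node (hypothetical; not a Myria class)."""
    kind: str                                   # "Scan", "GroupBy", "Join", "Store", "QueryScan", ...
    children: List["Op"] = field(default_factory=list)
    arg: str = ""                               # table name, predicate, or placeholder query text


# Operators that keep state in memory and that we want to hand to the DBMS.
STATEFUL_KINDS = {"GroupBy", "Join"}


def push_state_to_dbms(op, ids=None):
    """Return a copy of the plan where each stateful operator becomes
    Store(s) of its inputs followed by a QueryScan over the stored tables."""
    ids = ids if ids is not None else count()
    children = [push_state_to_dbms(c, ids) for c in op.children]
    if op.kind not in STATEFUL_KINDS:
        return Op(op.kind, children, op.arg)
    # Materialize every input into a temporary table in the backend.
    stores = [Op("Store", [c], f"tmp_{next(ids)}") for c in children]
    tables = ", ".join(s.arg for s in stores)
    # Placeholder text standing in for the SQL the backend would run.
    return Op("QueryScan", stores, f"{op.kind}({op.arg}) over {tables}")


plan = Op(
    "GroupBy",
    [Op("Join", [Op("Scan", arg="R"), Op("Scan", arg="S")], "R.a = S.a")],
    "count(*) by a",
)
rewritten = push_state_to_dbms(plan)
```

In this sketch the Join's inputs are stored as `tmp_0` and `tmp_1`, and the Join's own result is stored as `tmp_2` before the GroupBy runs as a second query.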

[1] At a cursory level, it seems hard to push StatefulApply and UserDefinedAggregate into the database -- we would need a way to compile them into window functions and the DBMS-specific UDA language. However @billhowe has ideas, at least about UDAs for Postgres.

[2] The other potential challenge might be pushing Myria semantics into the database expression language. E.g. floating vs integer division, etc. I believe we should be able to do this via appropriate insertion of cast operators.
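To illustrate the cast idea in [2], here is a small sketch using SQLite from the Python standard library as a stand-in backend. The `divide_sql` and `evaluate` helpers are hypothetical, and which division semantics Myria actually specifies is not stated here; the sketch only shows that an inserted cast can force either behavior.

```python
import sqlite3


def divide_sql(left, right, want_float):
    """Build a SQL division expression with explicit casts (hypothetical helper).

    want_float=True  forces floating-point division.
    want_float=False forces integer division.
    """
    if want_float:
        return f"CAST({left} AS REAL) / {right}"
    return f"CAST({left} AS INTEGER) / CAST({right} AS INTEGER)"


def evaluate(expr):
    """Evaluate a scalar SQL expression in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(f"SELECT {expr}").fetchone()[0]
    finally:
        conn.close()


plain = evaluate("7 / 2")                                # backend default: integer division
as_float = evaluate(divide_sql("7", "2", True))          # cast inserted: floating-point division
as_int = evaluate(divide_sql("7.0", "2", False))         # cast inserted: integer division
```

Without a cast, the backend picks the semantics from the operand types; with the cast inserted, the generated expression decides.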

dhalperi commented 9 years ago

Another good idea is to let the user supply hints about physical operations. E.g., maybe they choose a SymmetricHashJoin vs a MergeJoin vs a RightHashJoin.
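One way such hints could look, as a rough sketch only: a per-join mapping from a join identifier to a physical operator name. The `choose_join` function, the `hints` dictionary shape, the join identifiers, and the default choice are all assumptions for illustration; only the three operator names come from the comment above.

```python
# Physical join operators named in the discussion.
JOIN_IMPLS = ("SymmetricHashJoin", "MergeJoin", "RightHashJoin")


def choose_join(hints, join_id, default="SymmetricHashJoin"):
    """Pick the physical join for one logical join (hypothetical helper).

    hints   -- dict mapping a join identifier to an operator name
    join_id -- identifier of the logical join being planned
    default -- operator used when the user gave no hint (assumed default)
    """
    impl = hints.get(join_id, default)
    if impl not in JOIN_IMPLS:
        raise ValueError(f"unknown join implementation: {impl!r}")
    return impl


user_hints = {"join_1": "MergeJoin"}
hinted = choose_join(user_hints, "join_1")      # the user's hint wins
unhinted = choose_join(user_hints, "join_2")    # falls back to the default
```

Rejecting unknown names early keeps a typo in a hint from silently falling back to the default operator.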

billhowe commented 9 years ago

+1

Seems like an easy and relatively elegant way of getting out-of-core processing for (almost) any plan.

And, as we move forward with making use of materialized results, all these extra stored copies of data will not go to waste. If I keep running small variants of the same program over and over, the partitionings we need will tend to already be created.


uwescience / myria-web

think about ways to let the user bias towards memory-efficient impls/algos #194