Open billhowe opened 10 years ago
@billhowe @mbalazin @jakevdp @ryanmaas @dhalperi I need some help to further refine these ideas into a more concrete spec. In particular, integrating this federated data model into the web UI is more complicated than I realized.
The current web UI (https://demo.myria.cs.washington.edu/editor), has panes for an editor, queries, and datasets. All of these are hard-wired to a single Myria instance. For the purposes of this demo, much of the UI will need to be revised and expanded:
Another option is to forego this complexity by designing a simpler web UI -- ipython notebook was mentioned as an example here. But, there are still basic problems here that we haven't hashed out. What happens when a user clicks go on a combined SQL+AQL query? Where do the results go, and how does the user see them?
Here's my thought.
I think we only need or want V0 for now.
V0:
V1:
V2:
On Mon, Jun 16, 2014 at 3:33 PM, Andrew Whitaker notifications@github.com wrote:
@billhowe https://github.com/billhowe @mbalazin https://github.com/mbalazin @jakevdp https://github.com/jakevdp @ryanmaas https://github.com/ryanmaas @dhalperi https://github.com/dhalperi I need some help to further refine these ideas into a more concrete spec. In particular, integrating this federated data model into the web UI is more complicated than I realized.
The current web UI (https://demo.myria.cs.washington.edu/editor), has panes for an editor, queries, and datasets. All of these are hard-wired to a single Myria instance. For the purposes of this demo, much of the UI will need to be revised and expanded:
- The editor pane stays mostly the same; the query language itself will be expanded to support a mix of AQL and SQL in the manner described above.
- The query pane must change to incorporate the federated data model. What happens when a "query" consists of a mix of different languages targeting different backends? Do we generate one query per language/backend or a single result? What is the "status" of such a query? Do we need a new catalogue to keep track of query information that isn't maintained by one of the backend systems?
- Likewise, the dataset pane must change to reflect the federated data model. Do we generate one result per language+query or a single result per query? How does the system even learn about AQL results? Do we allow for exporting AQL datasets as CSV, JSON, etc. Do we need a new catalogue to track data sets across the various backends?
Another option is to forego this complexity by designing a simpler web UI -- ipython notebook was mentioned as an example here. But, there are still basic problems here that we haven't hashed out. What happens when a user clicks go on a combined SQL+AQL query? Where do the results go, and how does the user see them?
— Reply to this email directly or view it on GitHub https://github.com/uwescience/raco/issues/216#issuecomment-46246622.
Some notes from today's meeting with @7andrew7, @mbalazin, @ryanmaas, @jakevdp, and Emad. Overall, this doesn't radically diverge from Bill's vision described above.
We proposed a very simple version 0. The goal here is to run a single query that combines SQL with AQL in a very simplistic way. In particular, the SQL and AQL don't interact at all. As an example:
X = %sql select * from foo where foo.x > 30;
Y = %aql select * from bar where bar.y < 25;
store(X, andrew:myprogram:SqlOut);
store(Y, andrew:myprogram:AqlOut);
Deliverable: run this query from the web UI. Validate by hand that the results appear in the appropriate backend databases.. This is even dumber than the v0 proposed by @billhowe .
v1: Show AQL datasets and query results from the web UI. Create a python API for querying scidb's catalogue and for fetching results. Expose these results via the web UI.
Deliverable: Expose scidb results via a new tab on the web UI. Strip away all tabs that are not relevant for our demo.
v2: Create an integrated example, where the myrial/sql query interacts with an aql query. Andrew proposed a syntax like this:
X = %sql select * from foo where foo.x > 30;
Y = %aql select * from bar where bar.y < 25;
YR = AsRelation(Y, x:int, y:int, z:int);
J = %sql select X.x, YR.y where X.x=YR.y;
Store(J, andrew:myprogram:Output);
There are various implementation options here, and we didn't decide on a single right answer. There is a research task to explore the set of export options that scidb supports.
Looks good.
Is %sql the same as MyriaL? Or, can there be a %myrial? We should demonstrate the fact that we already have a language. I don't want it to look like we hacked up everything on the fly.
Let's update the original checklist with this material, though -- the checklist should reflect the current plan.
Specifically:
If this all makes sense, I can edit the checklist.
On Tue, Jun 17, 2014 at 5:51 PM, Andrew Whitaker notifications@github.com wrote:
Some notes from today's meeting with @7andrew@, @mbalazin https://github.com/mbalazin, @ryanmaas https://github.com/ryanmaas, @jakevdp https://github.com/jakevdp, and Emad. Overall, this doesn't radically diverge from Bill's vision described above.
We proposed a very simple version 0. The goal here is to run a single query that combines SQL with AQL in a very simplistic way. In particular, the SQL and AQL don't interact at all. As an example:
X = %sql select * from foo where foo.x > 30; Y = %aql select * from bar where bar.y < 25;
store(X, andrew:myprogram:SqlOut); store(Y, andrew:myprogram:AqlOut);
Deliverable: run this query from the web UI. Validate by hand that the results appear in the appropriate backend databases.. This is even dumber than the v0 proposed by @billhowe https://github.com/billhowe .
v1: Show AQL datasets and query results from the web UI. Create a python API for querying scidb's catalogue and for fetching results. Expose these results via the web UI.
Deliverable: Expose scidb results via a new tab on the web UI. Strip away all tabs that are not relevant for our demo.
v2: Create an integrated example, where the myrial/sql query interacts with an aql query. Andrew proposed a syntax like this:
X = %sql select * from foo where foo.x > 30; Y = %aql select * from bar where bar.y < 25;
YR = AsRelation(Y, x:int, y:int, z:int); J = %sql select X.x, YR.y where X.x=YR.y; Store(J, andrew:myprogram:Output);
There are various implementation options here, and we didn't decide on a single right answer. There is a research task to explore the set of export API options that scidb supports.
— Reply to this email directly or view it on GitHub https://github.com/uwescience/raco/issues/216#issuecomment-46384267.
Yes, sql is myrial. We can support both tags/labels.
with V0.0, which does not require any interaction on the web at all. In particular, we don't want to spend time changing the parser or the web UI until we prove that we can run queries against SciDB from plain 'ol Python (via raco).
Currently, raco doesn't connect to anything -- it's a language parser/compiler. For example, we don't connect to myria directly from raco. raco is a standalone entity, and I'm not wild about incorporating back-end specific connection logic.
- We still want the "Federated" language inside raco, unless there is a better suggestion on how specifically to implement this. The MyriaLanguage should not be polluted with AQL support. (But I can imagine other ways -- just need to be concrete, make a plan, and see it through)
I'm vague on how this will actually work. I was assuming that we wanted some common algebra where relational operators and AQL operators / queries co-exist. Otherwise, how will we ever express a program that combines both back ends?
To be more precise, what is the output of compiling a program with a combination of sql/myrial and aql? I was envisioning a single top-level sequence or parallel operator. The top-level operator would have a combination of relational operators and scidb operators. The latter is just a wrapper around an AQL query since we don't currently know how to parse this language.
Input program (combination of Myrial + AQL) ==> Common algebra
I suspect you have some other mental model in mind.
across systems instead of an explicit move. Maybe replace the existing V0.4 with this. But I'd recommend doing the simplest possible thing before doing an implicit cross-system join.
Fair enough. Honestly, I'm not sure how to implement even the simplest possible query that involves coordination across backends. Where does the coordination happen? It can't happen on appengine given its timeout restrictions. So, there may need to be some sort of meta-coordinator that takes a single query and decomposes it across the federation. I'm worried about lots of hidden complexity here.
-- andrew
Here's what I had in mind: https://github.com/uwescience/raco/tree/bigdog
More clarifications below.
On Tue, Jun 17, 2014 at 10:23 PM, Andrew Whitaker notifications@github.com wrote:
Yes, sql is myrial. We can support both tags/labels.
- The new v0 is analogous to the original V0.1. But we still need to start
with V0.0, which does not require any interaction on the web at all. In particular, we don't want to spend time changing the parser or the web UI until we prove that we can run queries against SciDB from plain 'ol Python (via raco).
Currently, raco doesn't connect to anything -- it's a language parser/compiler. For example, we don't connect to myria directly from raco. raco is a standalone entity, and I'm not wild about incorporating back-end specific connection logic.
I'm not sure where you're proposing this logic reside. MyriaWeb only?
I think the Federated algebra may need to be aware of the systems it is federating.
So why an algebra? We want to reason about federated plans the same way we reason about other plans. There are operators to move data back and forth, run queries, etc.
- We still want the "Federated" language inside raco, unless there is a better suggestion on how specifically to implement this. The MyriaLanguage should not be polluted with AQL support. (But I can imagine other ways -- just need to be concrete, make a plan, and see it through)
I'm vague on how this will actually work. I was assuming that we wanted some common algebra where relational operators and AQL operators / queries co-exist. Otherwise, how will we ever express a program that combines both back ends?
https://github.com/uwescience/raco/tree/bigdog
To be more precise, what is the output of compiling a program with a combination of sql/myrial and aql?
Doesn't necessarily need to be compiled to an ascii string -- just need some object that we can execute.
I was envisioning a single top-level sequence or parallel operator. The top-level operator would have a combination of relational operators and scidb operators. The latter is just a wrapper around an AQL query since we don't currently know how to parse this language.
Exactly. But we need to map the logical plan down to a physical plan before running it. That's what the Federated algebra is for -- a target for this mapping.
Input program (combination of Myrial + AQL) ==> Common algebra
Eventually, but this is not on the immediate critical path.
I suspect you have some other mental model in mind.
I think we mostly agree.
- The new v2 is harder than the existing versions, as it involves a join
across systems instead of an explicit move. Maybe replace the existing V0.4 with this. But I'd recommend doing the simplest possible thing before doing an implicit cross-system join.
Fair enough. Honestly, I'm not sure how to implement even the simplest possible query that involves coordination across backends. Where does the coordination happen? It can't happen on appengine given its timeout restrictions. So, there may need to be some sort of meta-coordinator that takes a single query and decomposes it across the federation. I'm worried about lots of hidden complexity here.
Don't worry about appengine restrictions. Let's get it working from Python first; that's why V0.1 exists.
You can poll from the client in RESTful situations.
(However, it could be a problem is SciDB only allows synchronous communication through its http interface. @jakevdp or @esoroush ?)
-- andrew
— Reply to this email directly or view it on GitHub https://github.com/uwescience/raco/issues/216#issuecomment-46397175.
Yes, I was proposing to embed the query dispatch logic in myria-web. This is how we currently execute queries, and it seemed natural to keep the same basic structure. Another alternative is to create a bigdog repository that comprises raco, scidb, myria-python, etc. I would prefer this to forcing raco to take on a dependency for every backend.
Even if the logic resides within raco, I don't like the idea of overloading the optimization framework to execute queries. Also, this approach will conflict with the outstanding changes of @dhalperi to the optimization framework in #248. Dan changes the interface to optimize to accept a single top-level operator (sequence or parallel).
We basically agree on the raco changes, and I can incorporate those without too much effort.
On Wed, Jun 18, 2014 at 7:53 PM, Andrew Whitaker notifications@github.com wrote:
Yes, I was proposing to embed the query dispatch logic in myria-web. This is how we currently execute queries, and it seemed natural to keep the same basic structure. Another alternative is to create a bigdog repository that comprises raco, scidb, myria-python, etc. I would prefer this to forcing raco to take on a dependency for every backend.
We generate code that only Myria/Grappa understands. I figure we've already eaten a dependency.
Even if the logic resides within raco, I don't like the idea of overloading the optimization framework to execute queries. Also, this approach will conflict with the outstanding changes of @dhalperi https://github.com/dhalperi to the optimization framework in #248 https://github.com/uwescience/raco/pull/248. Dan changes the interface to optimize to accept a single top-level operator (sequence or parallel).
I don't see a conflict; #248 seems great.
Somewhere we have to implement the cross-system Sequence operator that says "Do X in Myria, then do Y in SciDB."
I think this logic should be in a new "Federated" algebra.
I don't think it makes sense to do this in MyriaWeb, because we want to run queries sourced from somewhere other than the browser.
But in any case, let's get something working and then refactor.
We basically agree on the raco changes, and I can incorporate those without too much effort.
— Reply to this email directly or view it on GitHub https://github.com/uwescience/raco/issues/216#issuecomment-46518610.
As a compromise, I created a new repository to house the mimic demo logic. Below is a script that parses a Myrial/AQL program and extracts the AQL fragments. I'm working with @jakevdp and @esoroush to make this actually connect to scidb.
@billhowe @jakevdp I hacked Bill's bigdog branch to actually connect to scidb; I think this completes the v0.0 test. Caveats: the "AQL" language is actually "AFL", as per our previous discussion.
GREAT!!!!
I think you also have most of 0.1 done, as well?
(And one of the undone items in 0.1 probably shouldn't even be there -- the "MoveToX" operators don't need to exist quite yet.)
I checked off the items that I think are complete on v0.1.
I think the next step, after SIGMOD demos, is to get an example of AFL and MyriaL, together, working from the web. We can use Dan's separate deployment onc we turn it back on (again, after SIGMOD)
On Sun, Jun 22, 2014 at 9:43 AM, Andrew Whitaker notifications@github.com wrote:
@billhowe https://github.com/billhowe @jakevdp https://github.com/jakevdp I hacked Bill's bigdog branch to actually connect to scidb; I think this completes the v0.0 test. Caveats: the "AQL" language is actually "AFL", as per our previous discussion.
https://github.com/uwescience/raco/compare/bigdog
— Reply to this email directly or view it on GitHub https://github.com/uwescience/raco/issues/216#issuecomment-46785910.
Cool stuff!
Preliminary:
V0.0: Simple pass-through test
V0.1 A minimal support for AQL in MyriaL.
%"
The MyriaL program will look something like this (is this IPython? Can we use IPython? Should we demo this in IPython):
V0.2: Python-level communication
The physical plan will look something like this:
RunMyria(plan, RunAQL("SELECT..."))
V0.3: Move large data around by allowing MyriaX to communicate directly with SciDB
V0.4: Compile a pure MyriaL program to Federated