uwescience / raco

Compilation and rule-based optimization framework for relational algebra. Raco is the language, optimization, and query translation layer for the Myria project.
Other
72 stars 19 forks source link

Datalog: language features? #225

Open dhalperi opened 10 years ago

dhalperi commented 10 years ago

We need to work around the first one for the demo. For now, I'm going to re-break the semantics of the Project operator.

dhalperi commented 10 years ago

@stechu please add any other issues you have with the Datalog language mode here?

billhowe commented 10 years ago

Design thoughts:

I think we will eventually want a page "Configure" or "Rules" where you can activate/deactivate each optimization rule.

Then, there can be a rule that rewrites Project into an Apply, which will be off by default. Shumo can turn it on.

I can appreciate that re-breaking datalog is simplest for now, but this transformation should really be a Myria-specific rule I think.

On Wed, Jun 4, 2014 at 3:32 PM, Daniel Halperin notifications@github.com wrote:

  • how does the user specify whether they want set semantics? Shumo does NOT want them for his demo.
  • how does the user specify a user/program name that is different than public:adhoc?

We need to work around the first one for the demo. For now, I'm going to re-break the semantics of the Project operator.

— Reply to this email directly or view it on GitHub https://github.com/uwescience/raco/issues/225.

bmyerz commented 10 years ago

Although the implementation could be as a rule, I don't like presenting set/bag as a rule, since it affects the semantics of my query. Optimizations should always preserve equivalence (up to things we care about).

Also, sometimes I want some IDBs to be set and others bag. pagerank 1-iteration query (which now works in Grappa but I need to force semantics):

ranks(src, 1.0) :- edges(src, dst)    # set
sizes(src, count(dst)) :- edges(src, dst)   # doesn't matter because group by
contribs(dst, rank/len) :- edges(src, dst), sizes(src, len), ranks(src, rank)   # bag
newranks(vid, sum(cont)*0.85+0.15) :- contribs(vid, cont)  # doesn't matter because group by

(Of course in my example I could write the last two rules as one rule and just have set semantics for all)

billhowe commented 10 years ago

On Thu, Jun 5, 2014 at 10:10 PM, Brandon Myers notifications@github.com wrote:

Sometimes I want some IDBs to be set and others bag. pagerank 1-iteration query (which now works in Grappa but I need to force semantics):

ranks(src, 1.0) :- edges(src, dst) # set sizes(src, count(dst)) :- edges(src, dst) # doesn't matter because group by contribs(dst, rank/len) :- edges(src, dst), sizes(src, len), ranks(src, rank) # bag newranks(vid, sum(cont)) :- contribs(vid, cont) # doesn't matter because group by

Of course I could write the last two rules as one rule and just have set semantics for all

That seems right.

Datalog is what it is. Datalog with bag semantics is something else.

I'd be supportive of us defining a variant of datalog, but let's give it a name and let's also support pure datalog separately.

Also, consider whether MyriaL is a better choice for this problem.

And, if the syntax is the issue, I'm supportive of adding a datalog-like syntax to MyriaL: #199

In Codd we trust.

— Reply to this email directly or view it on GitHub https://github.com/uwescience/raco/issues/225#issuecomment-45293477.

dhalperi commented 10 years ago

Adding the note here to investigate rule order when we fix project: See https://github.com/uwescience/raco/pull/201#discussion_r13953059

dhalperi commented 10 years ago

Also, since I just saw @billhowe's note:

Datalog is what it is. Datalog with bag semantics is something else.

I'd be supportive of us defining a variant of datalog, but let's give it a name and let's also support pure datalog separately.

Can we do this for real? E.g., don't even try to support arithmetic or aggregates in Datalog. If not, why not? What is "pure Datalog"?

billhowe commented 10 years ago

On Wed, Jun 18, 2014 at 9:33 PM, Daniel Halperin notifications@github.com wrote:

Also, since I just saw @billhowe https://github.com/billhowe's note:

Datalog is what it is. Datalog with bag semantics is something else.

I'd be supportive of us defining a variant of datalog, but let's give it a name and let's also support pure datalog separately.

Can we do this for real? E.g., don't even try to support arithmetic or aggregates in Datalog. If not, why not? What is "pure Datalog"?

Seems right.

  • Datalog
  • Datalog with aggregates and negation via stratification
  • A Datalog-like language with arithmetic (as well as aggregates and negation)