mit-pdos / noria

Fast web applications through dynamic, partially-stateful dataflow
Apache License 2.0
4.98k stars 242 forks source link

SQL support #133

Closed turbo closed 5 years ago

turbo commented 5 years ago

Sorry if I missed an obvious doc link somewhere, but I wasn't able to find comprehensive documentation on the extend of SQL support in either noria and or noria-mysql.

I'm looking to replace some part of a 1bn+ row (postgres-based) query system with Noria, since a lot of it is written in Rust and queries are already using mat views in most cases, so here goes my feature list:

JohnWaltonProjects commented 5 years ago

Interesting project. +1 for Postgres adapter for Noria. Good Luck.

jonhoo commented 5 years ago

Hi! This is very much still a research prototype, so I probably wouldn't recommend trying to use it in production unless you are willing to put significant effort into maintenance and development. Since it is a research system, SQL support is also very much driven by what our benchmark applications need, rather than aiming for complete support. Of what you mention:

Note also that writes in Noria aren't generally done through SQL (though they are if you run the MySQL protocol shim).

turbo commented 5 years ago

Thanks for the detailed answer,

wouldn't recommend trying to use it in production unless you are willing to put significant effort into maintenance and development

That's fine, my day job is juggling fire utilizing a lot of research-quality projects to drive innovation internally. E.g. I also maintain a similar solution (not-public) in another language, which is based on the SQLite VM and query planner, with in-memory mat views. I'd probably rip the useful parts out of Noria, but the approach struck me as useful.

UPSERTs aren't really supported, though we do have INSERT OR UPDATE

No entirely familiar with mysql, does this include noop conflict resolution? I'm using that to dedupe and normalize tables in PG in one step with DO NOTHING. Reminds me of another question: A lot of concurrent conflict resolutions on tables with unique constraints might cause deadlocks because the index tuples of two transactions can conflict in a way that causes the txs to wait for each other's completion based on the transaction visibility rules in MVCC. Is this a problem in Noria? Or on a more basic level: What does Noria's ACID and (TX/stmt) concurrency story look like? What kind of constraints are supported? Can columns be computed, too?

We do have support for "standard" JOINs, but so far only if the join condition is equality on a single field.

Nice. Multiple, single-field JOINs are supported then I take it?

Note also that writes in Noria aren't generally done through SQL

I understand, however the SQL adapter is essential when external systems are connecting to Noria for blending and analysis (Redash, DataGrip, Drill etc.)

jonhoo commented 5 years ago

No entirely familiar with mysql, does this include noop conflict resolution?

I'm not sure I follow? The feature I am thinking of is essentially this, which is "if the primary key already exists, do an update instead". Not sure what you mean by noop conflict resolution.

Or on a more basic level: What does Noria's ACID and (TX/stmt) concurrency story look like? What kind of constraints are supported? Can columns be computed, too?

Noria does not currently provide any transaction support. Reads are eventually consistent with exactly-once guarantees. Writes are durable once they enter the graph, but their effects are not necessarily visible immediately. Noria does not support constraints (yet). We are looking at schemes to add support for reading your own writes, but do not have anything finalized on that yet.

Not sure what you mean by computed columns?

Multiple, single-field JOINs are supported then I take it?

Yes indeed!

turbo commented 5 years ago

Not sure what you mean by noop conflict resolution.

It's the ON CONFLICT DO NOTHING resolution strategy in Postgres. Let's say you have a lot of names in lotta_names and a table names with a unique constraint on it's column name, then this would be the quickest way of inserting names not already known:

INSERT INTO names(name)
SELECT ln.name
FROM lotta_names ln
ON CONFLICT DO NOTHING

I guess since there's no constraint support, this doesn't matter, as Noria can't enforce e.g. uniqueness, right? So it would be on the "client" to enforce that prior to insertion.

Not sure what you mean by computed columns?

Computed columns are virtual colums in a table which represent the result of an expression, here's some pseudocode:

create table foo {
  a text
  b text
  c text computed(a + ' ' + b)
}

computed columns are computed on access by default, but can also be stored (computed on row mutation).

turbo commented 5 years ago

I think most of my questions were answered, though. So I'll close this, but discussion might continue :+1:

ms705 commented 5 years ago

Regarding the conflict resolution on insert: Noria does enforce uniqueness of primary keys, and it's default (and only) behavior is actually what you describe (ignore duplicates).

See the code here.

(Noria does not support insert queries that directly select from other tables, like in your example. The data would have to go through the client first.)