mit-pdos / noria

Fast web applications through dynamic, partially-stateful dataflow
Apache License 2.0
4.98k stars 242 forks source link

Sit infront of existing MySQL server #116

Open alexbowers opened 5 years ago

alexbowers commented 5 years ago

Hi, would something like this be able to act as a MySQL passthru where noria acts as the cache layer infront of MySQL but the storage and data processing is all done by MySQL itself (or MariaDB).

Or does the way that the dataflow work invalidate this as a concept?

jonhoo commented 5 years ago

This was raised in a Reddit comment where I also gave a rough answer. I think it's relatively hard to make Noria just work "in front of" a SQL database, since Noria fundamentally relies on being able to process writes as they arrive. I suppose you could just use MySQL as a "dumb storage" behdind Noria, like RocksDB is today, but I'm not entirely sure what that would buy you? It'd also slow down the system pretty significantly for writes.

alexbowers commented 5 years ago

Yeah i imagined that would be the answer.

Do you have any details on what the limitations of Noria are in terms of MySQL compatibility?

The docs currently mention "MariaDB switch to Noria with minimal effort" but doesn't really go into any detail.

Does Noria have / aim to have 100% compatibility as a drop-in replacement for MySQL through the adapter, or is the goal really that some simple apps could use it to try Noria out, but it would use a "noria flavour" of SQL?

jonhoo commented 5 years ago

I don't know that the goal is 100% compatibility, in part because this is a research prototype, and many MySQL features simply aren't particularly interesting from a research perspective. Operators like SOUNDEX we'll probably skip. The goal is to support most of the common SQL that web applications use though, and we'll likely add features when many users observe that some particulat feature is missing.

jonhoo commented 5 years ago

We should probably document what bigger things are currently missing though @ms705

alexbowers commented 5 years ago

Thanks for the replies.

Could something like a passthru be setup so that any queries not understood by the Noria database get passed through to a "normal" mysql backend for processing, an Noria is used for higher performance for "common" queries that it does understand.

Or would this not work due to the requirement of then storing the data in two places?

ms705 commented 5 years ago

This would work in principle, as long as the writes coming into the backend DB also reach Noria for processing.

The easiest way to achieve something similar with the current code would be to run an SQL layer (e.g., MyRocks) atop the RocksDB storage we use for Noria's base tables, and to then direct queries Noria doesn't understand to that layer in noria-mysql. That said, there would probably be some concurrency issues to work out when both Noria and a write query to RocksDB (via the SQL layer) try to modify the underlying tables.

Regarding the state of SQL support: some of the missing features are just a matter of supporting additional syntax, some require new Noria operators (e.g., above-mentioned SOUNDEX), and some require more serious features to be added to Noria (e.g., range indexes for inequalities on parameters, such as column > ?).

flyaruu commented 5 years ago

A possible set up for something like this could be using a regular MySQL database along with some kind of change capture tool like Debezium.

It would add a lot of infrastructure (For Debezium, you'll need a Kafka cluster, so by extension also a zookeeper cluster), so in terms of infrastructure efficiency it does not make a lot of sense, but building a simple pipe line that funnels your changes 1-1 into a Noria database would give you an interesting way to run a Noria cluster using actual production data without much risk (as you can rebuild the Noria cluster at any time from the source database or Kafka) Also that source database does not need to be a MySQL, it can be anything that Debezium supports. If you have an existing system based on a relational database and you need to add-on a ridiculous read capability, this might be a way. Did anyone look into a set up like this?

alexsnaps commented 5 years ago

Considering trying to set Noria up as a replication slave to the main MySQL instance we have as an experiment, using binlog replication. I think this would cover what you are looking for (tho MySQL specific). Probably will be putting some time aside end of the month to try this out and see where it gets us.

paulocoghi commented 5 years ago

For me, it looks more logical to focus on (essential) MySQL compatibility, thus the contributors and maintainers don't need to spend time and energy on this, and users don't need to worry about using MySQL as a back-end.

sandinmyjoints commented 3 years ago

Considering trying to set Noria up as a replication slave to the main MySQL instance we have as an experiment, using binlog replication. I think this would cover what you are looking for (tho MySQL specific). Probably will be putting some time aside end of the month to try this out and see where it gets us.

@alexsnaps Did you ever try this? I'd be very interested in Noria as a read replica.

alanamarzoev commented 3 years ago

Hi @sandinmyjoints, this is a feature I'm currently working on implementing for Noria. Do you want to email me at alanamarzoev@gmail.com to tell me more about your specific use case? (I'll then make sure to support it)

sandinmyjoints commented 3 years ago

@alanamarzoev Thanks -- email sent!