mit-pdos / noria

Fast web applications through dynamic, partially-stateful dataflow
Apache License 2.0

Development roadmap? #111

Open nmeln opened 5 years ago

nmeln commented 5 years ago

I assume this is a prototype and that many things still have to be implemented (and/or researched), so the system will change over time. Do you have plans to make Noria production-ready? Is there a publicly accessible development roadmap or feature plan people could follow?

ms705 commented 5 years ago

You're right that the current version of Noria is a research prototype. However, it's definitely ready to try out: we've managed to run some real web applications on Noria with minimal modification.

The best approximation of a development roadmap is probably the GitHub issues. In the short term, our research will primarily focus on further improving distributed operation, although we're also exploring stronger consistency models and some offshoot ideas related to web application security.

For production use, Noria might need:

  1. Improvements to return more helpful errors when Noria doesn't support a query yet (#98, nom-sql, #36).
  2. Better fault-tolerance and high-availability support: client failover (#105) and rebuilding only failed shards (rather than entire operators).
  3. Better resharding/shuffles (#95), so that it can support upqueries across shuffles in the data-flow.

We're actively working on 2. and 3. as part of our scalability work, and hope to fix 1. as well.

We also plan to keep the versions released to crates.io stable, and will use semantic versioning when we make breaking changes.

Noria primarily remains a research project, but we are keen to support people who want to use it for real applications. If you have a use case that you'd like us to consider, do let us know!

nmeln commented 5 years ago

> focus on further improved distributed operation in the short term, although we're also exploring stronger consistency models and some offshoot ideas related to web application security.

Sounds exciting!

Our use case is aggregating over semi-large amounts of data (10 to 20 million rows in a table) in MySQL and getting the last value from each group where the timestamp is earlier than some cutoff (like midnight of the current day). Around 1,000 to 10,000 rows are added per hour.

An incrementally updated materialized view that uses this aggregation query would work for us, I guess.

MySQL materialized views make this really difficult to achieve. Flexviews could be a solution, but we decided against it.

The incoming data may be out of order, and sometimes we need to take historical data into account, so it's difficult to use time windows for grouping. We also do several joins with other tables. These are some of the reasons why we didn't choose Spark, Kafka Streams, or another streaming framework. Operational complexity and cost are another reason.
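To make the use case above concrete, here is a minimal Rust sketch of an incrementally maintained "last value per group before a cutoff" view that tolerates out-of-order rows. It is an illustration only and does not use Noria's API: the `LastBeforeView` type, its methods, and the sample data are all hypothetical, and it assumes integer timestamps and values.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical in-memory view (not a Noria type): for each group, all
/// timestamp -> value pairs, kept sorted by timestamp.
#[derive(Default)]
struct LastBeforeView {
    groups: HashMap<String, BTreeMap<u64, i64>>,
}

impl LastBeforeView {
    /// Apply one write. Rows may arrive in any timestamp order, because the
    /// per-group BTreeMap keeps them sorted.
    fn insert(&mut self, group: &str, ts: u64, value: i64) {
        self.groups
            .entry(group.to_string())
            .or_default()
            .insert(ts, value);
    }

    /// Latest value in `group` with a timestamp strictly less than `cutoff`.
    fn last_before(&self, group: &str, cutoff: u64) -> Option<i64> {
        self.groups
            .get(group)?
            .range(..cutoff)
            .next_back()
            .map(|(_, value)| *value)
    }
}

fn main() {
    let mut view = LastBeforeView::default();
    view.insert("sensor-a", 100, 1);
    view.insert("sensor-a", 300, 3);
    view.insert("sensor-a", 200, 2); // late, out-of-order row

    println!("{:?}", view.last_before("sensor-a", 250)); // Some(2)
    println!("{:?}", view.last_before("sensor-a", 100)); // None
}
```

Passing the cutoff at read time sidesteps the question of when the view itself must change, at the price of keeping every row per group in memory. A real system would have to bound that state somehow.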

jonhoo commented 5 years ago

@ranchoiver I think that sounds like an excellent use-case for Noria! The one thing that we don't quite support yet is "rolling" time windows, which it sounds like you need. Specifically, you need a query with a filter that has a time-variant parameter. This would require the materialized view to change even if there are no writes to it, which is not something we currently support. It is definitely on our radar though, because it's also something that many other applications need!

nmeln commented 5 years ago

> This would require the materialized view to change even if there are no writes to it, which is not something we currently support. It is definitely on our radar though, because it's also something that many other applications need!

Yes, exactly. Gonna follow the news

jonhoo commented 5 years ago

As an aside, noria-server probably won't be on crates.io until https://github.com/rust-lang/cargo/issues/1565 is solved (which may be a while).

mjjansen commented 5 years ago

@jonhoo a couple other features I'd be curious about:

jonhoo commented 5 years ago

@mjjansen

mjjansen commented 5 years ago

@jonhoo 1 more question... did you consider https://github.com/andygrove/sqlparser-rs vs https://github.com/ms705/nom-sql? I wonder if the efforts can be combined.

jonhoo commented 5 years ago

That crate didn't exist when we first started building Noria :) Combining efforts is probably not a bad idea though! (cc @ms705)

mjjansen commented 5 years ago

got it. thank you!

3noch commented 5 years ago

:+1: x 100 for push notifications (subscribing to queries). This would make noria not just a faster database than alternatives, but perfectly ideal for many applications that currently have to get this behavior manually with lots of error-prone work.

3noch commented 5 years ago

Also having a Postgres adapter would be pretty amazing.

jonhoo commented 5 years ago

Hehe, yes, a Postgres adapter would be great, it just requires implementing the Postgres binary protocol in Rust similar to msql-srv. That's the bulk of the work. Once that's in place, the Noria SQL shim would just need to be able to run in both modes.

xNxExOx commented 5 years ago

My use case would be many simple queries over one or two tables, and some complicated queries that currently need to run every few minutes to keep a local copy updated. These queries generate in-game leaderboards, and the server keeps a local copy of the whole leaderboard and updates it every few minutes. If I understand correctly, I could get rid of those queries with Noria and do selects directly; the first one would take minutes like now, but all the others would be fast. But there is a big problem: because of a few decisions, we need the server (and DB) to build and run locally on developers' Windows machines. Can you make it Windows-compatible, please?

mitar commented 4 years ago

I would also think that sqlparser-rs would be a better fit. Especially because it is used also by DataFusion. And if Noria starts using Arrow, then we have a crazy compatibility here between Arrow, DataFusion and Noria.

Personally, I do not care at all about a MySQL adapter. What I would like to see is the ability to observe all changes (deltas) to the materialized view state and push them out, ideally using the Arrow representation from server to client.

So +1 for push notifications (or I would say live query, I think this is the more common term). I do not think Noria has to provide any web API here, just expose things through Rust API, and then users can hook their own logic in Rust to push them to websockets or whatever.

jonhoo commented 4 years ago

So, push notifications are tricky because they imply full materialization everywhere, which comes at a steep cost. There might be a good way to register interest in keys and then subscribe to updates for those keys, but that's not something we're actively working on. Might be a neat additional feature to add eventually though — it shouldn't be too hard, as most of the infrastructure is already there.
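As a rough illustration of "register interest in keys and then subscribe to updates for those keys", the sketch below uses standard-library channels. Nothing here is Noria's API: `KeySubscriptions`, `Update`, `subscribe`, and `publish` are hypothetical names, and the sketch assumes a single process with string keys and integer values.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical delta sent to subscribers: a key and its new value.
#[derive(Debug, Clone, PartialEq)]
struct Update {
    key: String,
    value: i64,
}

/// Hypothetical registry mapping each key to the subscribers interested in it.
#[derive(Default)]
struct KeySubscriptions {
    subscribers: HashMap<String, Vec<Sender<Update>>>,
}

impl KeySubscriptions {
    /// Register interest in one key and get a channel of updates for it.
    fn subscribe(&mut self, key: &str) -> Receiver<Update> {
        let (tx, rx) = channel();
        self.subscribers.entry(key.to_string()).or_default().push(tx);
        rx
    }

    /// Called when a key's value changes. Keys without subscribers are skipped.
    fn publish(&mut self, key: &str, value: i64) {
        if let Some(senders) = self.subscribers.get_mut(key) {
            let update = Update { key: key.to_string(), value };
            // Drop subscribers whose receiving end has gone away.
            senders.retain(|tx| tx.send(update.clone()).is_ok());
        }
    }
}

fn main() {
    let mut subs = KeySubscriptions::default();
    let rx = subs.subscribe("player-1");

    subs.publish("player-1", 42);
    subs.publish("player-2", 7); // no subscriber, so nothing is sent

    println!("{:?}", rx.try_recv()); // the update for player-1
    println!("{}", rx.try_recv().is_err()); // true: no further updates
}
```

Only subscribed keys generate traffic, which matches the idea of limiting notifications to keys someone has asked about rather than materializing and pushing everything.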

mitar commented 4 years ago

I opened https://github.com/mit-pdos/noria/issues/143 to make a better place to discuss that feature. This issue looks too broad to me.