Open spacejam opened 5 years ago
I'll leave a link to Apache Arrow and loom here, which may both end up being relevant to this.
See also this highly relevant Reddit thread. I left a comment outlining some of my thoughts around async Rust DB libraries.
I could "probably" help with improving data access and these kinds of things, like data-oriented programming...
There are definitely shared concerns for database projects.
First to mind for me: Shared ecosystem.
Many of our projects will share dependencies like RPC, monitoring, storage engines, consensus layers, clients, parsing & lexing, query planning, co-processors, MVCC models, diagnostics, logging, networking, even things like common interface patterns.
While there exist working groups for many of these topics, there are others that exist in the void. In those that do have relevant working groups the demands of databases are not always understood, or the capacity to address them is too small.
Having some way to find, shepherd, mentor for, and/or highlight issues of concerns to database projects could benefit us all.
I'm mostly interested in hearing testing tips. loom and fail-rs are on my radar, for instance. I would also love to see a convergence on best practices for error handling and logging, and to apply them to tantivy.
We're currently porting https://prisma.io to Rust, and we hope our work will spawn reusable components as separate crates. So I'm all in on the WG.
We really need a DSL to abstract the SQL syntax between the databases, so for that we have prisma-query. Not optimized and constantly changing, takes a bit too much of ownership for internal reasons, but hopefully could be useful for others too. The plan is to have features first, to be correct and then be fast.
@pimeys Great to hear that you are trying to port Prisma to Rust.
We really need a DSL to abstract the SQL syntax between the databases, so for that we have prisma-query. Not optimized and constantly changing, takes a bit too much of ownership for internal reasons, but hopefully could be useful for others too. The plan is to have features first, to be correct and then be fast.
Did you try diesel and diesel-dynamic-schema? Did you find any issues that prevent you from using them?
By the way, you probably want to have a look at wundergraph, which already provides a simple way to build a GraphQL schema from a given database. It builds on top of diesel. It is in a working state but is missing some small improvements, and the documentation needs to be written/updated.
@weiznich We didn't try diesel-dynamic-schema, and I guess the reason was that we just didn't find it back when we evaluated our options. Yes, it would make sense to use an off-the-shelf solution, and yes, I'll put a post-it on our kanban board today to evaluate it. I don't really want to build our own DSL, so this will have priority.
The other part of the team is currently parsing the incoming graphql and they will need a schema parser too. We found wundergraph yesterday, so we'll be evaluating it when the time comes.
@pimeys
The other part of the team is currently parsing the incoming graphql and they will need a schema parser too.
You may want to look at juniper for that :wink:
It was evaluated, but turned out to be too complex for us, so we chose a graphql parser, handling the query execution by ourselves.
Hi all, my name's Samuel, I'm loving Rust and bringing together a bunch of engineers on open-source development (LLVM, register-based VM, custom programming languages, custom consensus algorithms, &etc.).
Here is my 2¢!
In addition to @Hoverbear's great comment above https://github.com/rust-db/coordination/issues/1#issuecomment-475812391, some obvious things to share [off the top of my head] are:
Also documentation for all the above, in an easy to grok manner, e.g.:
Additionally, we could look at an AreWeDByet, similar to arewewebyet & areweasyncyet, but for database development in Rust.
Wouldn't hurt to have monthly working group meetings over videoconference, though we need to be careful it doesn't turn into a research/reading group.
Finally, some random resources on database engine creation, HA, clustering, tradeoffs between levels of consistency &etc. wouldn't go amiss. Including textbooks, lecture series and related. Found some of this directly through @spacejam. Maybe link this AreWeDByet to a wiki—or just a github repo—then we can all start contributing? - And maybe an IRC channel on moznet?
Thanks for your consideration 😃
It looks like we've got two ideas of who a database working group is targeted at: database integration (like making the story of connecting to databases from various frameworks nicer), and database implementation (which is where I think this issue is focused).
I'm keen to support a Rust database implementation community that can share experience and resources.
knowledge sharing sessions where we learn about each other's approaches to testing, debugging, and optimization.
I think this is a great idea, and personally feel like it's at least a prerequisite to identifying and sharing common implementations (in storage engines particularly I think there's an eventual need to own as much of your stack as possible). Resources like Ayende's recent dive into sled could help surface some design decisions in a storage codebase from an outside perspective. Having a similar dive into tantivy would be great too.
As an example, I'm sure there are probably some common approaches that we all find we need in some form. Some things that immediately come to mind from our own storage engine:
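    use std::io::Read;

    // `Sized` is required because `Into<Self>` needs a sized `Self`.
    trait MemRead: Sized {
        // if `MemRead::bytes` returns `Some` they *must*
        // be exactly the same bytes yielded by `MemRead::into_reader().read_to_end()`
        type Reader: Read + Into<Self>;

        // The whole value as one contiguous slice, when it is available as one.
        fn bytes(&self) -> Option<&[u8]>;

        // Consume the value and stream its bytes instead.
        fn into_reader(self) -> Self::Reader;
    }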
Which is basically std::io::Read, but lets you optimize for the case where you're holding a contiguous slice. An early decision we made was never to assume we'll always have a contiguous slice to work with, which made dealing with compression or data stored across multiple pages much more natural later.
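To make the contract concrete, here is a minimal sketch of two implementations of the trait above and a caller that takes the fast path when it can. `Contiguous`, `Paged`, `PagedReader`, and `read_all` are hypothetical names made up for this illustration, not types from our engine, and the trait is repeated (with a `Sized` bound) so the snippet compiles on its own.

```rust
use std::io::{self, Cursor, Read};

trait MemRead: Sized {
    type Reader: Read + Into<Self>;
    fn bytes(&self) -> Option<&[u8]>;
    fn into_reader(self) -> Self::Reader;
}

/// Hypothetical value held in one contiguous buffer.
struct Contiguous(Vec<u8>);

impl From<Cursor<Vec<u8>>> for Contiguous {
    fn from(reader: Cursor<Vec<u8>>) -> Self {
        Contiguous(reader.into_inner())
    }
}

impl MemRead for Contiguous {
    type Reader = Cursor<Vec<u8>>;

    fn bytes(&self) -> Option<&[u8]> {
        Some(&self.0)
    }

    fn into_reader(self) -> Self::Reader {
        Cursor::new(self.0)
    }
}

/// Hypothetical value spread across several pages.
struct Paged(Vec<Vec<u8>>);

/// Streams the pages of a `Paged` value in order.
struct PagedReader {
    pages: Vec<Vec<u8>>,
    page: usize,
    offset: usize,
}

impl Read for PagedReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Skip over pages that are empty or fully consumed.
        while self.page < self.pages.len() && self.offset == self.pages[self.page].len() {
            self.page += 1;
            self.offset = 0;
        }
        if self.page == self.pages.len() {
            return Ok(0); // end of data
        }
        let rest = &self.pages[self.page][self.offset..];
        let n = rest.len().min(buf.len());
        buf[..n].copy_from_slice(&rest[..n]);
        self.offset += n;
        Ok(n)
    }
}

impl From<PagedReader> for Paged {
    fn from(reader: PagedReader) -> Self {
        // Hands back all pages, regardless of how far the reader got.
        Paged(reader.pages)
    }
}

impl MemRead for Paged {
    type Reader = PagedReader;

    fn bytes(&self) -> Option<&[u8]> {
        None // no single slice covers every page
    }

    fn into_reader(self) -> Self::Reader {
        PagedReader { pages: self.0, page: 0, offset: 0 }
    }
}

/// Collects a value's bytes, using the slice when one is available
/// and falling back to streaming otherwise.
fn read_all<M: MemRead>(value: M) -> io::Result<Vec<u8>> {
    if let Some(bytes) = value.bytes() {
        return Ok(bytes.to_vec());
    }
    let mut out = Vec::new();
    value.into_reader().read_to_end(&mut out)?;
    Ok(out)
}
```

The caller never needs to know which layout it was handed, which is the property that made compression and multi-page values easier to add later.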
Designing with Windows in mind right from the start can save some pain down the track.
Our storage engine isn't lock-free like sled; it uses a strategy that prevents writes and maintenance from ever blocking reads so we've spent a lot of time localizing, building infrastructure to verify, and documenting locks and how they need to be held to perform certain operations.
We interact with our engine exclusively through C#, so FFI is really important to us. I've started pulling out the guts of that FFI work into a public example repo which I'm working on turning into something more compelling. A nice C ABI would probably also be useful for tantivy and other Rust databases down the track too.
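As a rough illustration of what a C ABI over a Rust store can look like, here is a small sketch using an opaque handle. Everything in it (`Store`, `kv_open`, `kv_put`, `kv_get`, `kv_close`, and the return-code conventions) is hypothetical and invented for this example; it is not the API from the example repo, tantivy, or sled.

```rust
use std::collections::HashMap;
use std::{ptr, slice};

/// Hypothetical in-memory store standing in for a real engine.
/// Callers across the FFI boundary only ever see a pointer to it.
pub struct Store {
    map: HashMap<Vec<u8>, Vec<u8>>,
}

/// Creates a store and returns an owning pointer; release it with `kv_close`.
// Note: under the 2024 edition this attribute is spelled `#[unsafe(no_mangle)]`.
#[no_mangle]
pub extern "C" fn kv_open() -> *mut Store {
    Box::into_raw(Box::new(Store { map: HashMap::new() }))
}

/// Inserts a key/value pair. Returns 0 on success, -1 if any pointer is null.
///
/// # Safety
/// `store` must come from `kv_open` and not have been closed; `key` and `val`
/// must point to `key_len` and `val_len` readable bytes respectively.
#[no_mangle]
pub unsafe extern "C" fn kv_put(
    store: *mut Store,
    key: *const u8,
    key_len: usize,
    val: *const u8,
    val_len: usize,
) -> i32 {
    if store.is_null() || key.is_null() || val.is_null() {
        return -1;
    }
    let store = unsafe { &mut *store };
    let key = unsafe { slice::from_raw_parts(key, key_len) }.to_vec();
    let val = unsafe { slice::from_raw_parts(val, val_len) }.to_vec();
    store.map.insert(key, val);
    0
}

/// Looks up `key`. Returns the value's length, or -1 if the key is missing
/// or `store`/`key` is null. The value is copied into `out` only when it
/// fits in `out_cap` bytes, so a caller can retry with a larger buffer.
///
/// # Safety
/// Same pointer requirements as `kv_put`; `out` must be null or point to
/// `out_cap` writable bytes.
#[no_mangle]
pub unsafe extern "C" fn kv_get(
    store: *const Store,
    key: *const u8,
    key_len: usize,
    out: *mut u8,
    out_cap: usize,
) -> isize {
    if store.is_null() || key.is_null() {
        return -1;
    }
    let store = unsafe { &*store };
    let key = unsafe { slice::from_raw_parts(key, key_len) };
    match store.map.get(key) {
        None => -1,
        Some(v) => {
            if !out.is_null() && v.len() <= out_cap {
                unsafe { ptr::copy_nonoverlapping(v.as_ptr(), out, v.len()) };
            }
            v.len() as isize
        }
    }
}

/// Frees a store created by `kv_open`. Passing null is a no-op.
///
/// # Safety
/// `store` must not be used again after this call.
#[no_mangle]
pub unsafe extern "C" fn kv_close(store: *mut Store) {
    if !store.is_null() {
        drop(unsafe { Box::from_raw(store) });
    }
}
```

Keeping the handle opaque and passing plain pointer/length pairs means a C# caller only needs P/Invoke declarations for these four functions, and the Rust side stays free to change `Store` internally.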
So maybe a good starting point could be to organize some knowledge sharing sessions on our current codebases, as @spacejam suggested, with the output being posts or conference talk content and an idea of how we could support our community better?
@KodrAus Yea, I brought this up in https://internals.rust-lang.org/t/kickstarting-a-database-wg/9696 as well.
I like the idea raised by bitshiftmask in that thread, having one large Database working group, with sub-teams for various aspects.
Anyway, I also just wanted to post here that we have a Zulip stream now: https://rust-lang.zulipchat.com/#narrow/stream/193127-wg-database
And I would love to send a Doodle around at some point to find a time when we can discuss, synchronously, what we want to do (in high-level terms, i.e. "I want to find X people to talk about Y").
Just to make sure to ping people here too, I'd love to get started with a synchronous "kickoff meeting" soon, where we can discuss who wants to (or can) work on what and maybe form some sub-teams.
Generally I see a lot of energy in this space but I feel like deliberate collaboration is important.
The zulip topic is here: https://rust-lang.zulipchat.com/#narrow/stream/193127-wg-database/topic/Coordination.20meetings
As an answer to the original question about a DB working group:
Database abstraction is the game in every popular framework in the web world. People who install Symfony, for example, don't want to care much if they run Postgres or MySQL or MariaDB or whatever else. They think the DB abstraction layer or the ORM etc. should handle that for them.
So yes, a meta-group would be useful.
I was looking for something like JDBC (generic, DB-agnostic, database interaction API) in Rust, and it seems that this issue is the closest we have? Or am I wrong?
EDIT: there's also this: https://github.com/rust-dbc/rdbc 😬
I think the closest is probably https://docs.rs/odbc/latest/odbc/
Hey folks, @spacekookie pitched the idea yesterday of having something like a Rust database WG. The thing about most of the Rust database people I know is that they seem quite busy with their projects, probably more than many other types of projects, as databases are such a labor intensive endeavor. We scout out a possible architecture and then often iterate for years to get something we are comfortable with, if we get there at all.
My question is: to what extent can we share foundational libraries, testing techniques, production expertise etc... so that we can reach our performance, reliability, and user ergonomic goals faster than we would be able to achieve on our own? We seem to love writing everything ourselves, often with justifications of "that other stuff isn't fast enough" ;) And perhaps some of us have started working on databases specifically to avoid having to work with many other people, as these things tend to be low-coordination by necessity unless there is some serious capital being invested in reducing bus factor by sharing knowledge that is often high effort to share at all.
But maybe it makes sense to turn some of the components in our various systems into more accessible shared libraries for each other. Personally, I would love to have some knowledge sharing sessions where we learn about each other's approaches to testing, debugging, and optimization.
As a set of examples, I'd like to talk with more Rust DB folks about techniques for finding bugs in concurrent Rust, fuzzing approaches that actually yield bugs, possibly combining our various bespoke histogram collection libraries, and generally how we can turn our many shell and perl scripts that we use for causing bugs and performance issues to pop out into more ergonomic tooling that is accessible to more Rust users.
Databases in Rust are definitely a thing now. We often pour so much of our lives into them. Let's share the table stakes stuff so we can spend more time being creative.
Please add a reaction or text response if any of this appeals to you! Thanks for reading :)
Tagging a few folks from academia and industry, involved in building databases or adjacent tech in Rust: @utaal @frankmcsherry @fulmicoton @jonhoo @kodraus @kerollmops @cswinter @tglman @krl @hmwill @alex-shapiro @mbalex99 @davidrusu @siddontang @ngaut @busyjay @hoverbear @c4pt0r
Please tag other folks who may be interested in at least being a fly on the wall :)