Open selfup opened 8 years ago
Here is a screenshot of hooking up our library to an Iron app:
I regret to inform you that using rust buckets from Iron applications is where writing a database becomes significantly less fun. The reason is that demon: concurrency.
Suppose that instead of recreating the same structure every time, we instead want to do something dynamic; say, record the source IP and port of every hit. We would modify the schema to look something more like this
```rust
#[derive(Serialize, Deserialize, Debug, PartialEq)]
pub struct Hit {
    pub ip: SocketAddr,
}
```
and our application to look something more like this
```rust
fn main() {
    create_table("hits", &Hit {
        ip: "[::1]:31337".to_socket_addrs().unwrap().next().unwrap(),
    })
    .unwrap();

    Iron::new(|req: &mut Request| {
        Ok(Response::with((status::Ok, record_hit("hits", req.remote_addr))))
    })
    .http("localhost:3000")
    .unwrap();
}

fn record_hit(table: &str, hit: SocketAddr) -> String {
    append_records(table, Hit { ip: hit }).unwrap();
    read_table(table).unwrap()
}
```
(This also happens to reveal a weakness in the current rust bucket API: we can't ask to create an empty table.)
Let's say that five thousand visitors bombard the website ten at a time, which we can simulate with `ab -c 10 -n 5000 localhost:3000/`. What we want at the end of this experiment is: no errors have occurred, the backing datastore is uncorrupted, and five thousand and one IP-port combinations (including the dummy used to generate the initial table) are sitting there ready for analysis.
When I run this experiment I get a wall of `thread '<unnamed>' panicked at 'called Result::unwrap() on an Err value: Serde(Syntax("trailing characters", 1, 3070))', ../src/libcore/result.rs:746` messages, the backing datastore says that `next_id` is `"119"`, and `jq` tells me it's not valid JSON anymore.
This is catastrophic. We threw away thousands of writes and left a corrupted database behind, just because concurrency happened.
We either need to figure out a concurrency story or document in big flashing neon lights that concurrent use is spectacularly unsafe. -1 on including examples with any concurrency, even if they're read-only.
Yea we are about to add a README that explains the drawbacks.
Thank you for taking the time to write this up!
In Node there is a `writeFileSync`, as you know. Is there any way to do this in Rust? Or synchronous file operations in general?
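(For reference: Rust's `std::fs` operations are already synchronous — each call blocks until the OS accepts it, much like Node's `writeFileSync`. A minimal sketch using only the standard library; the file name is just for illustration:)

```rust
use std::fs;

fn main() -> std::io::Result<()> {
    // std::fs::write blocks until the data has been handed off to the
    // OS, just like Node's writeFileSync.
    fs::write("sync_demo.txt", "hello")?;

    // Reads are synchronous too.
    let contents = fs::read_to_string("sync_demo.txt")?;
    assert_eq!(contents, "hello");
    Ok(())
}
```

Note that "synchronous" here is not the same as "safe under concurrency": two threads can still interleave synchronous writes to the same file.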
Good stuff, also good catch on there being no option to simply create an empty table.
Yea we will work on a `create_empty_table` function
@cite-reader: https://github.com/carllerche/mio Could solve our problems. We can specifically wait until a file operation has finished
Also, when using Iron and Rust Bucket, all update/create/delete operations should be scoped to individual users, which would mimic our test hitting multiple files at the same time. This will have to be in the documentation until we solve the async part.
Here are the official docs for mio: https://wycats.gitbooks.io/mio-book/content/index.html Looks promising, will experiment with it on a new branch.
But, mio currently doesn't support Windows - so that's a trade-off.
The contrib doc could be updated to tell Windows users to use a VM for all development.
Linux/Unix rules the web anyways, so I do not expect people to run this on Windows 2000 ME lol
I think your mental model is not correct.
There are no asynchronous operations in my program. The problem is that Iron spins up eight OS threads per CPU core, and they proceed without coordination.
Everything I can imagine trying to build in mio to solve that would end up being equivalent to an exclusive lock on the backing file, except more complicated, and therefore more bug prone. So... let's just lock the file when we need to.
(But where do we need to? There's the rub.)
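Within a single process, that coordination can be sketched with a plain `std::sync::Mutex` around every touch of the backing file — a hypothetical stand-in for the exclusive file lock, with made-up names throughout:

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical demo: several uncoordinated worker threads (like Iron's),
// serialized by a process-wide mutex standing in for an exclusive lock
// on the backing file.
fn record_hits(path: &'static str, threads: usize) {
    let file_lock = Arc::new(Mutex::new(()));
    let handles: Vec<_> = (0..threads)
        .map(|i| {
            let lock = Arc::clone(&file_lock);
            thread::spawn(move || {
                // Only one thread may touch the backing file at a time.
                let _guard = lock.lock().unwrap();
                let mut f = OpenOptions::new()
                    .create(true)
                    .append(true)
                    .open(path)
                    .unwrap();
                writeln!(f, "{{\"hit\": {}}}", i).unwrap();
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
}

fn main() {
    record_hits("hits_demo.log", 8);
}
```

A `Mutex` only coordinates threads within one process, though; coordinating multiple processes touching the same file is exactly what pushes us toward OS-level file locks.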
Yea I have no idea how we lock the file per operation.
Also that does clarify your example more
Hm. I could have sworn there was an interface to `flock(2)` in `std`, but apparently there's not. That's awkward. I know Windows has equivalent APIs.
I'm inclined to defer work on concurrency safety until after we've defined use cases, when it'll be easier to figure out what the natural transaction boundaries are.
Yea I agree. For now let's keep what we have going
Here's a synchronous example:
Not the greatest example out there, but it uses some of our higher level/basic functions, and has good stdout for users to see.
@cite-reader @pindell-matt
Looks like we were looking at an old fork.
The actual mio lib seems to be cross-platform and more stable than anticipated.
It also has server capabilities!
It's not remotely obvious to me how mio would help us solve the problems we have. I guess it could be part of a CSP implementation (in the Erlang tradition, or Akka on the JVM), but that rather feels like an entire fairly large crate on its own.
`fs2` looks rather more like what we need. Of course the greater problem is deciding when we should be taking the locks; with our current design, we can't implement any interesting consistency models.
fs2 does seem really nice.
I guess we would lock every operation to ensure safety.
Unless that is too aggressive?
```rust
fn lock_exclusive(&self) -> Result<()>
// and
fn unlock(&self) -> Result<()>
```
Those two seem like what we could really use in each function: lock on function call, and unlock prior to return.
That's sufficient for preventing corruption, assuming no crashes or power failures etc. Actually we can even weaken it; read-only operations can take a shared lock, since it's safe to allow reads to proceed concurrently. The problem is you can still drop ack'd writes.
Let's say a client pulls out the data set with `get_table`, does some computation, modifies the data, and puts it back with… um, `update_json`. Any modifications that occurred between reading the table and writing it back get clobbered.
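Avoiding that lost update means holding the lock across the whole read-modify-write, not per call. A minimal in-memory sketch of the idea, with a `Vec` standing in for the JSON table and all names hypothetical:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Several writers each read the table, compute, and write it back.
// Because each holds the lock for the WHOLE read-modify-write, no
// writer can clobber another's update.
fn run_concurrent_updates(writers: usize) -> Vec<String> {
    let table = Arc::new(Mutex::new(vec![String::from("seed")]));
    let handles: Vec<_> = (0..writers)
        .map(|i| {
            let table = Arc::clone(&table);
            thread::spawn(move || {
                // get_table + compute + update_json as ONE critical section.
                let mut rows = table.lock().unwrap();
                let n = rows.len();
                rows.push(format!("row-{}-saw-{}-rows", i, n));
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    Arc::try_unwrap(table).unwrap().into_inner().unwrap()
}

fn main() {
    // Seed row plus four writers: every update survives.
    let rows = run_concurrent_updates(4);
    assert_eq!(rows.len(), 5);
}
```

If instead each writer locked only around the read and only around the write, two writers could both read the same snapshot and the second write-back would silently drop the first — which is exactly the clobbering described above.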
I feel like "nail down the consistency model" should probably be its own issue.
Yea I'll make a new issue in a bit!
I have made a basic example repo that I would like to put inside of an `examples` directory. Just enough to show users how to get set up on stable (`build.rs`, `schema.rs.in`, `schema.rs`, etc.) with a web framework as well as without one. Currently I am pulling from our git repo in the `Cargo.toml`, but that will be updated once we push this to crates.io. This would be done while writing documentation, but I just want to put it down as an issue so we know it is a goal.