pindell-matt / rust_bucket

Simple JSON key-value store implemented in Rust

Make an examples directory #31

Open selfup opened 8 years ago

selfup commented 8 years ago

I have made a basic example repo that I would like to put inside of an examples directory.

Just enough to show users how to get set up on stable (build.rs, schema.rs.in, schema.rs, etc.) with a web framework as well as without a web framework. Currently I am pulling from our git repo in the Cargo.toml, but that will be updated once we push this to crates.io.

This would be done while writing documentation, but I just want to put it down as an issue so we know it is a goal.

selfup commented 8 years ago

Here is a screenshot of hooking up our library to an Iron app:

[Screenshot: 2016-07-02 14:43:38]

cite-reader commented 8 years ago

I regret to inform you that using rust buckets from Iron applications is where writing a database becomes significantly less fun. The reason is that demon, concurrency.

Suppose that instead of recreating the same structure every time, we want to do something dynamic; say, record the source IP and port of every hit. We would modify the schema to look something more like this:

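```rust
use std::net::SocketAddr;

// Serialize/Deserialize are serde derives (generated through schema.rs.in on stable).
#[derive(Serialize, Deserialize, Debug, PartialEq)]
pub struct Hit {
    pub ip: SocketAddr,
}
```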

and our application to look something more like this:

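```rust
use std::net::{SocketAddr, ToSocketAddrs};

use iron::prelude::*;
use iron::status;
// create_table, append_records and read_table come from rust_bucket.

fn main() {
    // Seed the table with a dummy entry, since we can't create an empty one.
    create_table("hits", &Hit { ip: "[::1]:31337".to_socket_addrs().unwrap().next().unwrap() }).unwrap();
    Iron::new(|req: &mut Request| {
        Ok(Response::with((status::Ok, record_hit("hits", req.remote_addr))))
    }).http("localhost:3000").unwrap();
}

// Append this request's address, then return the table contents.
fn record_hit(table: &str, hit: SocketAddr) -> String {
    append_records(table, Hit { ip: hit }).unwrap();
    read_table(table).unwrap()
}
```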

(This also happens to reveal a weakness in the current rust bucket API: we can't ask to create an empty table.)

Let's say that five thousand visitors bombard the website ten at a time, which we can simulate with ab -c 10 -n 5000 localhost:3000/. What we want at the end of this experiment is: no errors have occurred, the backing datastore is uncorrupted, and five thousand and one IP-port combinations (including the dummy used to generate the initial table) are sitting there ready for analysis.

When I run this experiment I get a wall of thread '<unnamed>' panicked at 'called Result::unwrap() on an Err value: Serde(Syntax("trailing characters", 1, 3070))', ../src/libcore/result.rs:746 messages, the backing datastore says that next_id is "119", and jq tells me it's not valid JSON anymore.

This is catastrophic. We threw away thousands of writes and left a corrupted database behind, just because concurrency happened.

We either need to figure out a concurrency story or document in big flashing neon lights that concurrent use is spectacularly unsafe. -1 on including examples with any concurrency, even if they're read-only.

selfup commented 8 years ago

Yea we are about to add a README that explains the drawbacks.

Thank you for taking the time to write this up!

selfup commented 8 years ago

In Node there is writeFileSync, as you know. Is there any way to do this in Rust? Or synchronous file operations in general?
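Rust's standard-library file I/O is blocking by default, so the closest analogue to writeFileSync is simply std::fs::write, or File::write_all followed by File::sync_all when the data should also be flushed to disk before returning. A minimal sketch; the helper name write_file_sync and the temp-file paths are made up for illustration. Blocking calls make one writer wait for its own write, but they do not stop two threads from writing the same file at once.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: create or truncate `path`, write `contents`, and
/// return only after asking the OS to flush everything to disk.
fn write_file_sync(path: &Path, contents: &str) -> io::Result<()> {
    let mut file = File::create(path)?; // creates or truncates the file
    file.write_all(contents.as_bytes())?; // blocks until every byte is handed to the OS
    file.sync_all()?; // asks the OS to flush data and metadata to disk
    Ok(())
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("write_file_sync_demo.json");
    write_file_sync(&path, r#"{"hello":"world"}"#)?;
    // Reading is equally synchronous: the String is complete when this returns.
    let text = fs::read_to_string(&path)?;
    println!("{}", text); // prints {"hello":"world"}
    fs::remove_file(&path)?;
    Ok(())
}
```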

pindell-matt commented 8 years ago

Good stuff, also good catch on there being no option to simply create an empty table.

selfup commented 8 years ago

Yea we will work on a create_empty_table function.

selfup commented 8 years ago

@cite-reader: https://github.com/carllerche/mio could solve our problems. We can specifically wait until a file operation has finished.

selfup commented 8 years ago

Also, when using Iron with Rust Bucket, all update/create/delete operations should be for individual users, which would mimic our test hitting multiple files at the same time.

This will have to go in the documentation until we solve the async part.
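One way to read the per-user suggestion: give each user their own table file, so concurrent requests from different users never touch the same file. The std-only sketch below assumes a hypothetical one-file-per-user layout (user_table_path, append_for_user, run_demo) rather than Rust Bucket's actual storage. It does nothing for two simultaneous requests from the same user, which would still share a file.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};
use std::thread;

/// Hypothetical layout: each user id maps to its own file.
fn user_table_path(dir: &Path, user_id: u32) -> PathBuf {
    dir.join(format!("user_{}.jsonl", user_id))
}

/// Append one record to a single user's file.
fn append_for_user(dir: &Path, user_id: u32, record: &str) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .create(true)
        .append(true)
        .open(user_table_path(dir, user_id))?;
    writeln!(file, "{}", record)
}

/// One writer thread per user, each appending `n` records to its own file.
/// Returns the number of lines found in each user's file afterwards.
fn run_demo(dir: &Path, users: u32, n: usize) -> io::Result<Vec<usize>> {
    let _ = fs::remove_dir_all(dir); // start from a clean directory
    fs::create_dir_all(dir)?;
    let handles: Vec<_> = (0..users)
        .map(|uid| {
            let dir = dir.to_path_buf();
            thread::spawn(move || -> io::Result<()> {
                for i in 0..n {
                    append_for_user(&dir, uid, &format!("{{\"hit\":{}}}", i))?;
                }
                Ok(())
            })
        })
        .collect();
    for h in handles {
        h.join().expect("writer thread panicked")?;
    }
    (0..users)
        .map(|uid| -> io::Result<usize> {
            Ok(fs::read_to_string(user_table_path(dir, uid))?.lines().count())
        })
        .collect()
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("per_user_tables_demo");
    let counts = run_demo(&dir, 4, 250)?;
    println!("{:?}", counts); // prints [250, 250, 250, 250]
    fs::remove_dir_all(&dir)?;
    Ok(())
}
```

Because no two threads ever share a file here, no lock is needed; the moment two writers can reach the same file, the problem described above returns.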

pindell-matt commented 8 years ago

Here are the official docs for mio: https://wycats.gitbooks.io/mio-book/content/index.html Looks promising, will experiment with it on a new branch.

But mio currently doesn't support Windows, so that's a trade-off.

selfup commented 8 years ago

The contrib doc could be updated to tell Windows users to use a VM for all development.

Linux/Unix rules the web anyways, so I do not expect people to run this on Windows 2000 ME lol

cite-reader commented 8 years ago

I think your mental model is not correct.

There are no asynchronous operations in my program. The problem is that Iron spins up eight OS threads per CPU core, and they proceed without coordination.

Everything I can imagine trying to build in mio to solve that would end up being equivalent to an exclusive lock on the backing file, except more complicated, and therefore more bug prone. So... let's just lock the file when we need to.

(But where do we need to? There's the rub.)
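Within one process, as in the Iron case where many OS threads share one program, the simplest form of "lock the file when we need to" is a single Mutex held across each complete read-modify-write of the backing file. A minimal std-only sketch, assuming a toy file that holds a single counter (read_counter, increment and locked_run are hypothetical helpers, not Rust Bucket functions). It does not coordinate separate processes; that is what an OS-level file lock such as flock(2) adds.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::{Arc, Mutex};
use std::thread;

/// Parse the single counter stored in the toy backing file.
fn read_counter(path: &Path) -> io::Result<u64> {
    fs::read_to_string(path)?
        .trim()
        .parse()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// Read the counter, add one, and write it back. Called without a lock,
/// two threads can read the same value and one increment is lost.
fn increment(path: &Path) -> io::Result<()> {
    let n = read_counter(path)?;
    fs::write(path, (n + 1).to_string())
}

/// Run `threads` workers doing `per_thread` increments each. Every
/// read-modify-write happens while holding one shared Mutex.
fn locked_run(path: PathBuf, threads: usize, per_thread: usize) -> io::Result<u64> {
    fs::write(&path, "0")?;
    let lock = Arc::new(Mutex::new(path));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let lock = Arc::clone(&lock);
            thread::spawn(move || -> io::Result<()> {
                for _ in 0..per_thread {
                    // The guard is dropped at the end of each iteration,
                    // i.e. only after the write has completed.
                    let guarded_path = lock.lock().unwrap();
                    increment(&guarded_path)?;
                }
                Ok(())
            })
        })
        .collect();
    for h in handles {
        h.join().expect("worker thread panicked")?;
    }
    let path = lock.lock().unwrap();
    read_counter(&path)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("locked_counter_demo.txt");
    let total = locked_run(path.clone(), 8, 100)?;
    println!("{}", total); // prints 800: no increment is lost
    fs::remove_file(&path)?;
    Ok(())
}
```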

selfup commented 8 years ago

Yea I have no idea how we lock the file per operation.

Also, that does clarify your example more.

cite-reader commented 8 years ago

Hm. I could have sworn there was an interface to flock(2) in std, but apparently there's not. That's awkward. I know Windows has equivalent APIs.

I'm inclined to defer work on concurrency safety until after we've defined use cases, when it'll be easier to figure out what the natural transaction boundaries are.

selfup commented 8 years ago

Yea I agree. For now let's keep what we have going.

selfup commented 8 years ago

Here's a synchronous example:

[Screenshot: 2016-07-02 23:14:21]

selfup commented 8 years ago

Not the greatest example out there, but it uses some of our higher level/basic functions, and has good stdout for users to see.

selfup commented 8 years ago

@cite-reader @pindell-matt

Looks like we were looking at an old fork.

The actual mio lib seems to be cross-platform and more stable than anticipated.

It also has server capabilities!

https://github.com/carllerche/mio

cite-reader commented 8 years ago

It's not remotely obvious to me how mio would help us solve the problems we have. I guess it could be part of an actor-model implementation (in the Erlang tradition, or Akka on the JVM), but that feels like a fairly large crate in its own right.

fs2 looks rather more like what we need. Of course the greater problem is deciding when we should be taking the locks; with our current design, we can't implement any interesting consistency models.

selfup commented 8 years ago

fs2 does seem really nice.

I guess we would lock every operation to ensure safety.

Unless that is too aggressive?

selfup commented 8 years ago
```rust
// From fs2's FileExt trait; both return std::io::Result<()>.
fn lock_exclusive(&self) -> Result<()>;

// and

fn unlock(&self) -> Result<()>;
```

Those two seem like what we could really use in each function: lock on function call, and unlock prior to return.
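Pairing the two calls by hand means every early return (including each `?`) must remember to unlock first. A common Rust alternative is an RAII guard that unlocks in Drop. The sketch below is std-only: Lockable is a hypothetical stand-in trait with the same method shapes as the two quoted above, and MockFile merely records whether it is locked so the behavior can be checked.

```rust
use std::cell::Cell;
use std::io;

/// Hypothetical stand-in for a file-locking trait with the two methods quoted above.
trait Lockable {
    fn lock_exclusive(&self) -> io::Result<()>;
    fn unlock(&self) -> io::Result<()>;
}

/// Holds the lock while alive and unlocks in Drop, so normal returns,
/// early `?` returns and panic unwinding all release it.
struct LockGuard<'a, L: Lockable> {
    inner: &'a L,
}

impl<'a, L: Lockable> LockGuard<'a, L> {
    fn acquire(inner: &'a L) -> io::Result<Self> {
        inner.lock_exclusive()?;
        Ok(LockGuard { inner })
    }
}

impl<'a, L: Lockable> Drop for LockGuard<'a, L> {
    fn drop(&mut self) {
        // Drop cannot return an error, so a failed unlock is ignored here.
        let _ = self.inner.unlock();
    }
}

/// Mock "file" that only records whether it is currently locked.
struct MockFile {
    locked: Cell<bool>,
}

impl Lockable for MockFile {
    fn lock_exclusive(&self) -> io::Result<()> {
        self.locked.set(true);
        Ok(())
    }
    fn unlock(&self) -> io::Result<()> {
        self.locked.set(false);
        Ok(())
    }
}

/// Stand-in for a store operation that may bail out early.
fn guarded_operation(file: &MockFile, fail: bool) -> io::Result<u32> {
    // Bind to `_guard`, not `_`: a bare `_` would drop (and unlock) immediately.
    let _guard = LockGuard::acquire(file)?;
    if fail {
        return Err(io::Error::new(io::ErrorKind::Other, "simulated failure"));
    }
    Ok(42)
}

fn main() {
    let file = MockFile { locked: Cell::new(false) };
    let result = guarded_operation(&file, true);
    println!("error: {}, still locked: {}", result.is_err(), file.locked.get());
    // prints "error: true, still locked: false"
}
```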

cite-reader commented 8 years ago

That's sufficient for preventing corruption, assuming no crashes or power failures etc. Actually we can even weaken it; read-only operations can take a shared lock, since it's safe to allow reads to proceed concurrently. The problem is you can still drop ack'd writes.
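The shared/exclusive split described above is the same one std::sync::RwLock provides in-process (fs2 offers lock_shared alongside lock_exclusive for files). The std-only sketch below uses a hypothetical in-memory table, parks two reader threads inside the shared lock at the same moment, and shows that a writer's try_write is refused until they leave.

```rust
use std::sync::{Barrier, RwLock};
use std::thread;

/// Returns (writer refused while two readers held the lock,
///          writer admitted after the readers left).
fn shared_vs_exclusive() -> (bool, bool) {
    // Hypothetical in-memory table standing in for the backing file.
    let table = RwLock::new(vec![String::from("row")]);
    let held = Barrier::new(3); // two readers + main: "shared lock acquired"
    let release = Barrier::new(3); // main tells the readers to let go
    let mut refused = false;

    thread::scope(|s| {
        for _ in 0..2 {
            s.spawn(|| {
                let _rows = table.read().unwrap(); // shared lock
                held.wait();
                release.wait();
                // `_rows` is dropped when this closure returns.
            });
        }
        held.wait(); // both readers are now inside the lock together
        refused = table.try_write().is_err(); // exclusive lock unavailable
        release.wait();
    });

    // Every reader has finished, so the exclusive lock is free again.
    let admitted = table.try_write().is_ok();
    (refused, admitted)
}

fn main() {
    let (refused, admitted) = shared_vs_exclusive();
    println!("writer refused while readers held the lock: {}", refused); // true
    println!("writer admitted afterwards: {}", admitted); // true
}
```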

Let's say a client pulls out the data set with get_table, does some computation, modifies the data, and puts it back with… um, update_json. Any modifications that occurred between reading the table and writing it back get clobbered.
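The lost update can be reproduced without threads by interleaving two clients by hand. In this sketch Store, get_table, update_blind and update_if_unchanged are hypothetical stand-ins, not Rust Bucket's API. Every call takes the lock, yet client A's acknowledged write disappears. The conditional write is a version check in the style of optimistic concurrency control; it is shown only as one possible remedy that makes the failure visible, since the consistency model itself is still open.

```rust
use std::sync::Mutex;

/// Hypothetical store: every call takes the lock, so no single call can
/// corrupt the data ("lock on call, unlock before return").
struct Store {
    inner: Mutex<(u64, Vec<String>)>, // (version, rows)
}

impl Store {
    fn new() -> Self {
        Store { inner: Mutex::new((0, Vec::new())) }
    }

    /// Like get_table: a snapshot plus the version it was taken at.
    fn get_table(&self) -> (u64, Vec<String>) {
        let g = self.inner.lock().unwrap();
        (g.0, g.1.clone())
    }

    /// Like update_json: blindly replaces the rows.
    fn update_blind(&self, rows: Vec<String>) {
        let mut g = self.inner.lock().unwrap();
        g.0 += 1;
        g.1 = rows;
    }

    /// Conditional write: succeeds only if nobody wrote since version `seen`.
    fn update_if_unchanged(&self, seen: u64, rows: Vec<String>) -> Result<(), u64> {
        let mut g = self.inner.lock().unwrap();
        if g.0 != seen {
            return Err(g.0); // caller must re-read and retry
        }
        g.0 += 1;
        g.1 = rows;
        Ok(())
    }
}

/// Clients A and B both read, then both write back blindly.
fn interleaved_blind() -> Vec<String> {
    let store = Store::new();
    let (_, mut a) = store.get_table();
    let (_, mut b) = store.get_table(); // same snapshot as A
    a.push("from A".into());
    b.push("from B".into());
    store.update_blind(a);
    store.update_blind(b); // silently overwrites A's write
    store.get_table().1
}

/// Same interleaving with conditional writes: B's stale write is rejected.
fn interleaved_checked() -> (bool, Vec<String>) {
    let store = Store::new();
    let (va, mut a) = store.get_table();
    let (vb, mut b) = store.get_table();
    a.push("from A".into());
    b.push("from B".into());
    store.update_if_unchanged(va, a).unwrap();
    let b_rejected = store.update_if_unchanged(vb, b).is_err();
    (b_rejected, store.get_table().1)
}

fn main() {
    println!("blind writes:       {:?}", interleaved_blind()); // ["from B"]
    println!("conditional writes: {:?}", interleaved_checked()); // (true, ["from A"])
}
```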

cite-reader commented 8 years ago

I feel like "nail down the consistency model" should probably be its own issue.

selfup commented 8 years ago

Yea I'll make a new issue in a bit!