zboxfs / zbox

Zero-details, privacy-focused in-app file system.
https://zbox.io/fs/
Apache License 2.0
1.53k stars 74 forks source link

Concurrency #44

Open theduke opened 4 years ago

theduke commented 4 years ago

It seems like zbox does not currently allow for any multi-threaded concurrency at all.

Sadly this limits usage to very simple use cases.

Are there plans for introducing concurrency?

A "simple" first step could be to allow concurrent reads, with a global write lock. Although I'd really love to have concurrent reads and writes as well.

burmecia commented 4 years ago

@theduke , the File struct should be able support read-write lock concurrently, could you share some example code?

theduke commented 4 years ago

Ah, I realized that I can open multiple file handles and then use them concurrently.

Since zbox does support internal concurrency/locking, it would be quite nice to hoist all of this into the Repo struct and have all methods on Repo take &self rather than &mut self.

burmecia commented 4 years ago

In my opinion, &mut self has semantic meaning, that is, if you're going to change it you need to borrow it as mutable. Of course we can make it all &self but by doing that we lost the borrow semantics. For concurrent, you can always wrap a repo in Arc or other synchronous types, and that might suit your need.

vi commented 4 years ago

In my opinion, &mut self has semantic meaning, that is, if you're going to change it you need to borrow it as mutable.

mut should not be interpreted as "mutable", but as "exclusive".

vi commented 4 years ago

It seems to fail to support working on multiple files even without threads.

I get zbox::Error::InTrans if I try to work on multiple files simultaneously.

zbox::File behaves as if it wants exclusive access to Repo.

burmecia commented 4 years ago

It seems to fail to support working on multiple files even without threads.

I get zbox::Error::InTrans if I try to work on multiple files simultaneously.

zbox::File behaves as if it wants exclusive access to Repo.

Can you share your code about this issue? One File cannot be mutated simultaneously, but multiple files should able to do that.

vi commented 4 years ago

Can you share your code about this issue? One File cannot be mutated simultaneously, but multiple files should able to do that.

fn main() -> Result<(), Box<dyn std::error::Error>> {
    env_logger::init();
    zbox::init_env();
    use std::io::Write;

    let mut r = zbox::RepoOpener::new().create(true).force(true).open("file:///tmp/limit/q", "qwe123")?;
    eprintln!("Opening file1");
    let mut f1 = r.create_file("/test1")?;
    eprintln!("Opening file2");
    let mut f2 = r.create_file("/test2")?;
    eprintln!("Writing file1");
    f1.write_all(b"12345")?;
    eprintln!("Writing file2");
    f2.write_all(b"qwerty")?;
    eprintln!("Committing file1");
    f1.finish()?;
    eprintln!("Committing file2");
    f2.finish()?;

    Ok(())
}
     Running `target/debug/zboxtest`
Opening file1
Opening file2
Writing file1
Writing file2
Error: Custom { kind: Other, error: "Already in transaction" }
burmecia commented 4 years ago

Thanks @vi for your code. The reason for this because content manager needs to be exclusively accessed. In writing file1, its transaction will exclusively include the content manager, thus when writing file2 the transaction cannot get access to it any more.

vi commented 4 years ago

One File cannot be mutated simultaneously, but multiple files should able to do that. content manager needs to be exclusively accessed

Is it a bug or by design?

burmecia commented 4 years ago

One File cannot be mutated simultaneously, but multiple files should able to do that. content manager needs to be exclusively accessed

Is it a bug or by design?

Well, it is by design currently but I think a tx should not hold a shared object too long until finish is called. Some optimization could be done to deal with this situation.

vi commented 4 years ago

Can two zbox::Files be sent to two threads that will write and finish multiple transactions on their files independently, without coordination?

If no then it should be documented.

burmecia commented 4 years ago

Ok, after digging through this issue deeper I found it is actually not the exclusive access to content manager caused this issue, sorry about my misleading explanation before.

The real reason is that each thread can only run one transaction at a time. In the code below,

    let mut r = zbox::RepoOpener::new().create(true).force(true).open("file:///tmp/limit/q", "qwe123")?;
    let mut f1 = r.create_file("/test1")?;
    let mut f2 = r.create_file("/test2")?;
    f1.write_all(b"12345")?;    // this will create one ongoing tx
    f2.write_all(b"qwerty")?;  // this will create another ongoing tx
    f1.finish()?;
    f2.finish()?;

There are 2 files are using multi-part writing, 2 write_all() created 2 parallel txs in one thread, thus it reports an error. There are 2 solutions:

  1. run finish() sequentially
    f1.write_all(b"12345")?;
    f1.finish()?;
    f2.write_all(b"qwerty")?;
    f2.finish()?;
  1. using 2 threads to write 2 files separately
    let child1 = std::thread::spawn(move || {
        f1.write_all(b"12345").unwrap();
        f1.finish().unwrap();
    });
    let child2 = std::thread::spawn(move || {
        f2.write_all(b"qwerty").unwrap();
        f2.finish().unwrap();
    });

    child1.join().unwrap();
    child2.join().unwrap();
vi commented 4 years ago

Is ZboxFS supposed to be used in async/Tokio context? (where thread may be unpredictable).

Does it rely on thread-local storage?

Can transactions be untied from threads? There can be explicit object zbox::Transaction that can open files, which would be bound to that transaction instead of to thread-default transaction.

Or is there some definite way to check if it is OK to write a file? (when running from thread pool where tasks can migrate between threads on its own) Is only way to reliably use ZboxFS without fully controlling threading in application is to consult a map from std::thread::ThreadId before every write?

burmecia commented 4 years ago

Is ZboxFS supposed to be used in async/Tokio context? (where thread may be unpredictable).

It hasn't used async yet because that is a big change and may change the internal structure as well.

Does it rely on thread-local storage?

Yes, it stores tx id in thread local storage.

Can transactions be untied from threads? There can be explicit object zbox::Transaction that can open files, which would be bound to that transaction instead of to thread-default transaction.

Or is there some definite way to check if it is OK to write a file? (when running from thread pool where tasks can migrate between threads on its own) Is only way to reliably use ZboxFS without fully controlling threading in application is to consult a map from std::thread::ThreadId before every write?

Currently, threading is tightly coupled with tx control internally, so it is hard to expose it as API and get it right. Also, this will add complexity for developing using ZboxFS.

As for the thread pool use case, I think app has to stick each file to different thread at this moment. If app cannot control that, it might be a limitation of ZboxFS.