jonhoo / left-right

A lock-free, read-optimized, concurrency primitive.
Apache License 2.0
1.95k stars 94 forks source link

Move direct_write into left_right, plus a few convenience changes. #84

Closed code-elf closed 3 years ago

code-elf commented 3 years ago

I've been experimenting with using left_right in my production application, and there were a few changes I'd like to see to make it more convenient in usage. Generally though, the recent changes are awesome and should eliminate the need for me to fork entirely, so thank you. πŸ’œ

This pull request:

This is a draft pull request because:

One more thing that's more of a question, but it might be something that can be fixed in Aliased: Is there any way to get as_deref to work with an Option<&Aliased<T, D>>? Right now, it seems to just...do nothing, and I'm not sure why.


This change is Reviewable

jonhoo commented 3 years ago

I've been experimenting with using left_right in my production application, and there were a few changes I'd like to see to make it more convenient in usage. Generally though, the recent changes are awesome and should eliminate the need for me to fork entirely, so thank you. :purple_heart:

:tada: :heart:

This pull request:

  • Moves the direct_write functionality into left_right. For this purpose, it adds a clone_first method to Absorb. I figure pretty much every implementation of left_right could benefit from this optimization.

Yeah, I think you're probably right. I'm still iffy on whether I like clone_first as a method, but if it's documented really well it might be okay.

  • Makes raw_handle take &self rather than &mut self. It doesn't need it, and it was getting in the way (more on that later).

This seems fine, though we need to document that it is now never safe to mutate through the handle (when it took &mut self that might be safe).

  • Re-adds the flush method that only publishes if there are actually oplog entries. Cause convenience.

If you find flush handy, then I'm fine bringing it back!

  • Makes left_right::new not require Clone. Duplicates a little bit of code, but again, convenience.

This one is a little weird, because it means that the two initial states are separate instances, which is something that may cause divergence down the line. For example, for HashMap, you would need to change the hasher in the write handle to be the same as the one in the read handle, because Default creates a new hasher with a new random seed. In other words, the current implementation of clone_first in evmap in your PR is unsound β€” the two maps will give different iteration orders, which will cause aliasing mismatches if the user does a retain for example that only matches the first item iterated over. I think you'd have to use something like Clone::clone_from, though that won't work either since you can't clone the data (and I'm super hesitant to implement Clone for Aliased by having it alias). I think the best solution is for clone_first to completely replace self with a HashMap::with_capacity_and_hasher that uses the capacity and hasher from other, though I don't know if all wrapped data structures will have such an easy time of it.

This is a draft pull request because:

  • All the unit tests run through fine, but I haven't actually tested it in production yet.

That would indeed be a useful data point :sweat_smile:

  • I'd like your feedback on which of these you actually want to incorporate.

See above. I'm unfortunately fairly hesitant about clone_first because it is so tricky to get right (as evidenced by the PR impl being wrong). With good enough docs I might be okay with it though.

  • I haven't fully updated documentation (will do once the previous point is settled.)

:+1:

  • I'm not sure whether the change to raw_handle is actually correct - the doc says "it is only safe to read through this pointer if you know that the writer will not start writing into it," and I'm understanding that as "there may not be a publish while the pointer is being held," but if even calling add_op may cause issues, then this change is probably incorrect.

Calling add_op concurrently should be fine, because it will not ever modify the map that the readers have access to.

  • I'm also not sure how correct this is. In fact, I'm not quite sure what the purpose of the ready flag is - I figure it's to prevent read access to a map that hasn't seen a publish yet, but I'm also not sure what the problem with that would be. There's probably something I'm missing here.

The ready flag is a bit of an evmap-ism, and isn't really about safety. It is there so that reads that happen before a map has been "populated" (i.e., before the first publish) can detect that that's what happened. This has tie-ins to Noria, which might create a map and then populate it, and readers that happen to see the initial empty map should treat that as a signal of "come back later, the thing is still being computed".

One more thing that's more of a question, but it might be something that can be fixed in Aliased: Is there any way to get as_deref to work with an Option<&Aliased<T, D>>? Right now, it seems to just...do nothing, and I'm not sure why.

Hmm, that's interesting. I think that should work since

let x: Option<&Box<usize>> = None;
let z: &usize = x.as_deref().unwrap();

compiles just fine, as does

let x: Option<&std::cell::Ref<usize>> = None;
let z: &usize = x.as_deref().unwrap();

What errors do you get if you write the same code as above if you use Aliased instead (I'm away from my regular computer, otherwise I'd check myself)?

code-elf commented 3 years ago

This one is a little weird, because it means that the two initial states are separate instances, which is something that may cause divergence down the line. For example, for HashMap, you would need to change the hasher in the write handle to be the same as the one in the read handle, because Default creates a new hasher with a new random seed.

I thought such special cases were precisely what the new_from_empty method was for, which has been unchanged.

the current implementation of clone_first in evmap in your PR is unsound β€” the two maps will give different iteration orders, which will cause aliasing mismatches if the user does a retain for example that only matches the first item iterated over.

Hmm. Are you sure? I didn't actually change anything about how evmap creates and sets up its maps; I really just moved the code from JustCloneRHandle in absorb_second into clone_first, and it's called at roughly the same time, so I'd think it would be fine - unless there was something unsound about that implementation, as well? Perhaps it's the name clone_first that's confusing, because it doesn't actually clone the first map, it just...uh, imports its data, I suppose. Since the operation I was replacing was called "just clone r_handle" I went with that, but maybe another name would be better.

As for as_deref...I think that's just how it behaves when working with references, in general. Your example specifically works, but it works even when removing the as_deref() - and interestingly, I can keep chaining as_deref()s with no change, so it's almost like it derefs to itself.

fn wants_usize(param: &usize) -> usize {
    *param
}

fn test() {
    let x: Option<&Box<usize>> = None;
    let z: usize = x.as_deref().copied().unwrap_or_default();
    let option: Option<&Box<usize>> = None;
    let value: usize = option.as_deref().map(wants_usize).unwrap_or_default();
}

yields

error[E0599]: no method named `copied` found for enum `std::option::Option<&std::boxed::Box<usize>>` in the current scope
   --> src\state\map.rs:226:30
    |
226 |     let z: usize = x.as_deref().copied().unwrap_or_default();
    |                                 ^^^^^^ method not found in `std::option::Option<&std::boxed::Box<usize>>`
    |
   ::: C:\Users\Maya\.rustup\toolchains\stable-x86_64-pc-windows-msvc\lib/rustlib/src/rust\library\core\src\option.rs:158:1
    |
158 | pub enum Option<T> {
    | ------------------ doesn't satisfy `_: std::iter::Iterator`
    |
   ::: C:\Users\Maya\.rustup\toolchains\stable-x86_64-pc-windows-msvc\lib/rustlib/src/rust\library\alloc\src\boxed.rs:160:1
    |
160 | pub struct Box<T: ?Sized>(Unique<T>);
    | ------------------------------------- doesn't satisfy `std::boxed::Box<usize>: std::marker::Copy`
    |
    = note: the method `copied` exists but the following trait bounds were not satisfied:
            `std::boxed::Box<usize>: std::marker::Copy`
            `std::option::Option<&std::boxed::Box<usize>>: std::iter::Iterator`
            which is required by `&mut std::option::Option<&std::boxed::Box<usize>>: std::iter::Iterator`

error[E0631]: type mismatch in function arguments
   --> src\state\map.rs:228:43
    |
220 | fn wants_usize(param: &usize) -> usize {
    | -------------------------------------- found signature of `for<'r> fn(&'r usize) -> _`
...
228 |     let value: usize = option.as_deref().map(wants_usize).unwrap_or_default();
    |                                              ^^^^^^^^^^^ expected signature of `fn(&std::boxed::Box<usize>) -> _`

The only way to fix that seems to be using .map(|x| &**x) in favour of as_deref(). Either way, it doesn't seem to be something specifically wrong with Aliased, so unless there's something we could and should do to make it more convenient to work with in this case, it's probably not relevant here.

Other than that, I've since checked it in production, and it all seems to be working perfectly fine!

jonhoo commented 3 years ago

This one is a little weird, because it means that the two initial states are separate instances, which is something that may cause divergence down the line. For example, for HashMap, you would need to change the hasher in the write handle to be the same as the one in the read handle, because Default creates a new hasher with a new random seed.

I thought such special cases were precisely what the new_from_empty method was for, which has been unchanged.

Ah, yes, you're right, evmap uses new_from_empty at the moment, and so avoids this particular problem. That is super subtle though, that new_from_empty clones, whereas new creates two separate Defaults. Had evmap used left_right::new in the case where no capacity or hasher was given (which it very well could have done), then it would indeed be broken. Seems like a foot-gun waiting to go off. Though I also admit that it is really nice to have a constructor that doesn't require Clone. I think we can keep what you have as long as we make it very clear in the documentation for new that it is up to the caller to ensure that the starting states from two distinct Default instances still guarantee determinism following clone_first. Arguably that constructor should be unsafe. Arguably both constructors should be...

the current implementation of clone_first in evmap in your PR is unsound β€” the two maps will give different iteration orders, which will cause aliasing mismatches if the user does a retain for example that only matches the first item iterated over.

Hmm. Are you sure? I didn't actually change anything about how evmap creates and sets up its maps; I really just moved the code from JustCloneRHandle in absorb_second into clone_first, and it's called at roughly the same time, so I'd think it would be fine - unless there was something unsound about that implementation, as well? Perhaps it's the name clone_first that's confusing, because it doesn't actually clone the first map, it just...uh, imports its data, I suppose. Since the operation I was replacing was called "just clone r_handle" I went with that, but maybe another name would be better.

Yeah, I think a different name would be good. Maybe fn absorb_all? fn sync_with? fn resync_with?

As for as_deref...I think that's just how it behaves when working with references, in general. Your example specifically works, but it works even when removing the as_deref() - and interestingly, I can keep chaining as_deref()s with no change, so it's almost like it derefs to itself.

Hmm, I suppose that probably means that & itself implements Deref, and that that's what's tripping things up :shrug: I wonder if this is something you could file with rust-lang/rust as something that should be fixed. May not be fixable now that as_deref has stabilized though, not sure.

Other than that, I've since checked it in production, and it all seems to be working perfectly fine!

:tada: That's great to hear! If you're publishing these other crates somewhere, we should definitely list them in the README on this repo too!

code-elf commented 3 years ago

Okay, I've updated the docs for my changes! Let me know whether they're fine that way!

My two map implementations are honestly too specific to be published - but that's really what's so awesome about left_right; I now have a file with no more than 200 lines containing two map implementations doing exactly what I need them to do.

jonhoo commented 3 years ago

Would you mind not force-pushing until after we've resolve the comments? GitHub gets all confused πŸ˜…

EDIT: Ah, looks like you addressed the comments they just weren't marked as resolved. Ignore me :) I'll do final review after work today, but I think it's looking quite good now!

code-elf commented 3 years ago

My bad, sorry for the confusion! I only marked those as resolved where I used your suggestions directly - I figured the other three needed another look from you.

jonhoo commented 3 years ago

The clippy error is because it pulls down left-right 0.9, which doesn't have sync_with, rather than using the path dependency. I'll fix that up.

jonhoo commented 3 years ago

Thank you! Will push a new release of both soon.