rust-crdt / rust-crdt

a collection of well-tested, serializable CRDTs for Rust
Apache License 2.0
1.32k stars 58 forks source link

Expose `BigRational` in `Identifier`. #133

Closed TTWNO closed 1 year ago

TTWNO commented 1 year ago

To find where an operation has been inserted when using a List structure is currently like the following:

let mut list = List::new();
// ... do many operations from various actors: 'A', 'B', 'C'
let operator = list.insert_index(12, ' ', 'A');
list.apply(operator.clone());
let latest_inserted_at: Vec<usize> = list.iter_entires()
    .enumerate()
    .filter_map(|(idx, (ident, v))| if ident == operator.id() {
        Some(idx)
    } else {
        None
    })
    .collect();

Since I want to convert these inserts from an atomic Op<T, A> type to a specific insertion made for the current client, it would be much more efficient if I could start searching from the approximate location of the insert. For example, I could receive the approximate location of the inserts with Identifer::ratio() -> BigRatio, then it would be fairly simple to start looking for the right identifier from that index going forwards and backwards simultaneously to find the right insertion point.

let mut list = List::new();
// ... do many operations from various actors: 'A', 'B', 'C'
let operator = list.insert_index(12, ' ', 'A');
let rough_insertion_point = operator.ratio().to_integer();
let id = operator.id();
list.apply(operator.clone());
let latest_inserted_at: usize = [0..].iter()
    .find_map(|offset|
        let pos_idx = rough_insertion_point + offset;
        let neg_idx = rough_insertion_point - offset;
        if list.inter_entries().nth(pos_idx) == id {
            Some(pos_idx)
        } else if list.iter_entities().nth(neg_idx) == id {
            Some(neg_idx)        
        } else {
            None
        });

This allows me to search a much smaller space on large Lists. That assumes that this will stay relatively stable over time, if it drifts towards large Identifiers, with many items, then this will appear to be more difficult. I could also transmit additional data with the Op between each node, indicating the position it was inserted at on Node1's side, then search from that position on Node2's side as if I have been given the BigRational like the above.

Please let me know if that is a better solution to this problem, or if exposing the BigRatio makes sense.

Thanks a bunch for all your hard work! I super appreciate it!

EDIT: Another option is to have a position_entry() method on List, since that would also allow a lookup to associate an item in the list with its Identifier.

davidrusu commented 1 year ago

Hmm, I'm thinking your last suggestion to add a position_entry(id: Identifier) -> Option<usize> is probably best in terms of perf and ease of use since we get to take advantage of the fact that List::seq is a BTreeMap under the hood.

A simple impl would be something like:

fn position_entry(&self, id: &Identifier) -> Option<usize> {
    let before_id = self.seq.clone();
    let after_id = before_id.split_off(id);
    if after_id.first_key_value().map(|(i, _)| i) == Some(id) {
        Some(before_id.len())
    } else {
        // `id` is not in the sequence
        None
    }
}

That seq.clone() is not ideal, but the split_off() should be log(n) time if I understand the code correctly.

Alternatively, we can look at using the self.seq.range(..id) in conjunction with the Range::size_hint to get the count of entries before id, this would avoid the clone but I can't determine if size_hint is O(n) or if they have some optimizations in place to get the count of the range iterator.

Another option is self.seq.range(..id).count() to get the index of the id.

I'd be happy with the clone + split_off approach for now if the other approaches don't seem clear cut.

Would you have time to put this up in a PR?

TTWNO commented 1 year ago

Certainly! Here's an O(n) solution: https://github.com/rust-crdt/rust-crdt/pull/134