wvwwvwwv / scalable-concurrent-containers

High performance containers and utilities for concurrent and asynchronous programming
Apache License 2.0
285 stars 14 forks source link

Possible bug halting scan method when holding references to entries. #129

Closed Demy076 closed 4 months ago

Demy076 commented 5 months ago

I know the library has the intention to be lock free. Now, this could be considered my doing and the library doesn't cover this.

Describe issue: The issue is when I hold a reference of an occupied entry in code that hasn't been dropped by the compiler. Whilst scanning the full entry using scan() method will block, because the reference to occupied entry hasn't dropped yet. I have published the code beneath. I didn't know until I dropped the issue as to my previous experience working with regular HashMap from std library.

I think this could be a possible bug or if it's just something that the library doesn't cover.

` use scc::HashMap; use std::sync::Arc;

[derive(Clone, Debug)]

struct EntryMf { key: u32, value: String, } struct Person { name: String, todos: Arc<HashMap<u32, EntryMf>>, }

impl Person { fn new(name: String) -> Self { Self { name, todos: Arc::new(HashMap::new()), } } async fn add_todo(&self, id: u32, todo: String) { let amount_tasks = 10; let mut tasks = vec![]; for i in 0..amount_tasks { let todo = todo.clone(); let id = id + i; let todos = self.todos.clone(); tasks.push(tokio::spawn(async move { todos .insert( id, EntryMf { key: id, value: todo, }, ) .unwrap(); println!("Added todo: {}", id); })); } for task in tasks { task.await.unwrap(); }

    // Mutate the 2nd element
    let mut retrieve_mut_slice = self.todos.get(&2).unwrap();
    println!("Mutating 2nd element");
    retrieve_mut_slice.get_mut().value = "Mutated".to_string();
    // drop(retrieve_mut_slice); //Uncommenting this will make the code work
    println!("Mutated 2nd element");
    println!("COunt of elements: {}", self.todos.len());
    let mut values: Vec<EntryMf> = vec![];
    let all_values = self.todos.scan(|_, v| {
        println!("Value: {:?}", v);
        values.push(v.to_owned());
    });
    println!("Map: {:?}", all_values);
}

}

[tokio::main]

async fn main() { let person = Person::new("John".to_string()); person.add_todo(1, "Buy milk".to_string()).await; } `

Demy076 commented 5 months ago

For some reason even with code highlighting github doesn't mark it...

wvwwvwwv commented 5 months ago

Hi!

It’s by design that scc::HashMap is not lock-free as documented. scc::HashIndex provides several lock-free read methods, so I’d advise you use scc::HashIndex if iterating over entries is frequent.

Demy076 commented 5 months ago

Hi there,

Yes, which is true I saw it on docs.rs just yet. Unfortunately, I require mutability a lot of time during the course of this project I am working on. So, I think it's best if I use scoped {} for this so rust knows when to drop the variable.

I wouldn't possibly know how to else utilize parallel reading with mutability. As those methods for HashIndex are rendered unsafe. It's a websocket based tracker that uses a hashmap now SCC hashmap as it's storage in the upcoming version of it. If u have any tips that would be nice.

Operations are done as followed:

Insert & remove (usually). Read (a lot of them, since it communicated with another service that sends message so it's being read and written from anywhere.) Now I use a standard HashMAP. Updating existing values inside key by getting a mutable reference (which is ideal rather than reinserting the existing key)

wvwwvwwv commented 5 months ago

Hi again!

It looks like that you need reading data while others are maybe modifying it; which is strictly prohibited by Rust if not protecting it with a mutex or the like. That’s why those HashIndex methods are marked unsafe.

If you are sure that the data is safe to read while being modified, then you can consider using scc::HashIndex<K, Arc>, getting mutable references of V using unsafe Arc methods like get_mut_unchecked.

Demy076 commented 5 months ago

Hi there, yes that seems to work to some degree but not what I was looking for. I will just make sure lifetimes are managed properly with {} as that seems to go perfect.

Another question that's a tad bit irrelevant. I checked out HashSet loving it. But I do lack the methods: entry & get. Are they not built in or just not part of the concurrent hash set?

wvwwvwwv commented 5 months ago

I didn’t implement them because I thought that they were primarily needed for value modification. However, I have just realized that they can be used to maintain ownership of certain entries in the code. -> I’ll add them when I have time; unfortunately, I’m super busy these days…

Until then, you can try HashMap<K, ()> since HashSet is merely an alias of it.

Demy076 commented 4 months ago

True, in the meanwhile I’ll fork this repo and implement it and push so u can see if the implementations are worth it. Is it possible for me to add a values() and keys() to Hashmap? Given that u said no iterators this kinda comes close to it. It isn’t an iterator per say just a cloned vector at this point.

wvwwvwwv commented 4 months ago

Not sure if values() and keys() returning a Vec make sense since the same can be simply implemented using the scan method; I’d recommend that you implement those method outside this crate, e.g., as a helper function. On the other hand, you are welcome to implement the Entry API on your own and make a PR :-)

Demy076 commented 4 months ago

If you’re so busy, I’ll fork it make a PR and let u go over the changes. If I’m gonna do it, I’ll do it correct this instance and see how it works out. Helper functions sound nice but if I can integrate it into the library + finish some of the hashset methods that’d be great.

wvwwvwwv commented 4 months ago

OMG, I just spared some time to look into it, and figured get -> Option<OccupiedEntry> and entry -> Entry are already implemented! I'll just close this issue :-)