rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
97.45k stars 12.6k forks source link

Add atomic types to libraries #5042

Closed brson closed 11 years ago

brson commented 11 years ago

We need better support for atomic types. At the moment we don't even have a proper way to do atomic loads and stores. I recommend basing this on the C++11 atomic types, since that is what LLVM is tailored for.

Things that we need most are Atomic<int>, Atomic<uint>, and Atomic<~T>.

Aatch commented 11 years ago

I'm currently looking at implementing this.

How close to the C++ atomics is ok? Given the similarities between Rust and C++, it is possible to almost exactly replicate <atomic>. You can't do stuff like overload assignment (or assignment-op), but otherwise I think it is possible.

The advantage of mimicking the C++ api is that it simultaneously exposes all the memory ordering options and means that the semantics of the orderings are the same between the two languages, which helps people coming to Rust from C/C++.

Finally, since atomic variables are only really useful when they are shared, mutable objects, they should be able to be passed between tasks as shared data, much like ARC. However ARC has more overhead than the actual data stored in an atomic variable. I'm not sure what, if any, changes might need to be made to get something like Atomic<int> and friends working without making the api confusing and/or inefficient.

Aatch commented 11 years ago

Ok, so I did more investigation. While it seems to be mostly possible, it's a bit fragile, unless you make sure, some how, that all of the tasks that can see the atomic variable(s) are finished before freeing it, you get some very odd behaviour and fairly frequent segfaults.

I don't know enough about Rust's internals regarding memory management to suggest a possible solution, so I would welcome some guidance.

Looks like this should probably wait until more happens with #553, although I think that this is actually part of #553 anyway.

brson commented 11 years ago

Mimicking C++ is appropriate I think. They've had a lot of experience with this and we are building on their memory model.

I think we can start with an atomic type, like Atomic<int> or AtomicInt and implement the atomic operations on it without enforcing any specific way of sharing that value. It could be shared safely or unsafely depending on context.

As an example, in the scheduler we will probably have a shared data structure like:

struct KernelData {
    task_count: Atomic<int>,
    schedulers: [AtomicRef<SchedulerHandle>>, .. MAX_SCHEDS],
    etc.
}

The entire KernelData structure is shared in an AtomicRef<KernelData> type, but all the interior values have their own synchronization scheme.

(AtomicRef here is a synonym for the type currently named SharedMutableState)

I don't think this depends on globals since there are other ways to share memory.

Aatch commented 11 years ago

@brson It doesn't depend exactly on globals, but while working on a proof-of-concept, I found some fairly obvious issues.

This is some code that uses a theoretical atomic type, it's based on some actual code I have, but the implementation isn't important.

fn main() -> () {
    let atom1 = atomic_init(&0);

    for [0, ..5].each |_| {
        do task::spawn || {
            for [0, ..5].each |_| {
                atom1.add(2);
                io::println(fmt!("[%d] Var is %d", *(task::get_task()), atom1.load());
            }
        }
    }
}

The issue here is with the fact that atom1 is copied into each task, but is really a reference to the underlying data (since I am assuming that the point of having atomic variables is that they are thread safe?). Now, my toy implementation, with this code, seems to work fine. Each task sees gradually increasing values and no task go backwards.

However, if I turn optimizations on (rustc -O), then the output I get is quite different. the first lines seem but then the value shoot ups to some random, high value. If I add this line: while atom1.load() < 50 {task::yield()} to the end of main then the issue goes away.

The memory for atom1 is allocated on the stack, and the stack is freed before the tasks all finish, this is because the scheduler waits for the tasks after main has finished. While I don't know what the optimizations are, running the program through valgrind confirms it.

While in and of itself, this isn't a problem, it does highlight the fact that we have no way of knowing the lifetime of atomic variables and therefore no assurance that it won't be pulled out from underneath the task. Obviously I have demonstrated a way around this, but it doesn't detract from the fact that there is no way (to my knowledge) to store data in a way that allows you be sure that the end of some function won't free the underlying data, without using locks and mutexes anyway (thus defeating the entire point).

In C++, between global variables and manual memory management it is fairly trivial to be certain that a value is still alive when you access it, as long as you don't do things that would cause invalid memory accesses anyway, you're pretty safe. Unfortunately, no matter where the data for an atomic variable is store, rust will almost certainly free it, thinking that you are done with it.

While I understand that this may not be an issue in practical programs, I think there needs to be at least something to account for this, as it is very easy to hit this issue.

Otherwise, I happy to implement something that looks similar to the above, preferably without needing to pass a pointer to the type at all.

Edit: As a final note, this is my first real dive into the language like this, so I may be missing something obvious. If so, please let me know.

asb commented 11 years ago

It might also be worth looking at the proposed libuv portable atomics API: https://github.com/joyent/libuv/issues/726

Aatch commented 11 years ago

@asb, not really that useful. The atomicity is at the processor level (and a little at the compiler level) so it's not like we need any code to be thread/task aware.

LLVM has atomic versions of store/load, CAS and RMW, which is more than enough, and being LLVM it means that it's mostly platform independent. running through libuv functions would just add an unnecessary level of complexity when it's easier just to have the LLVM instructions generated directly.

asb commented 11 years ago

@Aatch I linked it more for the ability to specify the desired memory model but now I see that's just copied from the C++11 API. Of course, it's arguable whether exposing this is even useful and whether rust would be better off to just stand on the (at least in comparison to the others) easy to reason about sequential consistency model.

Aatch commented 11 years ago

So I just looked at the implementation that clang uses for atomics, trying to get some idea of how to implement it and they actually handle atomics as a special object/expression. I'm not sure if that's required for us, but it does pose the interesting question of whether or not we want to make atomic types special in some way.

Advantages

Disadvantages

I would appreciate feedback on this, since I am having trouble coming up with a design that doesn't require either creating ~30 intrinsics or changing a significant about of the compiler.

brson commented 11 years ago

@AAtch your atomic_init takes a reference to the value it atomically manages and then appears to be copyable. I would suggest making the atomic type cointain its value and be noncopyable. On it's own that is safe and it leaves the policy decision about how to share the atomic type up to someone else.

I haven't read your latest post yet. Will get back to you later.

Thiez commented 11 years ago

@Aatch I'm also interested in this issue, are you still working on this?

Aatch commented 11 years ago

This is essentially done. I'll open issues for the remaining work.

nox commented 10 years ago

What about an AtomicCLike?

Thiez commented 10 years ago

What about it?

nox commented 10 years ago

Well it would be useful for simple C-like enum types, right?

Thiez commented 10 years ago

Perhaps. Once #15620 is solved that would probably be easy.