rayon-rs / rayon

Rayon: A data parallelism library for Rust
Apache License 2.0

Use a user-provided "backend" (threadpool, event loop, ...) #93

Open gnzlbg opened 8 years ago

gnzlbg commented 8 years ago

Is it possible to tell rayon to use a user-provided thread pool (e.g. in my crate's main function)?

For example, if I had a crate using tokio where I also want to use rayon, I would like a single thread pool/event loop/task manager... that is used as the backend for both (and that does work-stealing for both), instead of two competing ones.

nikomatsakis commented 8 years ago

It is not currently possible, but one of my big To Do items for rayon is to factor out the backend into a distinct crate.

iduartgomez commented 7 years ago

From looking at the source, I wonder whether, as an alternative (or in addition, as a temporary workaround), it would be possible to have finer control over the number of threads available for execution at a given time (i.e. setting it up before running a parallel iterator chain), at runtime, through a function call (which would in turn modify the Registry).

This would be really helpful in the context of load balancing in an application where you have several live threads competing over resources with different pools (other than rayon), for example. It's a bit hackish, but still something.

The ideal solution, IMO, would be for instantiated ThreadPool types to provide the same functionality as the global thread pool, so we would have total control over how we split resources; a good step would be to have something more akin to https://github.com/frewsxcv/rust-threadpool incorporated into rayon. (edit: NVM, cleared up by cuviper below.)

cuviper commented 7 years ago

The ideal solution, IMO, would be for instantiated ThreadPool types to provide the same functionality as the global thread pool, so we would have total control over how we split resources

What do you mean by this? Rayon's global thread pool is literally just a global ThreadPool instance. You can ThreadPool::install your parallel iterator into whatever pool you like.

iduartgomez commented 7 years ago

@cuviper I didn't see how to do it in the documentation (maybe that's all it is, a lack of documentation/examples?), but for example:

hash_map
    .par_iter()
    .for_each(|(k, v)| { /* do stuff ... */ });

works, but if I use the install method I can't (install's closure signature is FnOnce() -> R + Send, so it does not take arguments); that's what I was thinking of when saying 'have the same functionality'.

Probably missing something obvious.

cuviper commented 7 years ago

The install closure has no arguments, but you can capture anything you like. For instance:

let hash_map = get_map();
my_pool.install(|| {
    // this implicitly captures a reference to `hash_map`
    hash_map.par_iter().for_each(|(k, v)| {
        // do stuff ...
    })
});
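
For reference, a minimal sketch of how such a pool might be built and used, assuming rayon's current ThreadPoolBuilder API (the map contents and closure body are placeholders):

use std::collections::HashMap;

use rayon::prelude::*;
use rayon::ThreadPoolBuilder;

fn main() {
    // Build a dedicated pool instead of relying on the global one.
    let my_pool = ThreadPoolBuilder::new()
        .num_threads(4)
        .build()
        .expect("failed to build thread pool");

    let hash_map: HashMap<u32, u32> = (0..100).map(|i| (i, i * 2)).collect();

    // Everything inside `install` runs on `my_pool`'s worker threads.
    my_pool.install(|| {
        hash_map.par_iter().for_each(|(k, v)| {
            // do stuff ...
            let _ = (k, v);
        })
    });
}
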
nikomatsakis commented 7 years ago

https://github.com/nikomatsakis/rayon/pull/353 makes it possible to have a custom backend for parallel iterators, at least.

alexcrichton commented 6 years ago

FWIW, I believe I'm running into this with the wasm32-unknown-unknown target, where std::thread::spawn doesn't work but where, with wasm-bindgen, we're able to get something that looks similar-ish to thread spawning. In that sense Rayon can't spawn any threads itself, because it won't work, but I could either empower it to spawn threads or give it a pool of threads to draw from.

(just wanted to chime in with another use case!)

nikomatsakis commented 6 years ago

The rustc-rayon fork also adds a "custom main function" -- seems like if you could specify the "spawn thread" function, which @alexcrichton would like, then you could also control the "main". The main difference is that rustc-rayon would also like to know the thread index.

alexcrichton commented 6 years ago

Oh that could work! (I think?)

In wasm we'll for sure have a way to get the thread index

nikomatsakis commented 6 years ago

What I mean is:

I was envisioning that we could give you (on the thread-pool) the ability to specify the spawn function. When spawning threads, we would call the function with an integer (thread index) plus a closure that you are supposed to invoke. The default function would just be |_index, body| std::thread::spawn(body), something like that.
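
A minimal sketch of what that hook could look like, purely as a hypothetical illustration (WorkerBody and default_spawn are not existing rayon names):

use std::thread;

// The worker's main loop, packaged as a boxed closure that the
// user-supplied spawn function is expected to invoke on some thread.
type WorkerBody = Box<dyn FnOnce() + Send + 'static>;

// The proposed default: ignore the index and hand the body to std::thread.
fn default_spawn(_index: usize, body: WorkerBody) -> thread::JoinHandle<()> {
    thread::spawn(body)
}

fn main() {
    // A platform like wasm could instead record the index and enqueue the
    // body on whatever thread-like primitive the host provides.
    let handle = default_spawn(0, Box::new(|| {
        // the worker's main loop would run here
    }));
    handle.join().unwrap();
}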

alexcrichton commented 6 years ago

We could get that to work!

cuviper commented 6 years ago

@alexcrichton Ultimately, wasm should just implement std::thread normally, right? In this case, it feels like a custom spawn is just a stopgap measure while wasm works out its thread story.

@nikomatsakis

The default function would just be |_index, body| std::thread::spawn(body), something like that.

Our current spawn loop looks like this:

for (index, worker) in workers.into_iter().enumerate() {
    let registry = registry.clone();
    let mut b = thread::Builder::new();
    if let Some(name) = builder.get_thread_name(index) {
        b = b.name(name);
    }
    if let Some(stack_size) = builder.get_stack_size() {
        b = b.stack_size(stack_size);
    }
    if let Err(e) = b.spawn(move || unsafe { main_loop(worker, registry, index, breadth_first) }) {
        return Err(ThreadPoolBuildError::new(ErrorKind::IOError(e)))
    }
}

If we did this with a user's spawn function, we'd need arguments for the name, the stack_size, and the spawn closure, but the last is an FnOnce which makes callbacks awkward. I guess we can also pass the index as you suggest, but I'm not sure what they would use it for.

We could instead define a trait like ThreadBuilder, with methods matching std::thread::Builder as needed. Only the spawn function is really required, and the name and stack_size could probably just be defaulted as no-ops.
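
A rough sketch of that trait shape, as a hypothetical illustration rather than an actual rayon API:

use std::io;

// Only `spawn` is really required; the other hints default to no-ops.
pub trait ThreadBuilder {
    // Spawn a worker with the given index; the worker runs `body` to completion.
    fn spawn(&mut self, index: usize, body: Box<dyn FnOnce() + Send + 'static>) -> io::Result<()>;

    // Optional hints mirroring std::thread::Builder, ignored by default.
    fn name(&mut self, _name: String) {}
    fn stack_size(&mut self, _size: usize) {}
}

// The default implementation would forward to std::thread.
pub struct StdThreadBuilder;

impl ThreadBuilder for StdThreadBuilder {
    fn spawn(&mut self, _index: usize, body: Box<dyn FnOnce() + Send + 'static>) -> io::Result<()> {
        std::thread::Builder::new().spawn(body).map(|_| ())
    }
}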

alexcrichton commented 6 years ago

@cuviper to me at least it's not actually clear whether std::thread will ever be implementable on wasm. I suspect it will happen in the limit of time, but it will likely remain unimplemented for the next few years. The current proposal is too minimal to implement std::thread as-is, but there are other future proposals/ideas which may empower it.

I think the ideal interface for wasm would be something along the lines of "rayon, you can control this thread for some time", where a thread sort of opts in to being a rayon worker thread. Requiring that rayon still be the one to spawn the thread may be too restrictive, but I don't mind testing it out to see if that's the case!

zopsicle commented 5 months ago

I was envisioning that we could give you (on the thread-pool) the ability to specify the spawn function.

I think this would still be rather high level and insufficient for some applications. Interoperability with the Windows thread pool, libdispatch, or other thread pools that expose a task-submission API, and that are also used by OpenMP implementations, MSVC C++'s std::async, and many other libraries, would allow all of these tools to submit tasks to the same thread pool and interoperate seamlessly. I'm not sure whether rayon::join stealing work when one of the tasks finishes early would work well with that model, though.