Open gnzlbg opened 8 years ago
It is not currently possible, but one of my big To Do items for rayon is to factor out the backend into a distinct crate.
From looking at the source, I wonder whether, as an alternative (or in addition, as a temporary workaround), it would be possible to have finer control over the number of threads available for execution at a given time (i.e. setting it up before running a parallel iterator chain) at runtime, through a function call that would in turn modify the Registry.
This would be really helpful for load balancing in an application where you have several live threads competing over resources with pools other than rayon's, for example. It's a bit hackish, but still something.
(edit: never mind, cleared up by cuviper below.) The ideal solution would IMO be to have the same functionality provided by the global thread pool also provided by instantiated ThreadPool types, so we would have total control over how we split resources; a good step would be to incorporate something more akin to https://github.com/frewsxcv/rust-threadpool into rayon.
> The ideal solution would be IMO to have the same functionality provided by the global thread pool provided by instantiated ThreadPool types so we would have total control over how we split resources
What do you mean by this? Rayon's global thread pool is literally just a global `ThreadPool` instance. You can `ThreadPool::install` your parallel iterator into whatever pool you like.
@cuviper I didn't see how to do it in the documentation (maybe it's just that, a lack of documentation/examples?), but for example:

```rust
hash_map
    .par_iter()
    .for_each(|(k, v)| /* do stuff ... */);
```

works, but if I use the `install` method I can't (`install`'s closure signature is `FnOnce() -> R + Send`, so it does not accept arguments); that's what I was thinking of when saying 'have the same functionality'. Probably missing something obvious.
The `install` closure has no arguments, but you can capture anything you like. For instance:

```rust
let hash_map = get_map();
my_pool.install(|| {
    // this implicitly captures a reference to `hash_map`
    hash_map.par_iter().for_each(|(k, v)| {
        // do stuff ...
    })
});
```
https://github.com/nikomatsakis/rayon/pull/353 makes it possible to have a custom backend for parallel iterators, at least.
FWIW, I believe I'm running into this with the `wasm32-unknown-unknown` target, where `std::thread::spawn` doesn't work, but with `wasm-bindgen` we're able to get something that looks similar-ish to thread spawning. In that sense rayon can't spawn any threads because it won't work, but I can either empower it to spawn threads or give it a pool of threads to draw from.
(just wanted to chime in with another use case!)
The rustc-rayon fork also adds a "custom main function" -- seems like if you could specify the "spawn thread" function, which @alexcrichton would like, then you could also control the "main". The main difference is that rustc-rayon would also like to know the thread index.
Oh that could work! (I think?)
In wasm we'll for sure have a way to get the thread index
What I mean is:
I was envisioning that we could give you (on the thread pool) the ability to specify the spawn function. When spawning threads, we would call that function with an integer (the thread index) plus a closure that you are supposed to invoke. The default function would just be `|_index, body| std::thread::spawn(body)`, something like that.
We could get that to work!
@alexcrichton Ultimately, wasm should just implement `std::thread` normally, right? In that case, it feels like a custom spawn is just a stopgap measure while wasm works out its thread story.
@nikomatsakis

> The default function would just be `|_index, body| std::thread::spawn(body)`, something like that.
Our current spawn loop looks like this:
```rust
for (index, worker) in workers.into_iter().enumerate() {
    let registry = registry.clone();
    let mut b = thread::Builder::new();
    if let Some(name) = builder.get_thread_name(index) {
        b = b.name(name);
    }
    if let Some(stack_size) = builder.get_stack_size() {
        b = b.stack_size(stack_size);
    }
    if let Err(e) = b.spawn(move || unsafe { main_loop(worker, registry, index, breadth_first) }) {
        return Err(ThreadPoolBuildError::new(ErrorKind::IOError(e)));
    }
}
```
If we did this with a user's spawn function, we'd need arguments for the `name`, the `stack_size`, and the `spawn` closure, but the last is an `FnOnce`, which makes callbacks awkward. I guess we could also pass the `index` as you suggest, but I'm not sure what they would use it for.
We could instead define a trait like `ThreadBuilder`, with methods matching `std::thread::Builder` as needed. Only the `spawn` function is really required, and the `name` and `stack_size` could probably just be defaulted as no-ops.
@cuviper To me at least it's not actually clear whether `std::thread` will ever be implementable with wasm. I suspect it will happen in the limit of time, but it will likely remain unimplemented for the next few years. The current proposal is too minimal to implement `std::thread` as-is, but there are other future proposals/ideas which may enable it.

I think the ideal interface for wasm would be something along the lines of "rayon, you can control this thread for some time", where a thread sort of opts in to being a rayon worker thread. Requiring that rayon still be the one to spawn the thread may be too restrictive, but I don't mind testing it out to see if that's the case!
I was envisioning that we could give you (on the thread-pool) the ability to specify the spawn function.
I think this would still be rather high level and insufficient for some applications. Interoperability with the Windows thread pool, libdispatch, or other thread pools that have a task-submission API (these are also used by OpenMP implementations, MSVC C++'s `std::async`, and many other libraries) would allow all of these tools to submit tasks to the same thread pool and interoperate seamlessly. I'm not sure whether `rayon::join` stealing work when one of the tasks finishes early would play well with that model, though.
Is it possible to tell rayon to use a user-provided thread pool (e.g. in my crate's main function)?
For example, if I had a crate using tokio where I also want to use rayon, I would like a single thread pool/event loop/task manager that is used as the backend for both (and that does work-stealing for both), instead of two competing ones.