uqfoundation / pathos

parallel graph management and execution in heterogeneous computing
http://pathos.rtfd.io
Other
1.38k stars 89 forks source link

Information about load balancing/resource allocation #276

Open eljost opened 1 year ago

eljost commented 1 year ago

Dear developers,

the README.md states that

pathos also provides a very basic automated load balancing service,
as well as the ability for the user to directly select the resources.

Is this documented somewhere? Grepping in the pathos repo for terms like (load|balanc|resource|cost) did not bring up any results regarding my question.

Does pathos support something like balancing tasks over workers, when each task has an associated cost?

All the best Johannes

mmckerns commented 1 year ago

I'm not sure what exactly you are looking for. What pathos provides is different strategies/options for structuring your computations on available resources. So, you can build and configure a hierarchy of maps (and etc) that use threads, processes, MPI, schedulers, and etc. The interaction with MPI and schedulers have been pushed into pyina and interaction with threads and processes are in multiprocess. For example, with pyina, there are options for how you want to structure the workers (i.e. split a task list evenly between the workers or use a pool of workers).

eljost commented 11 months ago

Thansk for getting back to me so quickly. I was looking for something along the line as in Dask.distributed:

There I can create a LocalCluster with a worker that has certain (abstract) resources, e.g., CPU cores:

cluster_kwargs = {
    # One worker per cluster with multiple threads
    "threads_per_worker": ncores,
    "n_workers": 1,
    # Register available cores as resource
    "resources": {
        "CPU": ncores,
    },
}

By specifying the ressource use of a task I can submit tasks that use different amount of CPU cores.

client.submit(
    func,
    image,
    resources={
        "CPU": pal_parallel,
    },
)

Given a set of 10 tasks on a machine with 8 CPU cores, this way I can submit the first 8 tasks with "CPU": 1 and the remaining two tasks with "CPU": 4, to make efficient use of my available cores. Dask prevents any oversubscription, e.g. the first task with "CPU": 4 will only ever start, after 4 jobs with "CPU": 1 finished.

Is something easily possible with pathos, or do I have to do the resource management by hand/manually? If this is not possible it is perfectly fine; then I'll close this issue.

All the best Johannes