**Open** · v4if opened this issue 1 year ago
This seems to be a Ray Data issue, since it uses Dataset to specify resources, and Dataset uses Ray Core internally.
Same issue here. Did you solve it?
Any update on this @hora-anyscale @clarng?
cc: @xieus
Any progress on this issue? Is the implication that if num_gpus is defined, the associated task is constrained to 1 CPU?
@raulchen Any progress on this issue? Or is there an alternative method for mapping a fractional GPU plus several CPUs to each worker?
@raulchen GPU utilization is bottlenecked by num_cpus (currently 1) for the mapper task. Do you have any suggestions?
This is intentional behavior to avoid deadlocks I believe, but there could be workarounds. I'm planning to look into it in July.
Is it possible to use placement_groups here?
I tried:

```python
predictions = ds_val.map_batches(
    predictor_cls,
    scheduling_strategy=PlacementGroupSchedulingStrategy(
        ray.util.placement_group(
            [{"CPU": 1}, {"GPU": 1}] * num_workers, strategy="PACK"
        ),
        placement_group_capture_child_tasks=True,
    ),
)
```

It seems, however, that the resources are not available to the actor.
What happened + What you expected to happen
It is not allowed to specify both num_cpus and num_gpus for map tasks. When only num_gpus is specified, num_cpus appears to default to 1, and actors become pending due to insufficient CPU resources. However, GPU compute is often the performance bottleneck of the system. How can actor concurrency be increased while GPU resources are still available?
`ray status` output and run log: omitted from this excerpt.
Versions / Dependencies
ray, version 3.0.0.dev0
`cluster_resources` output: omitted from this excerpt.
Reproduction script
Issue Severity
High: It blocks me from completing my task.