ssec-jhu / dplutils

Distributed(Data) Pipeline Uitilities
BSD 3-Clause "New" or "Revised" License
1 stars 0 forks source link

Ray stream graph executor ignores overloaded configuration #67

Closed amitschang closed 4 months ago

amitschang commented 4 months ago

For example

task1 = PipelineTask('task1', lambda x: x, num_cpus=16)
next(RayStreamGraphExecutor([task1]).set_config('task1.num_cpus', 8).run())

does not set the num_cpus in ray remote task to 8, still requires 16, as in it could result in:

2024-03-28 13:23:50,928 INFO worker.py:1743 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8266 
(autoscaler +1m21s) Tip: use `ray status` to view detailed cluster status. To disable these messages, set RAY_SCHEDULER_EVENTS=0.
(autoscaler +1m21s) Error: No available node types can fulfill resource request {'CPU': 16.0}. Add suitable node types to this cluster to resolve this issue.

The issue seems to be setting up the remote tasks in initialization at: https://github.com/ssec-jhu/dplutils/blob/main/dplutils/pipeline/ray.py#L167-L176