Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Ray has significant overhead when running nested remote function for first time.
In the chart above, blue, green and purple bar stands for the time used by the actual functions, orange and red is the time spent by Ray in between. For the first batch of tasks, Ray takes a lot of time to start each nested remote function. For later tasks, such overhead is more rare.
Analysis
Since there are more tasks than CPUs, the top layer function consumes all CPUs, and Ray have to start new worker process for second and third layer functions.
The logs show that most of the time is spent between the worker starting a new worker process and the new worker process initializing CoreWorker and connect itself to raylet. However, Ray spend much less time creating worker processes for the first layer functions.
Versions / Dependencies
Ray 2.9.3
Python 3.8.0
OS: Ubuntu 18.04.1 LTS
Reproduction script
import ray
import time
from datetime import datetime
import json
@ray.remote
def sleep_iter(iter: int):
start_time = datetime.now()
time.sleep(0.3)
end_time = datetime.now()
iter = iter - 1
if iter > 0:
times = ray.get(sleep_iter.remote(iter))
return [start_time.timestamp() * 1000, end_time.timestamp() * 1000] + times
return [start_time.timestamp() * 1000, end_time.timestamp() * 1000]
num_tasks = 100 # larger than number of cpus in the cluster
tasks = [sleep_iter.remote(3) for _ in range(num_tasks)]
ray.get(tasks)
What happened + What you expected to happen
Symptom
Ray has significant overhead when running nested remote function for first time.
In the chart above, blue, green and purple bar stands for the time used by the actual functions, orange and red is the time spent by Ray in between. For the first batch of tasks, Ray takes a lot of time to start each nested remote function. For later tasks, such overhead is more rare.
Analysis
Since there are more tasks than CPUs, the top layer function consumes all CPUs, and Ray have to start new worker process for second and third layer functions. The logs show that most of the time is spent between the worker starting a new worker process and the new worker process initializing
CoreWorker
and connect itself to raylet. However, Ray spend much less time creating worker processes for the first layer functions.Versions / Dependencies
Ray 2.9.3 Python 3.8.0 OS: Ubuntu 18.04.1 LTS
Reproduction script
Issue Severity
Low: It annoys or frustrates me.