ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

Ray log tracing #9786

Open · matter-funds opened this issue 4 years ago

matter-funds commented 4 years ago

Consider the following situation:

Something (an incoming request, a function call, etc.) launches job A, which launches jobs B and C in parallel. Job C also launches job D.

It would be useful for the logging statements emitted by these processes to include the same request_id, so one can easily track how a request triggered jobs across the cluster.

This can (sort of) be achieved now, if the request_id is threaded through all calls, but this is cumbersome to say the least.
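
To make the pain point concrete, here is a minimal sketch (hypothetical task names, assuming a plain Ray setup) of what threading the request_id through every call looks like today:

```python
import logging
import ray

ray.init()

@ray.remote
def job_d(request_id):
    logging.warning("[%s] job D running", request_id)

@ray.remote
def job_c(request_id):
    logging.warning("[%s] job C running", request_id)
    return ray.get(job_d.remote(request_id))  # C must forward the id to D

@ray.remote
def job_b(request_id):
    logging.warning("[%s] job B running", request_id)

@ray.remote
def job_a(request_id):
    logging.warning("[%s] job A running", request_id)
    # A must forward the id to both B and C
    return ray.get([job_b.remote(request_id), job_c.remote(request_id)])

ray.get(job_a.remote("req-42"))
```

Every call site has to carry the request_id as an explicit argument, which is exactly the boilerplate the two options below would remove.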

If I may suggest two options that would probably work for the logging use-case:

  1. Have Ray workers be aware of the context they are running in / who launched their job. The request_id could be part of this context.
  2. Expose an API along the lines of:
    child_jobs = ray.get_children(root_job_id)  # all jobs spawned by root_job_id, directly or transitively

    If logs can be matched to job IDs, this API can be used to trace all the jobs spawned by a particular request (see the sketch after this list).
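
For completeness, a hedged sketch of how the proposed `get_children` call might be consumed for tracing. `ray.get_children` is the hypothetical API suggested above, not something Ray ships today, and the log-record shape is an assumption:

```python
import ray

def trace_request(root_job_id, log_records):
    """Filter log records down to those produced by the root job or any descendant.

    ray.get_children() is the API proposed above and does not exist in Ray today;
    log_records is assumed to be an iterable of (job_id, message) pairs.
    """
    related = set(ray.get_children(root_job_id))  # direct and transitive children
    related.add(root_job_id)
    return [message for job_id, message in log_records if job_id in related]
```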

Feature request opened at @rkooo567's suggestion.

rkooo567 commented 4 years ago

cc @edoakes I think this is another use case for the task context we discussed before.

rkooo567 commented 4 years ago

Hi @matter-funds, I will (most likely) start working on this next week. Would you mind having a short conversation at the beginning of next week? If so, could you give me your Slack handle for the public Ray channel?

rkooo567 commented 3 years ago

This should now be somewhat possible with the runtime_context API. I will document this method in 1.0.1.
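
For anyone landing here, a minimal sketch of tagging log lines with IDs from the runtime context. The exact accessor names vary by Ray version (recent releases expose `get_job_id()`/`get_task_id()`, older 1.x releases used the `job_id`/`task_id` properties), so treat these calls as assumptions about a recent release:

```python
import logging
import ray

ray.init()

@ray.remote
def traced_task():
    ctx = ray.get_runtime_context()
    # Accessor names differ across Ray versions; get_job_id()/get_task_id()
    # are the recent spellings, older releases used .job_id/.task_id.
    job_id = ctx.get_job_id()
    task_id = ctx.get_task_id()
    logging.warning("job=%s task=%s doing work", job_id, task_id)
    return job_id, task_id

print(ray.get(traced_task.remote()))
```

This tags each log line with IDs Ray already tracks, but the job ID groups everything launched from one driver, so separate requests handled inside the same driver would still need their own request_id to be threaded through or attached via a custom logging filter.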

scottsun94 commented 1 year ago

@rkooo567 Has this been fixed?