ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[core] Better management of GCS client in CoreWorker. #35684

Open · fishbone opened this issue 1 year ago

fishbone commented 1 year ago

What happened + What you expected to happen

GcsClient is not managed well in CoreWorker, which makes managing the GCS connection hard. Right now, GcsClient instances exist in Pubsub, in CoreWorker, and in the Python layer. Even after the migration to the C++-based GCS client, the creation logic is scattered across different components.

A hotfix is to create a global singleton and reuse it: https://github.com/ray-project/ray/pull/35624

This works to some extent, but it doesn't give us the flexibility to shut down the GCS client channel.
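To illustrate why a global singleton makes shutdown awkward, here is a minimal sketch. The `GcsClient` class and `GetGlobalGcsClient` function are hypothetical stand-ins, not Ray's actual API. Every caller shares one instance, the address from the first call wins, and there is no single owner who knows when it is safe to disconnect.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a GCS client; not Ray's real GcsClient.
class GcsClient {
 public:
  explicit GcsClient(std::string address) : address_(std::move(address)) {}
  const std::string &address() const { return address_; }
  bool connected() const { return connected_; }
  void Disconnect() { connected_ = false; }

 private:
  std::string address_;
  bool connected_ = true;
};

// Hypothetical global singleton accessor. C++11 guarantees thread-safe
// initialization of function-local statics, but the address passed on the
// first call is the only one ever used, and the instance lives until
// program exit.
GcsClient &GetGlobalGcsClient(const std::string &address) {
  static GcsClient instance(address);
  return instance;
}
```

Because Pubsub, CoreWorker, and the Python layer would all receive the same object, any one of them calling `Disconnect()` silently breaks the others, so in practice nobody can shut the channel down.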

A better approach is to create the channel in one central place and pass it to the other endpoints.
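A minimal sketch of the centralized approach, assuming the channel is created once and handed to each endpoint through its constructor (plain dependency injection). `GcsChannel`, `PubsubEndpoint`, and `KvEndpoint` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical shared connection to the GCS (stands in for a gRPC channel).
class GcsChannel {
 public:
  explicit GcsChannel(std::string address) : address_(std::move(address)) {}
  void Shutdown() { open_ = false; }
  bool is_open() const { return open_; }
  const std::string &address() const { return address_; }

 private:
  std::string address_;
  bool open_ = true;
};

// Hypothetical endpoints: they receive the channel instead of creating
// their own, so there is exactly one connection to manage.
class PubsubEndpoint {
 public:
  explicit PubsubEndpoint(std::shared_ptr<GcsChannel> channel)
      : channel_(std::move(channel)) {}
  bool CanPublish() const { return channel_->is_open(); }

 private:
  std::shared_ptr<GcsChannel> channel_;
};

class KvEndpoint {
 public:
  explicit KvEndpoint(std::shared_ptr<GcsChannel> channel)
      : channel_(std::move(channel)) {}
  bool CanRead() const { return channel_->is_open(); }

 private:
  std::shared_ptr<GcsChannel> channel_;
};
```

Usage: the creator builds one `GcsChannel`, passes it to every endpoint, and keeps the authority to call `Shutdown()`; all endpoints observe the closed channel at once.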

Ideally, all GCS-based services should be initialized in CoreWorker and passed to Python through a Cython API.

When CoreWorker is shut down, we should shut down everything with it.
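A minimal sketch of CoreWorker as the single owner of all GCS-based services, assuming each service exposes a `Shutdown()` method. The `GcsService` interface, `SimpleService`, and this `CoreWorker` class are hypothetical and much simpler than Ray's real CoreWorker; the `service()` accessor stands in for what a Cython wrapper might expose to Python, which is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical interface for anything that holds GCS resources.
class GcsService {
 public:
  virtual ~GcsService() = default;
  virtual void Shutdown() = 0;
  virtual bool running() const = 0;
};

class SimpleService : public GcsService {
 public:
  void Shutdown() override { running_ = false; }
  bool running() const override { return running_; }

 private:
  bool running_ = true;
};

// Hypothetical CoreWorker that creates and owns every GCS-based service.
class CoreWorker {
 public:
  CoreWorker() {
    services_.push_back(std::make_unique<SimpleService>());  // e.g. GCS client
    services_.push_back(std::make_unique<SimpleService>());  // e.g. pubsub
  }
  // Shutting down the worker always tears down its services.
  ~CoreWorker() { Shutdown(); }

  // Accessor a binding layer could hand to Python instead of letting
  // Python construct its own client.
  GcsService &service(std::size_t i) { return *services_.at(i); }
  std::size_t num_services() const { return services_.size(); }

  // Stop services in reverse creation order so dependents stop before the
  // services they depend on. Safe to call more than once.
  void Shutdown() {
    for (auto it = services_.rbegin(); it != services_.rend(); ++it) {
      (*it)->Shutdown();
    }
  }

 private:
  std::vector<std::unique_ptr<GcsService>> services_;
};
```

The design choice here is that ownership and lifetime live in one place: nothing outside CoreWorker creates a GCS connection, so a single `Shutdown()` call is enough to release all of them.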

Related issue: https://github.com/ray-project/ray/issues/35681

Versions / Dependencies

master

Reproduction script

In code.

Issue Severity

None

jovany-wang commented 1 year ago

Goooood