Closed by ericl 1 year ago
Current prototypes of shuffle in Ray run 100TB-scale jobs at 70%+ hardware efficiency.
Is there a reference for the hardware efficiency figure? How is it defined?
Is there an actionable task list for the Beta -> GA features? We are trying to understand the current limitations and see whether we can help move them to maturity.
Is there a reference for the hardware efficiency figure? How is it defined?
I believe this was defined with respect to the aggregate disk performance of the cluster (cc @stephanie-wang ), which was the bottleneck.
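As a rough illustration of that definition, hardware efficiency can be read as the disk throughput the job actually achieved divided by the aggregate disk bandwidth of the cluster. The sketch below assumes that reading; the function name and every number in the example are hypothetical and are not taken from the actual benchmark.

```python
def hardware_efficiency(bytes_through_disks: float,
                        job_seconds: float,
                        num_disks: int,
                        disk_bytes_per_second: float) -> float:
    """Achieved disk throughput as a fraction of aggregate disk bandwidth.

    Assumption: "efficiency" means achieved / peak, where peak is the sum
    of the per-disk bandwidths across the whole cluster.
    """
    if job_seconds <= 0 or num_disks <= 0 or disk_bytes_per_second <= 0:
        raise ValueError("durations, disk counts, and bandwidths must be positive")
    achieved = bytes_through_disks / job_seconds   # bytes/s the job sustained
    peak = num_disks * disk_bytes_per_second       # bytes/s the disks could sustain
    return achieved / peak


# Hypothetical cluster: 100 disks at 1 GB/s each, job moves 70 TB in 1000 s.
efficiency = hardware_efficiency(7e13, 1000.0, 100, 1e9)
print(f"{efficiency:.0%}")
```

Under this reading, a figure of 70% means the job kept the disks busy at roughly 70% of their combined rated bandwidth for its whole duration.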
Is there an actionable task list for the Beta -> GA features? We are trying to understand the current limitations and see whether we can help move them to maturity.
For Datasets you can follow the P1/P2s in the milestone: https://github.com/ray-project/ray/milestone/53; otherwise we'll be posting REPs which will contain more details. Are there particular questions you have?
State Observability
Resource utilization observability is needed for a finer-grained pricing model.
Is there any timeline estimate for the alpha, beta, RC, and final releases? @ericl
@tianlinzx thanks for the question! Roughly speaking, we plan to have a community preview around the end of July, though many of the features will land in master in some form before then. The actual release will be Aug 21.
Hi @ericl, as far as I know, Ray's object store (a.k.a. Plasma) does not support HBM yet, nor does it support RDMA, NVLink, etc. Is there any plan for that, or is there related work already done? Thank you in advance.
Hey @kuizhiqing , we don't have a concrete timeline but are very interested in requirements here (cc @clarkzinzow @scv119 ). What are your use cases and needs for plasma HBM?
Btw, some functionality of this category is available in the ray.util.collective library for use at the application level: https://docs.ray.io/en/latest/ray-more-libs/ray-collective.html
Ray's scalability is only tested up to 1,000 nodes and 10,000 actors.
Thanks. Are there any reference documents or tests on scalability?
@denkensk there is a benchmark directory for each Ray release: https://github.com/ray-project/ray/blob/ray-1.12.0/benchmarks/README.md
Is there any Ray 2.0 beta? Would like to test our apps with that before the final release comes out.
Hi @viksharma1987 yes we are working on that plan and will share with the community soon!
Closing this issue as Ray 2.0 was released
What is Ray 2.0?
Ray 2.0 (ETA: August this year) will improve on the Ray 1.x line in three major ways:
The following is a list of current proposals we have for 2.0. For each feature on the list we will later publish a REP (with design details) for community feedback here: https://github.com/ray-project/enhancements
Please feel free to comment in this thread, as well as propose new REPs for 2.0!
Ray Core
State Observability
Observability has been a common pain point for Ray users. In particular, Ray currently lacks a unified API for inspecting the entities that together comprise a running Ray application (e.g., tasks, actors, nodes, placement groups). We plan to add APIs that will provide access to a core list of system states in the CLI, REST, and UI. This API will cover Ray tasks, actors, objects, placement groups, nodes, resources, runtime envs, and namespaces.
Cluster Fault Tolerance
Ray's fault tolerance model is currently designed for batch jobs, but serving workloads require greater degrees of availability within a single cluster. We plan to add support for failover of Ray's global control store (GCS). This means that jobs can continue running even if the GCS fails, though certain operations like launching new tasks or actors will be unavailable until the GCS recovers (a few seconds to minutes depending on the cluster manager).
For Ray Serve users, this means single-cluster deployments can continue serving user traffic uninterrupted during GCS failover. For Serve users that are already using multi-cluster HA (i.e., load balancing traffic across multiple Serve clusters), GCS fault tolerance will reduce the frequency of single-cluster failures that can cause latency spikes / overloads even in a multi-cluster setup, and reduce the amount of resources that need to be provisioned.
Scalable Data Shuffle
Shuffle is an important primitive for ML preprocessing and training use cases. We plan to scale Ray's data shuffle support to large clusters. Current prototypes of shuffle in Ray run 100TB-scale jobs at 70%+ hardware efficiency. In Ray 2.0, we plan to productionize these prototypes and integrate them with the Datasets sort / repartition / shuffle APIs to bring scalable shuffle to Ray library users.
Implementation-wise, this effort will involve (1) adding more advanced shuffle algorithms to Datasets (e.g., Cosco, pipelined shuffle), and (2) optimizing and hardening the Ray dataplane to operate performantly at this scale (e.g., reducing metadata overheads and improving object transfer protocols).
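To make the primitive concrete, here is a minimal single-process sketch of a plain two-stage (map, then reduce) all-to-all shuffle. It is not Cosco, not a pipelined shuffle, and not the Ray Datasets implementation; the function names are hypothetical and chosen only for illustration.

```python
from typing import Hashable, List, Tuple

Record = Tuple[Hashable, object]


def map_stage(block: List[Record], num_partitions: int) -> List[List[Record]]:
    """Split one input block into num_partitions buckets by key hash."""
    buckets: List[List[Record]] = [[] for _ in range(num_partitions)]
    for key, value in block:
        buckets[hash(key) % num_partitions].append((key, value))
    return buckets


def reduce_stage(pieces: List[List[Record]]) -> List[Record]:
    """Merge every piece destined for one partition and sort it by key."""
    merged = [record for piece in pieces for record in piece]
    merged.sort(key=lambda record: record[0])  # stable sort keeps arrival order per key
    return merged


def simple_shuffle(blocks: List[List[Record]], num_partitions: int) -> List[List[Record]]:
    """Every map output feeds every reducer, hence "all-to-all"."""
    map_outputs = [map_stage(block, num_partitions) for block in blocks]
    return [
        reduce_stage([output[p] for output in map_outputs])
        for p in range(num_partitions)
    ]


# Integer keys hash to themselves, so this example is deterministic.
blocks = [[(1, "a"), (2, "b")], [(3, "c"), (1, "d")]]
partitions = simple_shuffle(blocks, num_partitions=2)
```

The number of intermediate pieces grows as (input blocks) x (output partitions), which is one reason metadata overhead and object transfer become bottlenecks at large scale.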
Ray AI Runtime
In 2.0, we're aligning Ray's existing ML libraries to work together as the first third generation AI runtime. For ML platform builders, this runtime will provide a unified compute layer for ML apps and services. For individual library users, this effort will improve interoperability between Ray libraries and between Ray and third-party ML libraries.
See [RFC] Introducing Ray AI Runtime for more information. This effort also includes a number of standalone improvements to Ray libraries:
Serve Pipelines GA
Serve Pipelines is a feature of Serve built to aid the development and deployment of multi-model inference pipelines, also known as model composition. Pipelines will enable users to compose and test dynamic, multi-model pipelines using familiar Ray task and actor primitives, and then use Serve to deploy these same pipelines to production for low-latency serving of user traffic.
Serve JSON API
In addition to its Pythonic API, we plan to add an equivalent JSON-format API to Serve, which can be used to declaratively update deployments and pipelines via CLI or REST request. This API improves the MLOps CI/CD story for Serve by adding a pathway to inspect and update Serve deployments without needing to execute Python code.
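The declarative idea can be sketched as follows: the desired state is a JSON document, and a tool compares it with the current state to decide what to change. The schema here (a "deployments" list with "name" and "num_replicas" fields) and the plan_updates helper are hypothetical placeholders, not the actual Serve JSON format or API.

```python
import json
from typing import Dict, List


def plan_updates(current: dict, desired: dict) -> Dict[str, List[str]]:
    """Compare two declarative configs and list deployments to create, update, or delete."""
    cur = {d["name"]: d for d in current.get("deployments", [])}
    des = {d["name"]: d for d in desired.get("deployments", [])}
    return {
        "create": sorted(name for name in des if name not in cur),
        "update": sorted(name for name in des if name in cur and des[name] != cur[name]),
        "delete": sorted(name for name in cur if name not in des),
    }


# Hypothetical configs, as they might arrive from a file or a REST request body.
current_cfg = json.loads(
    '{"deployments": [{"name": "a", "num_replicas": 1}, {"name": "b", "num_replicas": 2}]}'
)
desired_cfg = json.loads(
    '{"deployments": [{"name": "a", "num_replicas": 3}, {"name": "c", "num_replicas": 1}]}'
)
plan = plan_updates(current_cfg, desired_cfg)
print(json.dumps(plan, indent=2))
```

Because the input is plain JSON, a CI/CD job could produce and review such a config without importing or executing any application Python code.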
Train GA
As part of the Ray AI Runtime, we plan to GA the Ray Train library. Ray Train interoperates with Ray Tune to tune your distributed model and Ray Datasets to train on large amounts of data, and will become the standard integration point for 3rd party ML training frameworks (e.g., XGBoost, PyTorch, HuggingFace, etc.) to run on Ray.
RLlib API Improvements
We plan two major API improvements to RLlib in 2.0:
First, "RLlib Connectors" will extract the common observation preprocessing code connecting env and RLlib models into a stand-alone connectors library. This will make RLlib's model-to-env interfacing code more transparent and understandable for users who need to handle environments with complex action and observation spaces.
Second, we plan to refactor the "execution plans" of RLlib algorithms to follow a more imperative pattern, while preserving their distributed functionality. At a high level, this means that algorithm engineers will be able to express the distributed execution of algorithms in more familiar imperative patterns rather than working with distributed result iterators.
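The connector idea from the first item can be sketched as a chain of small, composable preprocessing steps that sit between the environment and the model. The names below (clip, scale, pipeline) are hypothetical and are not the RLlib Connectors API; this only illustrates the general pattern.

```python
from typing import Callable, List, Sequence

Observation = List[float]
Connector = Callable[[Observation], Observation]


def clip(low: float, high: float) -> Connector:
    """Return a connector that clamps each observation value into [low, high]."""
    def _apply(obs: Observation) -> Observation:
        return [min(max(x, low), high) for x in obs]
    return _apply


def scale(factor: float) -> Connector:
    """Return a connector that multiplies each observation value by factor."""
    def _apply(obs: Observation) -> Observation:
        return [x * factor for x in obs]
    return _apply


def pipeline(connectors: Sequence[Connector]) -> Connector:
    """Chain connectors so the output of one feeds the next, in order."""
    def _apply(obs: Observation) -> Observation:
        for connector in connectors:
            obs = connector(obs)
        return obs
    return _apply


# Env observation -> clip to [-1, 1] -> scale by 2 -> model input.
env_to_model = pipeline([clip(-1.0, 1.0), scale(2.0)])
model_input = env_to_model([-5.0, 0.25, 3.0])
```

Keeping each step as a separate named function makes it easier to inspect which transformations are applied, and in what order, before observations reach the model.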
Ray Clusters
KubeRay GA
KubeRay is the official way to run Ray applications on Kubernetes. In Ray 2.0, KubeRay will reach GA. See the KubeRay project for more details on its roadmap: https://github.com/ray-project/kuberay
Job Submission API GA
The Ray job submission API enables users to package code for execution in Ray clusters. This API will reach GA in 2.0, providing a standard for external Job managers to interact with Ray clusters: https://docs.ray.io/en/latest/ray-job-submission/overview.html
Ray 2.1 and beyond
There's a lot more we want to do for Ray. A few items we are considering for 2.1 and beyond include:
Workflows GA
We plan to eventually GA the Ray workflows library. This will allow ML applications that only require lightweight orchestration to be built entirely within Ray, improving performance and reducing operational complexity. Key elements of GA include (1) updating the workflows API to align with ray.remote syntax instead of having a separate decorator, (2) stabilizing the storage API, and (3) adding production features such as limiting the number of concurrent workflows.
Scalability to 2k+ nodes and 100k+ actors
Currently, Ray's scalability is only tested up to 1,000 nodes and 10,000 actors. We plan to address bottlenecks preventing applications requiring more resources than this from being run on Ray.
Pyarrow Compute for Datasets
We plan to leverage the pyarrow compute library (or other accelerated libraries like polars) more heavily in Ray Datasets, which will accelerate operations such as aggregations commonly used for ML preprocessing.
Placement Group API Improvements
To make the placement groups API easier to use, especially for more challenging use cases, we plan to add a higher-level gang scheduling API. This means that instead of creating a placement group and then scheduling actors to the bundles within the group, users would be able to create a set of actors atomically as a group with given scheduling constraints.
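The all-or-nothing semantics of gang scheduling can be sketched in a few lines. This is a toy single-process model with a hypothetical gang_schedule function and a greedy first-fit policy; it is not Ray's scheduler or the proposed API, and first-fit can reject a group that a smarter packing would accept.

```python
from typing import Dict, List, Optional


def gang_schedule(free_cpus: Dict[str, int], actor_cpus: List[int]) -> Optional[List[str]]:
    """Place every actor in the group, or none of them.

    Returns the chosen node for each actor, or None if the whole group
    does not fit. free_cpus is only modified when placement succeeds.
    """
    remaining = dict(free_cpus)  # work on a copy so a failure leaves no trace
    placement: List[str] = []
    for need in actor_cpus:
        node = next((n for n, free in remaining.items() if free >= need), None)
        if node is None:
            return None          # one actor does not fit: reject the whole group
        remaining[node] -= need
        placement.append(node)
    free_cpus.update(remaining)  # commit all reservations at once
    return placement


# Hypothetical two-node cluster and a group of three 2-CPU actors.
nodes = {"node-a": 2, "node-b": 4}
assignment = gang_schedule(nodes, [2, 2, 2])
```

The key property is atomicity: callers never observe a state in which only part of the group holds resources.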
Please let us know your feedback below!