What does this PR do?
This is first step towards improving resiliency and performance in Ray without modifying the source code. This PR includes a new tool that helps configure Ray cluster conveniently. The tool helps in fetching and parsing ray configurations, and generating resiliency profiles (e.g., strict, relaxed, recommended). Currently, we are working on deciding configuration options for each resiliency profile manually by evaluating them on various ray workloads. We'll update this PR accordingly.
Description of Changes
The changes in this PR is currently independent of the main codeFlare code. We intend to put this tool in a new folder called utils in the codeFlare root directory.
What does this PR do? This is first step towards improving resiliency and performance in Ray without modifying the source code. This PR includes a new tool that helps configure Ray cluster conveniently. The tool helps in fetching and parsing ray configurations, and generating resiliency profiles (e.g., strict, relaxed, recommended). Currently, we are working on deciding configuration options for each resiliency profile manually by evaluating them on various ray workloads. We'll update this PR accordingly.
Description of Changes The changes in this PR is currently independent of the main codeFlare code. We intend to put this tool in a new folder called
utils
in the codeFlare root directory.