Open kurman opened 2 years ago
From offline discussion, my understanding is that DataStore is intended to support Hive paths, though I think we probably want to support other things as well, such as:
Some example data paths that might come up:
For DataStoreValueFilter, is that sufficient to handle arbitrary subfilters?
I'm wondering if it might be better to go the route of DeviceMounts and use Python inheritance with a DataStore base class. That does have issues for JSON serialization, though:
https://github.com/pytorch/torchx/blob/main/torchx/specs/api.py#L306
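A minimal sketch of the inheritance route, assuming hypothetical class names (HiveDataStore, S3DataStore are illustrations, not the actual torchx API). It also shows the JSON serialization issue mentioned above: `asdict()` drops the concrete subclass, so the type has to be re-encoded by hand.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class DataStore:
    """Hypothetical base class for data store descriptions."""
    path: str

@dataclass
class HiveDataStore(DataStore):
    """Hive-style store: a table with named partition columns."""
    table: str = ""
    partitions: dict = field(default_factory=dict)

@dataclass
class S3DataStore(DataStore):
    """Object-store path with a region, e.g. an S3 bucket/prefix."""
    region: str = ""

def to_json(store: DataStore) -> str:
    # asdict() produces only the field values; the JSON carries no record
    # of which subclass it came from. Embedding a "kind" tag is one common
    # workaround for round-tripping polymorphic dataclasses.
    payload = {"kind": type(store).__name__, **asdict(store)}
    return json.dumps(payload)

hive = HiveDataStore(path="warehouse/events", table="events",
                     partitions={"ds": "2023-01-01"})
```

Without the explicit `kind` tag, a deserializer cannot tell a HiveDataStore payload from an S3DataStore one, which is the serialization wrinkle this approach has to solve.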
Description
Add support for data store information in
torchx.specs.api.Role
with the following hierarchy:
Motivation/Background
It is more efficient to allocate compute resources close to the persistent data they use. For example, Slurm can have a multi-cluster configuration, and AWS has geographic regions that incur data transfer costs between regions.
Detailed Proposal
For the scheduler to select a preferred cluster/region, the client code must provide this information upfront so that jobs are allocated to the right resources. This can be implemented either in the scheduler wrapper, or by passing the information through to the actual scheduler if it supports the operation.
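The scheduler-wrapper variant could look roughly like the sketch below: pick the cluster whose region matches the data's region, falling back to a default. The cluster names and region mapping are assumptions for illustration only.

```python
# Hypothetical cluster -> region mapping; in practice this would come
# from scheduler or deployment configuration.
CLUSTER_REGIONS = {
    "cluster-east": "us-east-1",
    "cluster-west": "us-west-2",
}

def select_cluster(data_region: str, default: str = "cluster-east") -> str:
    """Return the first cluster co-located with the data's region,
    or the default cluster if none matches."""
    for cluster, region in CLUSTER_REGIONS.items():
        if region == data_region:
            return cluster
    return default
```

For example, `select_cluster("us-west-2")` would route the job to `cluster-west`, avoiding cross-region data transfer.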
The API assumes use of the widely adopted Hive data model, primarily:
where all the data within a partition is collocated.
Further, to select the right partitions, the proposed API provides a mechanism for selecting specific partitions either by a specific value or by a range.
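The two selection modes could be sketched as follows. DataStoreValueFilter is the name used in the discussion above; the range filter name, the `matches` method, and the dict-based partition representation are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class DataStoreValueFilter:
    """Select partitions whose column equals an exact value."""
    column: str
    value: str

    def matches(self, partition: dict) -> bool:
        return partition.get(self.column) == self.value

@dataclass
class DataStoreRangeFilter:
    """Select partitions whose column falls in an inclusive range."""
    column: str
    low: str
    high: str

    def matches(self, partition: dict) -> bool:
        v = partition.get(self.column)
        return v is not None and self.low <= v <= self.high

# Example: Hive-style date partitions, filtered to January.
partitions = [{"ds": "2023-01-01"}, {"ds": "2023-01-05"}, {"ds": "2023-02-01"}]
rng = DataStoreRangeFilter("ds", "2023-01-01", "2023-01-31")
selected = [p for p in partitions if rng.matches(p)]
```

The range form relies on partition values sorting lexicographically (true for ISO dates); arbitrary subfilter composition, as raised in the comment above, would need an additional and/or combinator on top of these.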
Alternatives
Instead of changing the API, it is possible to build custom components that are data- and region-aware, based on specific needs.