In Xorbits, storage holds intermediate data and final results during computation, and supports several storage levels such as GPU memory, main memory, and disk. Currently, data produced by workers is stored and managed by a centralized storage_api.
This project aims to introduce peer-to-peer data storage and communication, where each Xorbits worker holds its own data locally. A meta_api maintains the keys of the data and the address of the worker that produced each piece of data. Each subtask runner holds an independent RunnerStorage that maintains all data created in that runner and responds to requests for data it holds. When a runner needs non-local data, it looks up the key in the meta_api, finds the address of the runner that holds the data, and fetches the data from that runner. A centralized data store is therefore no longer necessary, which may speed up computation.
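The lookup-then-fetch flow described above can be sketched in a few lines. This is only an illustrative, in-process model and not the code in this pull request: the class `MetaStore`, the methods `register`, `locate`, `put`, `serve`, and `get`, and the `peers` dictionary that stands in for network transport are all hypothetical names chosen for the example. Only `RunnerStorage` and the idea of a meta_api come from the description.

```python
from typing import Any, Dict


class MetaStore:
    """Hypothetical stand-in for meta_api: maps a data key to the
    address of the runner that produced (and holds) the data."""

    def __init__(self) -> None:
        self._locations: Dict[str, str] = {}

    def register(self, key: str, address: str) -> None:
        self._locations[key] = address

    def locate(self, key: str) -> str:
        # Raises KeyError if no runner has registered this key.
        return self._locations[key]


class RunnerStorage:
    """Per-runner storage: keeps data created by this runner and
    answers requests from peers for data it holds."""

    def __init__(self, address: str, meta: MetaStore,
                 peers: Dict[str, "RunnerStorage"]) -> None:
        self.address = address
        self._meta = meta
        self._peers = peers          # assumption: replaces real network calls
        self._data: Dict[str, Any] = {}
        peers[address] = self        # make this runner reachable by address

    def put(self, key: str, value: Any) -> None:
        # Store locally and record ownership in the metadata store.
        self._data[key] = value
        self._meta.register(key, self.address)

    def serve(self, key: str) -> Any:
        # Respond to a peer's request; only succeeds for local data.
        return self._data[key]

    def get(self, key: str) -> Any:
        # Local hit: no metadata lookup or transfer needed.
        if key in self._data:
            return self._data[key]
        # Non-local: find the owner through the metadata store, then
        # fetch directly from that runner instead of a central store.
        owner = self._meta.locate(key)
        return self._peers[owner].serve(key)


meta = MetaStore()
peers: Dict[str, RunnerStorage] = {}
runner_a = RunnerStorage("worker-0:runner-0", meta, peers)
runner_b = RunnerStorage("worker-1:runner-0", meta, peers)

runner_a.put("chunk-1", [1, 2, 3])
fetched = runner_b.get("chunk-1")  # looked up via meta, served by runner_a
```

In this sketch the fetched data is not cached on the requesting runner, so ownership in the metadata store stays unambiguous; whether the actual implementation caches remote data is not stated in the description.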
What do these changes do?
This is a proof-of-concept draft pull request.