xorbitsai / xorbits

Scalable Python DS & ML, in an API compatible & lightning fast way.
https://xorbits.io
Apache License 2.0

POC: Attempting to introduce peer-to-peer chunk data exchange #728

Open jqdai opened 9 months ago

jqdai commented 9 months ago

What do these changes do?

This is a proof-of-concept draft pull request.

In Xorbits, storage holds intermediate data and final results during computation, and it supports several storage types, such as GPU memory, main memory, and disk. Currently, data produced by Xorbits workers is stored and managed by a centralized storage_api.

This project aims to introduce peer-to-peer data storage and communication, where each Xorbits worker holds its own data locally. A meta_api maintains the key of each piece of data and the address of the worker that produced it. Each subtask runner holds an independent RunnerStorage that keeps all data created in that runner and responds to requests for any data it holds. When a runner needs non-local data, it looks up the meta_api to find the address of the runner that holds the data, and then fetches the data from that runner. Centralized data storage is therefore no longer necessary, which may speed up computation.
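The lookup-then-fetch flow described above can be illustrated with a minimal in-process sketch. This is not the code in this PR: `MetaRegistry` is a hypothetical stand-in for meta_api, the `RunnerStorage` methods (`put`, `serve`, `get`) and the `peers` dictionary that replaces real network transport are assumptions made only for illustration.

```python
from typing import Any, Dict


class MetaRegistry:
    """Hypothetical stand-in for meta_api: maps a data key to the
    address of the runner that holds the data."""

    def __init__(self) -> None:
        self._locations: Dict[str, str] = {}

    def register(self, key: str, address: str) -> None:
        self._locations[key] = address

    def lookup(self, key: str) -> str:
        return self._locations[key]


class RunnerStorage:
    """Sketch of a per-runner store. The method names are illustrative,
    not the actual interface proposed in the PR."""

    def __init__(self, address: str, meta: MetaRegistry,
                 peers: Dict[str, "RunnerStorage"]) -> None:
        self.address = address
        self._meta = meta
        # `peers` stands in for the network: address -> RunnerStorage.
        self._peers = peers
        self._data: Dict[str, Any] = {}
        peers[address] = self

    def put(self, key: str, value: Any) -> None:
        # Keep the data locally and record its location in the registry.
        self._data[key] = value
        self._meta.register(key, self.address)

    def serve(self, key: str) -> Any:
        # Respond to a peer's request for data held by this runner.
        return self._data[key]

    def get(self, key: str) -> Any:
        # Local hit: no lookup needed.
        if key in self._data:
            return self._data[key]
        # Non-local: find the owner via the registry, then fetch from it.
        owner = self._meta.lookup(key)
        return self._peers[owner].serve(key)


meta = MetaRegistry()
peers: Dict[str, RunnerStorage] = {}
runner_a = RunnerStorage("runner-a", meta, peers)
runner_b = RunnerStorage("runner-b", meta, peers)

runner_a.put("chunk-0", [1, 2, 3])
fetched = runner_b.get("chunk-0")  # runner_b fetches directly from runner_a
```

In this sketch the registry only stores locations, so the data itself moves directly between the two runners and never passes through a central store.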

Check code requirements