Closed: @sighingnow closed this issue 2 years ago.
Hi, I'm interested in this issue. Could you please provide more detailed information, or are there any references to help me get started?
Hi @is-shidian, vineyard currently can only share data in host memory, while many training/inference/sampling engines leverage GPUs to accelerate tensor-centric tasks, so data sharing is also likely to occur between different tasks on the GPU. Therefore, the purpose of this topic is to implement a data-sharing mechanism on the GPU, which requires the implementation of:
reference:
Describe your problem
Vineyard provides in-memory object sharing between diverse compute engines, allowing them to obtain shared objects in a zero-copy fashion. By providing a convenient API (i.e., `get`/`put`) and an extensible mechanism (i.e., `builder`/`resolver`), vineyard enables efficient data sharing among polyglot compute engines and avoids the cost of replication and data serialization/deserialization.

Currently, such efficient data sharing is only allowed to happen in host memory, while many computational engines are now introducing GPUs to accelerate computation (e.g., DL and CV). Thus we have the same opportunity to perform efficient data exchange in GPU memory without downloading data to host memory. The goal of this task is to extend vineyard's current mechanism for data sharing in host memory to GPU memory, so that:
- Different GPU processes can directly share blobs in a zero-copy fashion.
- vineyardd will have a long-running GPU daemon responsible for managing the blobs.
- Use the existing vineyard `meta` design to organize blobs into complex objects.

Note that vineyardd on the GPU will be responsible for allocating data and then sharing memory using the CUDA IPC `MemHandle`, as vineyardd does on main memory. To do this, we need a new `GPUBlobStore`, which will be responsible for managing the blob data on the GPU; each blob will also have an additional representation bit field to distinguish whether it is a blob on the GPU or a blob in host memory.

Additional context
This issue is part of our Alibaba Summer of Code 2022 Program.
- Difficulty: Normal
- Mentor: Ke Meng (@septicmk)
By `GPUBlobStore`, did you mean `GPUBulkStore`?
@Y-jiji Yes
@septicmk No offense, but is there any scenario where things in GPU memory can be kept immutable? (Please give me a punch if I misunderstood something.) My previous understanding is that GPU memory is rather expensive and is almost always changing. In typical training settings, the tensors kept on the GPU are usually model parameters, and the data (which are the only immutable things) are usually kept in host memory. Techniques like preloading are commonly used to eliminate the I/O bottleneck, but I haven't heard about loading data directly from another remote GPU, since such setups are somewhat elusive...
@Y-jiji For example, a graph task (learning, like GNNs; analytics, like label propagation) loads the topology of a real-world graph into the GPU but only traverses the graph without changing it (reloading these data into the GPU via PCIe may be time-consuming). We have also considered allowing mutable objects in future updates.
@septicmk Thank you for your friendly punch.
Implemented in https://github.com/v6d-io/v6d/pull/876.
Thanks @CSWater for the hard work!