pytorch / tensorpipe

A tensor-aware point-to-point communication primitive for machine learning

Is there any plan to integrate DPDK? #419

Open eedalong opened 2 years ago

eedalong commented 2 years ago

Could TensorPipe use DPDK to bypass the kernel and avoid memory copies when the user doesn't have RDMA or EFA?

lw commented 2 years ago

Could you provide more information on DPDK?

eedalong commented 2 years ago

Tensor applications constantly need to transfer dense tensors, which take up a lot of memory. The traditional socket stack always involves buffer copy overhead, which we want to avoid.
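To make the copy overhead concrete, here is a minimal sketch (an illustration, not TensorPipe code) that sends a tensor-sized buffer over an ordinary local socket pair. The helper name `send_over_socket` is hypothetical. Even when user space avoids extra copies (`sendall` reads the caller's buffer directly, and `recv_into` fills a preallocated buffer), the kernel still copies the bytes into its socket buffers on send and back out on receive. That kernel copy is what RDMA, and possibly DPDK, would bypass.

```python
import socket
import threading


def send_over_socket(payload: bytes) -> bytearray:
    """Hypothetical helper: push `payload` through a local socket pair and
    receive it into a preallocated buffer. No extra user-space copies are
    made, but the kernel copies the data into and out of its socket buffers."""
    a, b = socket.socketpair()
    received = bytearray(len(payload))
    view = memoryview(received)

    def sender() -> None:
        # sendall reads straight from `payload`; the kernel copies it into
        # the socket's send buffer.
        a.sendall(payload)
        a.close()

    # Send from a separate thread so a large payload cannot deadlock once
    # the kernel socket buffer fills up.
    t = threading.Thread(target=sender)
    t.start()

    got = 0
    while got < len(payload):
        # recv_into writes into our buffer in place; the kernel copies the
        # bytes out of its receive buffer into `received`.
        n = b.recv_into(view[got:])
        if n == 0:
            raise ConnectionError("peer closed before all data arrived")
        got += n

    t.join()
    b.close()
    return received


if __name__ == "__main__":
    tensor_bytes = bytes(range(256)) * 4096  # 1 MiB stand-in for a dense tensor
    out = send_over_socket(tensor_bytes)
    assert out == tensor_bytes
```

Every byte crosses the user/kernel boundary twice here, so for multi-megabyte tensors the copies themselves become a visible cost.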


RDMA can be used to bypass the kernel with the help of smart NICs.

But we could also use DPDK to bypass the kernel copy, which I think would help reduce the buffer copy overhead.

eedalong commented 2 years ago

@lw

lw commented 2 years ago

I know about RDMA but I still don't understand what DPDK is. What does it stand for? Can you point to some reference?

As for RoCE, it should already work through our InfiniBand backends; let us know if you run into any issues.