v6d-io / v6d

vineyard (v6d): an in-memory immutable data manager. (Project under CNCF, TAG-Storage)
https://v6d.io
Apache License 2.0

Optimize the speed of concurrent get of pytorch models #1884

Closed dashanji closed 3 months ago

dashanji commented 4 months ago

Describe your problem

Currently, getting a PyTorch module at high concurrency is very slow, as shown below. Both test machines have a maximum network bandwidth of 30 Gbps.

Vineyard

| Concurrency | Time to get | Observed network bandwidth (dstat) |
|---|---|---|
| 1 | 2.57s | around 2000Mi |
| 6 | 7.73s | around 3800Mi |
| 13 | 14.58s | around 3800Mi |
| 27 | 29.32s | around 3800Mi |

Iperf

| Concurrency | Observed network bandwidth (dstat) | Total network bandwidth |
|---|---|---|
| 1 | around 1470Mi | 12 Gbit/s (1500MiB/s) |
| 6 | around 3700Mi | 31.1 Gbit/s (3888MiB/s) |
| 13 | around 3650Mi | 30.9 Gbit/s (3863MiB/s) |
| 27 | around 3650Mi | 30.9 Gbit/s (3863MiB/s) |
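A quick back-of-envelope check, using only the numbers reported above, shows the vineyard results are consistent with a single saturated link: the implied per-client transfer stays in the same range at every concurrency level, so get time grows roughly linearly with the number of clients. The inferred model size here is an estimate, not a measured value.

```python
# Sanity-check the benchmark numbers above: if one vineyardd link caps out
# around 3800 MiB/s, total data moved = time * bandwidth, and dividing by the
# concurrency gives the implied per-client transfer (i.e. model size).
results = {
    # concurrency: (total get time in seconds, observed bandwidth in MiB/s)
    1: (2.57, 2000),
    6: (7.73, 3800),
    13: (14.58, 3800),
    27: (29.32, 3800),
}

implied_sizes = {}
for concurrency, (seconds, bandwidth) in results.items():
    total_mib = seconds * bandwidth            # data moved during the run
    implied_sizes[concurrency] = total_mib / concurrency

for concurrency, size in implied_sizes.items():
    print(f"{concurrency:>2} clients: ~{size:,.0f} MiB per client")
```

Every row implies a per-client transfer of roughly 4000-5200 MiB, which is what a fixed-bandwidth bottleneck predicts: aggregate throughput is capped, so adding clients only stretches the wall-clock time.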

Solution

In the actual scenario, PyTorch models are usually loaded on machines with GPUs, which typically have high-performance networks. Thus, the network bandwidth of a single vineyardd instance is the bottleneck. We can distribute the PyTorch model blobs among different vineyard instances to increase the aggregate network bandwidth.
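A minimal sketch of the placement idea behind this proposal: assign each model blob to one of several vineyard instances so concurrent readers pull from multiple links instead of one. The blob names, sizes, and the greedy balancing heuristic are illustrative assumptions, not vineyard's actual implementation; a real version would put/get the blobs through the vineyard client on each instance.

```python
def shard_by_size(blobs, n_instances):
    """Greedy placement sketch: largest blob first, assigned to the
    currently least-loaded instance, so bytes (and thus bandwidth)
    spread roughly evenly across instances.

    blobs: dict of blob name -> size in MiB (stand-in for model tensors)
    Returns (placement map {name: instance index}, per-instance load in MiB).
    """
    load = [0] * n_instances
    placement = {}
    for name, size in sorted(blobs.items(), key=lambda kv: -kv[1]):
        idx = load.index(min(load))   # least-loaded instance so far
        placement[name] = idx
        load[idx] += size
    return placement, load

# Hypothetical model blobs (name -> size in MiB) spread over 2 instances.
blobs = {
    "layer0.weight": 1024,
    "layer0.bias": 1,
    "layer1.weight": 2048,
    "layer1.bias": 1,
}
placement, load = shard_by_size(blobs, n_instances=2)
print(placement)
print(load)  # bytes served per instance
```

With blobs spread like this, each concurrent get fans out across instances, so the achievable throughput approaches the sum of the instances' link bandwidths rather than a single 3800 MiB/s cap.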