During the review of #470, we found a number of things that were not crucial for that PR but should ideally be done. The problems are described below.
This issue is quite general: it collects all the problems found during that review.
Use only PeerID
The idea of p2p is that a PeerID is enough to communicate with another daemon, while a Multiaddr is needed only to start a new node. Thus, we should reduce the use of Multiaddr wherever possible.
Move CPU-bound work into a separate executor
There are places in the code (for example, forward/backward in moe.client.expert) where CPU-bound work, such as serialization/deserialization, runs inside an async task and therefore blocks the event loop. To improve efficiency, this work should be moved into a thread executor.
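A minimal sketch of the pattern, assuming an asyncio-based client: the blocking (de)serialization call is handed to a `ThreadPoolExecutor` via `loop.run_in_executor`, so the coroutine awaits it instead of running it inline on the event loop. `serialize_payload`/`deserialize_payload` and the pickle format are hypothetical stand-ins for hivemind's own tensor serialization. How much concurrency this buys depends on whether the offloaded call releases the GIL.

```python
import asyncio
import pickle
from concurrent.futures import ThreadPoolExecutor
from typing import Any, List


# Hypothetical stand-ins for hivemind's real tensor (de)serialization helpers.
# pickle only keeps the sketch self-contained; never unpickle untrusted data.
def serialize_payload(tensors: List[Any]) -> bytes:
    return pickle.dumps(tensors, protocol=pickle.HIGHEST_PROTOCOL)


def deserialize_payload(data: bytes) -> List[Any]:
    return pickle.loads(data)


# A dedicated pool for CPU-bound work, separate from the loop's default executor.
SERIALIZATION_POOL = ThreadPoolExecutor(max_workers=4, thread_name_prefix="serialization")


async def forward_roundtrip(tensors: List[Any]) -> List[Any]:
    loop = asyncio.get_running_loop()
    # Serialize in a worker thread instead of blocking the event loop.
    request = await loop.run_in_executor(SERIALIZATION_POOL, serialize_payload, tensors)
    # ... here the request would be sent to the remote expert and a response received ...
    response = request  # echoing the request stands in for the network round trip
    # Deserialize the response in a worker thread as well.
    return await loop.run_in_executor(SERIALIZATION_POOL, deserialize_payload, response)


if __name__ == "__main__":
    print(asyncio.run(forward_roundtrip([[1.0, 2.0], [3.0]])))
```

Using a dedicated pool rather than the default executor keeps heavy serialization from starving other blocking calls that share the default one.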
Check inputs on the server side
Currently, hivemind.Server does not validate its inputs. If a user sends malformed inputs, the server may run out of memory (OOM). We should add such checks in a future PR. See https://github.com/learning-at-home/hivemind/issues/3
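One possible shape for such a check, sketched under assumptions: each expert declares a hypothetical schema of per-sample shapes and dtypes plus a `max_batch_size`, and the server rejects a request before running the expert on it. `TensorMeta`, `InputSpec` and `validate_inputs` are illustrative names, not hivemind's API; in practice a `torch.Tensor` exposes `.shape` and `.dtype` in a similar way.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class TensorMeta:
    """Hypothetical header of a received tensor: its shape and dtype name."""
    shape: Tuple[int, ...]
    dtype: str


@dataclass(frozen=True)
class InputSpec:
    """Hypothetical per-input schema: shape of one sample (no batch dimension) and dtype."""
    sample_shape: Tuple[int, ...]
    dtype: str


def validate_inputs(inputs: Sequence[TensorMeta], schema: Sequence[InputSpec], max_batch_size: int) -> None:
    """Raise ValueError if the inputs do not match the schema or the batch is too large."""
    if len(inputs) != len(schema):
        raise ValueError(f"expected {len(schema)} inputs, got {len(inputs)}")
    batch_sizes = set()
    for i, (tensor, spec) in enumerate(zip(inputs, schema)):
        if len(tensor.shape) != len(spec.sample_shape) + 1:
            raise ValueError(f"input {i}: expected {len(spec.sample_shape) + 1} dims, got {len(tensor.shape)}")
        if tuple(tensor.shape[1:]) != spec.sample_shape:
            raise ValueError(f"input {i}: expected sample shape {spec.sample_shape}, got {tuple(tensor.shape[1:])}")
        if tensor.dtype != spec.dtype:
            raise ValueError(f"input {i}: expected dtype {spec.dtype}, got {tensor.dtype}")
        batch_sizes.add(tensor.shape[0])
    if len(batch_sizes) > 1:
        raise ValueError(f"inputs disagree on batch size: {sorted(batch_sizes)}")
    if batch_sizes and max(batch_sizes) > max_batch_size:
        raise ValueError(f"batch size {max(batch_sizes)} exceeds the limit of {max_batch_size}")
```

Capping the batch size bounds how much activation memory a single request can make the server allocate.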
Sending an empty input causes an exception
If a client sends a tensor of shape [0, ...] (an empty tensor), it is split into zero messages, so the uid is never transmitted. The server then receives uid=None and fails with a cryptic KeyError(None). We should either forbid this on the client side or ensure that zero-element tensors are serialized into a stream whose first message is empty.
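A sketch of the second option, assuming a stream made of `(uid, chunk)` messages in which only the first message carries the uid: the splitter always yields at least one chunk, so a zero-element tensor still produces an (empty) first message. The function names and the message layout are hypothetical; `get_uid` also shows how the server could fail with an explicit error instead of `KeyError(None)`.

```python
from typing import Iterator, List, Optional, Sequence, Tuple

Message = Tuple[Optional[str], bytes]  # (uid or None, chunk of serialized tensor bytes)


def split_into_chunks(buffer: bytes, chunk_size: int) -> Iterator[bytes]:
    """Split one serialized tensor into chunks; an empty buffer still yields one empty chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not buffer:
        yield b""  # keeps the stream non-empty, so there is a first message to carry the uid
        return
    for start in range(0, len(buffer), chunk_size):
        yield buffer[start:start + chunk_size]


def build_stream(uid: str, serialized_tensors: Sequence[bytes], chunk_size: int) -> List[Message]:
    """Attach the uid to the first message only (the layout assumed in this sketch)."""
    messages: List[Message] = []
    for buffer in serialized_tensors:
        for chunk in split_into_chunks(buffer, chunk_size):
            messages.append((None if messages else uid, chunk))
    if not messages:  # no tensors at all: still send one empty message with the uid
        messages.append((uid, b""))
    return messages


def get_uid(messages: Sequence[Message]) -> str:
    """Server side: fail with an explicit error instead of KeyError(None)."""
    if not messages or messages[0][0] is None:
        raise ValueError("request stream does not specify an expert uid")
    return messages[0][0]
```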
MoE operates only on lists of tensors
The code expects inputs/outputs to be Iterable[torch.Tensor]; however, they may have a more complex structure, such as a dict with meta information.
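One common way to lift the list-of-tensors restriction is to flatten an arbitrary nested structure into a flat list of leaves for transport, then pack it back on the other side using a template of the original structure. The sketch handles dicts with sortable keys (e.g. strings), lists and tuples; in real code the leaves would be tensors, and `flatten`/`pack` are illustrative names rather than an existing hivemind API.

```python
from typing import Any, List


def flatten(structure: Any) -> List[Any]:
    """Return the leaves of a nested dict/list/tuple structure in a deterministic order."""
    if isinstance(structure, dict):
        return [leaf for key in sorted(structure) for leaf in flatten(structure[key])]
    if isinstance(structure, (list, tuple)):
        return [leaf for item in structure for leaf in flatten(item)]
    return [structure]


def pack(leaves: List[Any], template: Any) -> Any:
    """Rebuild a structure shaped like `template` from leaves produced by flatten()."""
    remaining = iter(leaves)
    missing = object()  # sentinel, so that None is still a valid leaf

    def next_leaf() -> Any:
        value = next(remaining, missing)
        if value is missing:
            raise ValueError("fewer leaves than the template has slots")
        return value

    def build(node: Any) -> Any:
        if isinstance(node, dict):
            return {key: build(node[key]) for key in sorted(node)}
        if isinstance(node, list):
            return [build(item) for item in node]
        if isinstance(node, tuple):
            return tuple([build(item) for item in node])
        return next_leaf()  # leaf position: take the next value

    result = build(template)
    if next(remaining, missing) is not missing:
        raise ValueError("more leaves than the template has slots")
    return result
```

Sorting dict keys makes the leaf order independent of insertion order, so sender and receiver agree on it as long as they share the template.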
Test load balancing for unary handlers on the Python side
Load balancing is tested inside libp2p-daemon itself, and we also have some tests covering stream handlers. However, there are no tests for load balancing of unary handlers on the hivemind side.
Remove gRPC-specific Python file compilation
Since gRPC-based communication is no longer present in hivemind, we can remove the corresponding compilation commands from setup.py.
Add --identity_path to run_server.py
As in examples/albert, it would be useful to have an option that keeps the server's libp2p address fixed across restarts.
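A minimal sketch of the command-line side, using only `argparse`: the flag is optional, so omitting it leaves the server's identity handling unchanged. How run_server.py would forward the value to hivemind (for example, to the DHT or P2P constructor) is an assumption not shown here; the help text and default are illustrative.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Run a hivemind server (sketch: only the new option is shown)")
    parser.add_argument(
        "--identity_path",
        type=str,
        default=None,
        help="Path to a file with the server's libp2p identity (private key); reusing the same file "
        "keeps the server's peer identity, and hence its address, the same across restarts",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    # Hypothetical: the value would be passed on to whatever starts the libp2p daemon.
    print(args.identity_path)
```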
TODO List:
[x] Use only PeerID where possible
[ ] Move CPU-bound work into a separate executor
[ ] Check inputs on the server side
[ ] Sending an empty input causes an exception
[ ] MoE operates only on lists of tensors
[ ] Test load balancing for unary handlers on the Python side
[x] Remove gRPC-specific Python file compilation
[x] Add --identity_path to run_server.py
[ ] Make PeerID and ExpertData msgpack-serializable