[Feature Request] MoE enhancements

The nature of this issue

During the review of #470, several things came up that were not crucial for the PR but should ideally be done. Their descriptions are below. This issue is deliberately general and collects all problems found during that PR.

Use only `PeerID`

The idea of p2p is that a PeerID is enough to communicate with another daemon, and a Multiaddr is needed only to start a new node. Thus, we should reduce the use of Multiaddr wherever possible.

Move cpu-bound things inside separate executor

There are some places in the code (for example, forward/backward for moe.client.expert) where CPU-bound work, such as serialization/deserialization, runs inside an async task and blocks the event loop while it runs. To increase efficiency, this work should be moved into a thread executor.
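A minimal sketch of the idea, using only the standard library. The `serialize`/`deserialize` functions and `handle_request` are hypothetical stand-ins (pickle instead of the real tensor serialization), not hivemind's actual code:

```python
import asyncio
import pickle
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the real tensor (de)serialization routines.
def serialize(obj) -> bytes:
    return pickle.dumps(obj)

def deserialize(data: bytes):
    return pickle.loads(data)

# A shared pool, so that blocking work does not stall the event loop.
_executor = ThreadPoolExecutor(max_workers=4)

async def handle_request(payload: bytes) -> bytes:
    loop = asyncio.get_running_loop()
    # Offload deserialization to the executor and await its result.
    inputs = await loop.run_in_executor(_executor, deserialize, payload)
    outputs = [x * 2 for x in inputs]  # placeholder for the actual expert call
    # Offload serialization of the response as well.
    return await loop.run_in_executor(_executor, serialize, outputs)

def run_example():
    payload = serialize([1, 2, 3])
    return deserialize(asyncio.run(handle_request(payload)))
```

While a worker thread is busy with (de)serialization, the event loop stays free to serve other requests. How much this helps depends on whether the real serialization code releases the GIL.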

Check inputs on server side

Currently, hivemind.Server does not check that inputs are correct. If a user sends malformed inputs, the server may run out of memory (OOM). We should check for that in some future PR. See https://github.com/learning-at-home/hivemind/issues/3
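One possible shape for such a check, as a sketch only: validate the declared shape and dtype of each tensor against the expert's schema before allocating any memory. `TensorSpec`, `validate_inputs` and the `max_batch_size` limit are hypothetical names introduced for illustration, and shapes/dtypes are plain tuples and strings here rather than real tensors:

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class TensorSpec:
    """Expected per-sample shape (without batch dimension) and dtype name."""
    shape: Tuple[int, ...]
    dtype: str


def validate_inputs(
    descriptors: Sequence[Tuple[Tuple[int, ...], str]],
    schema: Sequence[TensorSpec],
    max_batch_size: int = 4096,  # hypothetical limit, not a hivemind setting
) -> int:
    """Check (shape, dtype) descriptors against the schema; return the batch size."""
    if len(descriptors) != len(schema):
        raise ValueError(f"expected {len(schema)} tensors, got {len(descriptors)}")
    batch_sizes = set()
    for i, ((shape, dtype), spec) in enumerate(zip(descriptors, schema)):
        if dtype != spec.dtype:
            raise ValueError(f"input {i}: expected dtype {spec.dtype}, got {dtype}")
        if len(shape) == 0 or tuple(shape[1:]) != spec.shape:
            raise ValueError(f"input {i}: expected shape [batch, *{spec.shape}], got {shape}")
        batch_sizes.add(shape[0])
    if len(batch_sizes) != 1:
        raise ValueError(f"inconsistent batch sizes: {sorted(batch_sizes)}")
    (batch_size,) = batch_sizes
    if not 0 < batch_size <= max_batch_size:
        raise ValueError(f"batch size {batch_size} is outside (0, {max_batch_size}]")
    return batch_size
```

Rejecting a request from its metadata alone means an oversized or mistyped tensor never has to be materialized on the server.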

Sending empty input causes exception

If a client sends a tensor of shape [0, ...] (an empty tensor), it will be split into zero messages, so the uid will not be passed. The server will receive uid=None and fail with a cryptic KeyError(None). We should either forbid this on the client side or ensure that zero-element tensors are serialized into a stream whose first message is empty.
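The second option can be sketched as follows. This is a simplified model that operates on raw bytes, not the real protobuf messages; `split_into_messages` and the `(uid, chunk)` tuple layout are assumptions made for illustration:

```python
from typing import Iterator, Optional, Tuple


def split_into_messages(
    uid: str, data: bytes, chunk_size: int
) -> Iterator[Tuple[Optional[str], bytes]]:
    """Split data into (uid, chunk) messages; only the first message carries the uid."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not data:
        # Empty payload: still emit one empty message so that the uid is delivered.
        yield uid, b""
        return
    for start in range(0, len(data), chunk_size):
        yield (uid if start == 0 else None), data[start:start + chunk_size]
```

With this guard, the stream always contains at least one message, so the receiving side can rely on the first message for the uid regardless of tensor size.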

MoE operates only with lists of tensors

The code expects inputs/outputs to be Iterable[torch.Tensor]; however, they can have a more complex structure, such as a dict with meta information.
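One common way to support nested structures is to flatten them into a list of leaves plus a structure spec, send the flat list over the existing code path, and rebuild the structure on the other side. The sketch below is a hypothetical helper pair written for illustration (plain integers stand in for tensors); it is not an existing hivemind API:

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass(frozen=True)
class _Leaf:
    """Placeholder that records the position of a leaf in the flat list."""
    index: int


def flatten(obj: Any) -> Tuple[List[Any], Any]:
    """Return (leaves, spec), where spec mirrors obj with leaves replaced by _Leaf."""
    leaves: List[Any] = []

    def _walk(node: Any) -> Any:
        if isinstance(node, dict):
            return {key: _walk(value) for key, value in node.items()}
        if isinstance(node, (list, tuple)):
            return type(node)(_walk(item) for item in node)
        leaves.append(node)  # anything else is treated as a leaf (e.g. a tensor)
        return _Leaf(len(leaves) - 1)

    spec = _walk(obj)
    return leaves, spec


def unflatten(leaves: List[Any], spec: Any) -> Any:
    """Rebuild the nested structure described by spec from a flat list of leaves."""
    if isinstance(spec, _Leaf):
        return leaves[spec.index]
    if isinstance(spec, dict):
        return {key: unflatten(leaves, value) for key, value in spec.items()}
    return type(spec)(unflatten(leaves, item) for item in spec)
```

For example, `flatten({"hidden": [t1, t2], "meta": {"mask": t3}})` yields the leaves `[t1, t2, t3]`, which fit the current list-of-tensors interface, while the spec can be stored once as part of the expert's schema.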

Test load balancing for unary handlers on python side

Load balancing is tested inside libp2p-daemon itself, and we also have some tests covering stream handlers. However, there are no tests for load balancing of unary handlers on the hivemind side.

Remove gRPC-specific Python file compilation

Since gRPC-based communication is no longer present in hivemind, we can remove the corresponding compilation commands from setup.py.

Add `--identity_path` to `run_server.py`

Similarly to examples/albert, it would be great to have an option to fix the libp2p address of the server.
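On the command-line side, the option could look roughly like this. The sketch covers only argument parsing; `build_parser` is a hypothetical helper, and how the parsed path is forwarded to the server is not shown because it depends on the server's constructor:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of a server launcher")
    parser.add_argument(
        "--identity_path",
        type=str,
        default=None,
        help="Path to a file with the private key that fixes the server's libp2p identity",
    )
    return parser


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["--identity_path", "server.id"])
```

Leaving the default as `None` keeps the current behaviour (a fresh identity on every start) for users who do not pass the flag.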

TODO List:

[x] Use only PeerID where possible
[ ] Move cpu-bound things inside separate executor
[ ] Check inputs on server side
[ ] Sending empty input causes exception
[ ] MoE operates only with lists of tensors
[ ] Test load balancing for unary handlers on python side
[x] Remove gRPC-specific Python file compilation
[x] Add --identity_path to run_server.py
[ ] Make PeerID and ExpertData msgpack-serializable
