zhijian-liu / torchpack

A neural network training interface based on PyTorch, with a focus on flexibility
https://pypi.org/project/torchpack/
MIT License

`comm.py` should maybe consider backend-specific device support #30

Open · lucaslie opened this issue 2 years ago

lucaslie commented 2 years ago

Depending on the backend, distributed communication may only be supported on the CPU or only on the GPU; see the backend feature table in the PyTorch distributed docs (https://pytorch.org/docs/stable/distributed.html). For example, nccl supports only CUDA tensors, while gloo performs its collectives primarily on CPU tensors.

Right now, `comm.py` always performs communication on the GPU; see, e.g.: https://github.com/zhijian-liu/torchpack/blob/d3fda521bc2e2684643a46103ecece816b53842b/torchpack/distributed/comm.py#L32-L34
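The linked lines follow the usual pickle-to-byte-tensor pattern; paraphrasing (this is a sketch, not an exact quote of the file):

```python
import pickle

import torch

data = {"example": 123}  # any picklable object

buffer = pickle.dumps(data)                      # serialize the object
storage = torch.ByteStorage.from_buffer(buffer)  # wrap the raw bytes
tensor = torch.ByteTensor(storage).cuda()        # always moved to the GPU
```

Because of the unconditional `.cuda()`, a CPU-only job using the gloo backend fails even though gloo itself can communicate CPU tensors.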

I would suggest taking the backend-specific device support into account in both `allgather()` and `broadcast()` so that these functions remain usable across backends.
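A minimal sketch of what that could look like; the helper name `get_comm_device` is made up for illustration, and the backend-to-device mapping follows the PyTorch docs (nccl operates only on CUDA tensors, gloo communicates on CPU tensors):

```python
import torch
import torch.distributed as dist


def get_comm_device() -> torch.device:
    """Hypothetical helper: pick a device the current backend can use."""
    backend = dist.get_backend()
    if backend == dist.Backend.NCCL:
        # nccl only operates on CUDA tensors.
        return torch.device("cuda", torch.cuda.current_device())
    # gloo (and typically mpi, depending on how it was built) uses CPU tensors.
    return torch.device("cpu")
```

`allgather()` and `broadcast()` could then move the serialized tensor to `get_comm_device()` instead of calling `.cuda()` unconditionally.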

`torch.distributed.broadcast_object_list` and `torch.distributed.all_gather_object` might be useful starting points for this.
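For reference, a minimal usage sketch of those two collectives (`payload` is an arbitrary example object; note that under the nccl backend they expect the current CUDA device to be set via `torch.cuda.set_device()`):

```python
import torch.distributed as dist

# Assumes the default process group has already been initialized,
# e.g. via dist.init_process_group(...).
payload = {"rank": dist.get_rank(), "msg": "hello"}

# Gather an arbitrary picklable object from every rank.
gathered = [None] * dist.get_world_size()
dist.all_gather_object(gathered, payload)
# gathered[i] now holds rank i's payload, on every rank.

# Broadcast a list of picklable objects from rank 0 to all ranks.
objects = [payload if dist.get_rank() == 0 else None]
dist.broadcast_object_list(objects, src=0)
# objects[0] now holds rank 0's payload, on every rank.
```

Both functions handle pickling internally, so they would remove the need for the manual byte-tensor serialization in `comm.py`.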