Is there a pure PyTorch implementation that uses torch.distributed.tensor.parallel instead of fairscale.nn.model_parallel? The fairscale package looks somewhat dated, with little recent activity. It would also be good to have a list of other known implementations, pure PyTorch or not.