Open soodoshll opened 1 year ago
Hi @soodoshll, thanks for the draft!
It looks good as a first-version RFC draft!
I have several suggestions:
distributed_config and optimization_config in the guide-level explanation.
distributed_config and optimization_config, and what each config specifies. It's okay to include only the ones that are known now, then update the draft and add more configs during implementation and future refactoring.
mesh_axes_per_dim in TensorShardSpec. The design looks good to me.
Hi @soodoshll and @xinli-git, could you also discuss how to separate the whole feature into relatively small steps to implement? We can use this issue to track the PRs related to this RFC, something like https://github.com/apache/tvm/issues/15319. Thanks!
Hi @yaoyaoding, thanks for your suggestions. I've fixed the draft.
The whole feature can be decomposed into the following steps:
connect function, which relies on (1)
I'm working on 1. After it is done, we can start 2 and 3. I have a prototype of 3, which I will integrate later.
Hi @xinli-git, let's work in the auto-parallel branch.
I found that resharding (tensor conversion between ops with different sharding specifications) sometimes requires the collective communication primitive all-to-all. For example, it happens when an MxN matrix is sharded along axis M and we want to convert it to be sharded along axis N.
Although NCCL does not directly support all-to-all, it can be implemented with send and recv. Without all-to-all, a workaround is to use all-gather and then slice the result to achieve the same layout, at the cost of suboptimal performance.
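To make the two resharding paths concrete, here is a minimal single-process sketch. It is a simulation only: plain Python lists stand in for per-device buffers, no NCCL or hidet API is called, and every function name below is hypothetical. It assumes M and N are both divisible by the number of devices.

```python
# Simulation only: nested lists stand in for per-device buffers.
# All names are hypothetical; no NCCL or hidet API is used.

def shard_rows(matrix, world_size):
    """Split an M x N matrix into row blocks (sharded along axis M)."""
    rows = len(matrix) // world_size
    return [matrix[r * rows:(r + 1) * rows] for r in range(world_size)]

def shard_cols(matrix, world_size):
    """Split an M x N matrix into column blocks (sharded along axis N)."""
    cols = len(matrix[0]) // world_size
    return [[row[r * cols:(r + 1) * cols] for row in matrix]
            for r in range(world_size)]

def reshard_all_to_all(row_shards):
    """Rank src sends column block dst of its row shard to rank dst."""
    world_size = len(row_shards)
    cols = len(row_shards[0][0]) // world_size
    # outbox[src][dst] is the piece that rank src sends to rank dst.
    outbox = [[[row[dst * cols:(dst + 1) * cols] for row in shard]
               for dst in range(world_size)]
              for shard in row_shards]
    # Each rank stacks the pieces it receives in source-rank order.
    return [[row for src in range(world_size) for row in outbox[src][dst]]
            for dst in range(world_size)]

def reshard_all_gather_then_slice(row_shards):
    """Workaround: every rank gathers the full matrix, then keeps its columns."""
    full = [row for shard in row_shards for row in shard]
    return shard_cols(full, len(row_shards))

def recv_elems_all_to_all(m, n, p):
    """Elements each rank receives from the other p - 1 ranks."""
    return (p - 1) * (m // p) * (n // p)

def recv_elems_all_gather(m, n, p):
    """Elements each rank receives when gathering the full matrix."""
    return (p - 1) * (m // p) * n

M, N, WORLD_SIZE = 4, 6, 2
matrix = [[i * N + j for j in range(N)] for i in range(M)]
row_shards = shard_rows(matrix, WORLD_SIZE)

col_shards = reshard_all_to_all(row_shards)
assert col_shards == shard_cols(matrix, WORLD_SIZE)
assert col_shards == reshard_all_gather_then_slice(row_shards)
```

Both paths produce the same column-sharded layout. In this simple count, the all-gather path moves `WORLD_SIZE` times as many elements into each rank as the all-to-all path, which is one way to see why the workaround is slower.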
I'd suggest treating it as a low-priority TODO item and seeing whether it really causes a performance issue. We can fix it after finishing the backbone of the whole pipeline.
Thanks, @soodoshll! The RFC is very detailed.
For modelling computation, it seems that Alpa assumes that all tensor contraction ops (MM, Conv) must be fully sharded, so all such ops have the same computation cost under different sharding strategies. They also observe that other ops have negligible runtime cost for computation (I verified this as well). As a result, they concluded there was no need to model computation.
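The equal-cost observation can be checked with simple arithmetic. The sketch below is not Alpa's cost model; it only counts multiply-add FLOPs for C[M,N] = A[M,K] @ B[K,N] under a few hypothetical ways of fully sharding the op over 4 devices, assuming each dimension divides evenly. Communication cost (for example, the reduction needed when K is split) is deliberately ignored.

```python
# Hypothetical illustration: per-device FLOPs of a matmul under full sharding.
# Communication (e.g. the all-reduce needed when K is split) is not counted.

def per_device_flops(m, n, k, split_m=1, split_n=1, split_k=1):
    """FLOPs for the local (m/split_m) x (k/split_k) @ (k/split_k) x (n/split_n) block."""
    return 2 * (m // split_m) * (n // split_n) * (k // split_k)

M, N, K = 1024, 512, 256
NUM_DEVICES = 4

# Each strategy uses all 4 devices: the product of the splits equals NUM_DEVICES.
strategies = {
    "shard M": dict(split_m=4),
    "shard N": dict(split_n=4),
    "shard K": dict(split_k=4),
    "shard M and N": dict(split_m=2, split_n=2),
}

flops = {name: per_device_flops(M, N, K, **splits)
         for name, splits in strategies.items()}

for name, value in flops.items():
    print(f"{name}: {value} FLOPs per device")

# Every fully sharded strategy does the same amount of local compute.
assert len(set(flops.values())) == 1
assert flops["shard M"] == 2 * M * N * K // NUM_DEVICES
```

Since the local compute is identical across these strategies, the difference between them comes from communication, which is consistent with the decision not to model computation.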
Since this feature probably requires a month of work for multiple people (currently me and Qidong), I was thinking maybe we can leverage GitHub Projects (https://github.com/hidet-org/hidet/projects?query=is%3Aopen).
@yaoyaoding, if you think that's a good idea, I will take the lead on this.
Hi @xinli-git, sounds good to me. I have not used the GitHub Projects feature before, but you can give it a try and let's see whether it helps with organization and planning.