Closed by jordane95 8 months ago
`dp` should now be the third dimension of the parallel ranks.
Could you please elaborate on this?
One missing argument
Where is it 🤔
It is straightforward if you look at the definition of the rank matrix: https://github.com/huggingface/nanotron/blob/ff3c7746577948743da08c4868aca46cbc0c110b/src/nanotron/parallel/context.py#L69-L76
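To make the point concrete, here is a minimal sketch of how a flat rank maps onto a row-major rank matrix. It assumes the axes are ordered `(ep, pp, dp, tp)`, so `dp` sits at zero-based index 2; that ordering, the sizes, and the helper names `rank_coords` and `dp_group` are illustrative assumptions, not code taken from nanotron.

```python
from math import prod

# Assumed axis order of the rank matrix: (ep, pp, dp, tp).
# Sizes are hypothetical and chosen only for illustration.
SHAPE = (1, 2, 2, 2)
DP_AXIS = 2  # dp is the third dimension (zero-based index 2)


def rank_coords(rank, shape):
    """Unravel a flat rank into per-axis coordinates (row-major order)."""
    coords = []
    for size in reversed(shape):
        rank, idx = divmod(rank, size)
        coords.append(idx)
    return tuple(reversed(coords))


def dp_group(rank, shape, dp_axis=DP_AXIS):
    """Return all ranks that differ from `rank` only along the dp axis."""
    target = rank_coords(rank, shape)
    group = []
    for other in range(prod(shape)):
        coords = rank_coords(other, shape)
        same_elsewhere = all(
            a == b
            for axis, (a, b) in enumerate(zip(coords, target))
            if axis != dp_axis
        )
        if same_elsewhere:
            group.append(other)
    return group


if __name__ == "__main__":
    print(rank_coords(5, SHAPE))
    print(dp_group(0, SHAPE))
    print(dp_group(5, SHAPE))
```

If the axis order in the linked definition differs from this assumption, only `DP_AXIS` and `SHAPE` need to change.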
`is_expert_sharded` is missing.
The variable would cause a "referenced before defined" error.