tunib-ai / parallelformers

Parallelformers: An Efficient Model Parallelization Toolkit for Deployment
https://tunib-ai.github.io/parallelformers
Apache License 2.0

A bug with `n_fused` #41

Open JiayiFeng opened 1 year ago

JiayiFeng commented 1 year ago

When an attn_qkv layer is configured with n_fused > 1 and reversed=False, the shape of its sliced weight is incorrect.

It seems the root cause is here:

https://github.com/tunib-ai/parallelformers/blob/436573b05b9d47dba2234d60e874ff7bb57b725b/parallelformers/parallel/slicing.py#L79-L95

For an attn_qkv weight, the dim argument is 0. So when reversed=False and n_fused > 1, the tensor is chunked along dim 0 but then concatenated along the last dim (dim=-1 in the code), which makes its shape incorrect.
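
To illustrate the mismatch, here is a minimal standalone sketch (not the library's actual code; `hidden`, `n_fused`, and `world_size` are made-up toy values) that mimics the chunk/zip/cat pattern in the linked lines:

```python
import torch

hidden = 8       # toy hidden size
n_fused = 3      # q, k, v fused along dim 0
world_size = 2   # tensor-parallel ranks
dim = 0          # attn_qkv weights are sliced along dim 0

weight = torch.randn(n_fused * hidden, hidden)  # fused qkv weight: [24, 8]

# Chunk per fused projection (q, k, v), slice each across ranks,
# then regroup the pieces that belong to the same rank.
proj_layer = [p.chunk(world_size, dim=dim) for p in weight.chunk(n_fused, dim=dim)]

# Concatenating a rank's pieces along the last dim (as in the linked code)
# yields [4, 24] instead of the expected per-rank shape [12, 8].
sliced = list(map(lambda x: torch.cat([*x], dim=-1), zip(*proj_layer)))
print(sliced[0].shape)  # torch.Size([4, 24])
```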

hyunwoongko commented 1 year ago

Which model did you use?

JiayiFeng commented 1 year ago

I used a modified GPT-NeoX model, which is not officially supported, so I wrote a custom policy and found this issue.

JiayiFeng commented 1 year ago

Maybe the

```python
proj_layer = list(
    map(lambda x: torch.cat([*x], dim=-1), zip(*proj_layer))
)
```

should be:

```python
proj_layer = list(
    map(lambda x: torch.cat([*x], dim=dim), zip(*proj_layer))
)
```

I guess.
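
With the same toy shapes as in the sketch above (again just an illustration, not the library's code), concatenating along dim produces the expected per-rank slice:

```python
import torch

hidden, n_fused, world_size, dim = 8, 3, 2, 0
weight = torch.randn(n_fused * hidden, hidden)  # fused qkv weight: [24, 8]
proj_layer = [p.chunk(world_size, dim=dim) for p in weight.chunk(n_fused, dim=dim)]

# Concatenating along the slicing dim gives the expected per-rank shape:
# [n_fused * hidden // world_size, hidden] == [12, 8].
sliced = list(map(lambda x: torch.cat([*x], dim=dim), zip(*proj_layer)))
print(sliced[0].shape)  # torch.Size([12, 8])
```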

hyunwoongko commented 1 year ago

Okay, so could you test it with other models?