tunib-ai / parallelformers

Parallelformers: An Efficient Model Parallelization Toolkit for Deployment
https://tunib-ai.github.io/parallelformers
Apache License 2.0

Support for GPT-J #4

Closed andreamad8 closed 2 years ago

andreamad8 commented 3 years ago

Thanks for the great repo! I have tried it out, and it's really amazing to load such a large model across multiple GPUs.

Describe a requested feature

Currently, GPT-J is supported only with HF 4.7.0, by installing

pip install git+https://github.com/finetuneanon/transformers@gpt-j

Your requirements pin HF 4.8.0, which is needed to load several newer models. Soon GPT-J will be fully integrated into HF: https://github.com/huggingface/transformers/pull/12243

I am wondering whether there is an easy way to have backward compatibility, or to include GPT-J soon.

Thanks again for your great repo 👍🏻

-- Andrea

hyunwoongko commented 3 years ago

(1) Thanks for the good issue. We will add a backward compatibility patch soon :)

(2) However, there are some problems with the implementation of GPT-J, so we will add it once the official PR is merged, rather than the draft version.

Thank you!

hyunwoongko commented 3 years ago

@andreamad8 I patched it to work with Transformers version 4.2.0 or higher. Would you like to update with pip install parallelformers --upgrade and test it?
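For reference, a minimal sketch of how one might test after upgrading, based on the usage shown in the parallelformers README (the model name and GPU count here are placeholders, not part of the original thread):

```python
# Sketch of a quick test after `pip install parallelformers --upgrade`.
# Model id and num_gpus are placeholders; adjust to your setup.
from transformers import AutoModelForCausalLM, AutoTokenizer
from parallelformers import parallelize

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")

# Split the model across 2 GPUs with fp16 weights.
parallelize(model, num_gpus=2, fp16=True)

inputs = tokenizer("Parallelformers is", return_tensors="pt")
outputs = model.generate(**inputs, max_length=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```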

andreamad8 commented 3 years ago

Great, this feature works.

One thing I noticed is that the number of GPUs has to be an even number (2, 4, 8, ...) for it to work. If I try to run on 10 GPUs, the code fails. Is this normal? If you want, I can send you a more detailed error.

-- Andrea

hyunwoongko commented 3 years ago

This is a limitation of the Megatron-LM algorithm that parallelformers uses: parallelization is performed by splitting the parameters into N parts.

Tensors in most models have dimensions that are multiples of 2. For example, if an nn.Linear layer in the model has shape [512, 512], splitting it in half parallelizes it as two [256, 512] shards.

So the problem occurs when using 10 GPUs: 512 divided by 10 is 51.2, which is not an integer, so the parameters cannot be split evenly and parallelization is not possible.
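To make the constraint concrete, here is an illustrative sketch (not the actual parallelformers code) of what splitting a weight tensor implies for the GPU count:

```python
import torch

hidden_size = 512          # e.g. one dimension of an nn.Linear weight of shape [512, 512]
weight = torch.randn(hidden_size, hidden_size)

for num_gpus in (2, 4, 8, 10):
    if hidden_size % num_gpus == 0:
        # Each GPU receives an equally sized shard, e.g. [256, 512] for 2 GPUs.
        shards = torch.chunk(weight, num_gpus, dim=0)
        print(num_gpus, "GPUs -> shard shape per GPU:", tuple(shards[0].shape))
    else:
        # 512 / 10 = 51.2: the rows cannot be split evenly across 10 GPUs.
        print(num_gpus, "GPUs -> not divisible, cannot parallelize evenly")
```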

andreamad8 commented 3 years ago

Yeah, that's what I thought.

I suggest adding this info to the README for people (like me :)) who are not familiar with Megatron-LM.

hyunwoongko commented 3 years ago

I think I forgot to inform users about this part. We will add this to the documentation soon.

Thank you very much for your good comments. :)

andreamad8 commented 3 years ago

I have noticed that I cannot run multiple experiments at the same time because of

RuntimeError: Address already in use

Usually with torch.distributed.launch I can use --master_port; is there an equivalent in this framework?

-- Andrea

hyunwoongko commented 3 years ago

Check here https://github.com/tunib-ai/parallelformers/blob/main/FAQ.md#q-can-i-parallelize-multiple-models-on-same-gpus
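The linked FAQ is the authoritative answer. As a rough sketch of the idea it describes, each run would get its own port so the processes do not collide; whether parallelize accepts a master_port argument is an assumption here and should be checked against the FAQ:

```python
# Sketch only: giving each run its own port to avoid "Address already in use".
# The `master_port` argument is an assumption, not a verified API; see the FAQ above.
from transformers import AutoModelForCausalLM
from parallelformers import parallelize

model_a = AutoModelForCausalLM.from_pretrained("gpt2")
model_b = AutoModelForCausalLM.from_pretrained("gpt2-medium")

parallelize(model_a, num_gpus=2, fp16=True, master_port=29500)
parallelize(model_b, num_gpus=2, fp16=True, master_port=29501)
```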

andreamad8 commented 3 years ago

Thanks, I should have read it :)

andreamad8 commented 2 years ago

Hi,

They just added GPT-J to HF.

If I try running it, I get this error:

AssertionError: GPTJForCausalLM is not supported yet.

Are you planning to support this model as well?

-- Andrea

hyunwoongko commented 2 years ago

We added GPT-J.
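For anyone landing here later, a minimal sketch of how the newly supported model might be loaded and parallelized (the model id and GPU count are placeholders, not confirmed by this thread):

```python
# Sketch: parallelizing the HF GPT-J checkpoint now that GPTJForCausalLM is supported.
# Model id and num_gpus are placeholders; adjust to your hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer
from parallelformers import parallelize

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

parallelize(model, num_gpus=4, fp16=True)

inputs = tokenizer("GPT-J is", return_tensors="pt")
outputs = model.generate(**inputs, max_length=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```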