huggingface / optimum-neuron

Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.
Apache License 2.0

Does run_qa.py use tensor or data parallelism? #334

Closed Anand1405 closed 9 months ago

Anand1405 commented 9 months ago

I am using optimum-neuron's run_qa.py to fine-tune GPT-2, and judging by the output it seems to use data parallelism. Kindly confirm which kind of parallelism is used. If I set the batch size to 8, the total batch size is reported as 16 (I am running on a trn1.2xlarge), so does that mean it is using data parallelism? If so, how do I switch to tensor parallelism? Also, can it be used on a trn1.32xlarge with all cores utilized?
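As a small sketch of where the reported number comes from: under data parallelism each worker (Neuron core) holds a full copy of the model and processes its own per-device batch, so the logged total is a simple product. The helper below is illustrative, not part of the script:

```python
# Sketch: how the "total train batch size" logged by a Trainer-style script is
# derived under data parallelism. Each worker processes per_device_batch
# samples per step, so the effective global batch is the product below.
def total_train_batch_size(per_device_batch: int,
                           num_workers: int,
                           grad_accum_steps: int = 1) -> int:
    return per_device_batch * num_workers * grad_accum_steps

# A trn1.2xlarge exposes 2 Neuron cores, so a per-device batch of 8
# is reported as a total batch size of 16.
print(total_train_batch_size(8, 2))   # 16
# On a trn1.32xlarge with all 32 cores used as data-parallel workers:
print(total_train_batch_size(8, 32))  # 256
```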

michaelbenayoun commented 9 months ago

Hi @Anand1405 ,

Yes, by default it does data parallelism. If you use 32 workers on a trn1.32xlarge instance, you will be able to use all the cores as well.
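For reference, a data-parallel run over all 32 cores is typically launched with `torchrun`, one worker per core. The flags below are a hedged sketch (the model/dataset arguments are placeholders; see the repository's question-answering example for the authoritative invocation):

```shell
# Hypothetical launch on a trn1.32xlarge: torchrun spawns 32 workers,
# one per Neuron core, and the trainer data-parallelizes across them.
torchrun --nproc_per_node=32 run_qa.py \
  --model_name_or_path gpt2 \
  --dataset_name squad \
  --do_train \
  --per_device_train_batch_size 8 \
  --output_dir ./out
```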

Tensor parallelism is supported, but not for GPT-2. We support it for GPT-Neo, GPT-NeoX and Llama (v1 and v2).

Anand1405 commented 9 months ago

How do I use tensor parallelism for the same task if I switch to GPT-Neo, given that data parallelism is the default?

michaelbenayoun commented 9 months ago

I am writing the documentation here: #339.
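Until that documentation lands, here is a hedged sketch of what a tensor-parallel run might look like, assuming the example scripts expose a `--tensor_parallel_size` training argument (unverified here; defer to #339 for the authoritative usage):

```shell
# Hypothetical: shard a GPT-Neo model across 8 cores with tensor parallelism;
# the 32 cores then form 32 / 8 = 4 data-parallel groups.
torchrun --nproc_per_node=32 run_qa.py \
  --model_name_or_path EleutherAI/gpt-neo-125m \
  --dataset_name squad \
  --do_train \
  --tensor_parallel_size 8 \
  --output_dir ./out
```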

Anand1405 commented 9 months ago

When I use BertTokenizer the script runs fine, but when no tokenizer name is passed it defaults to GPT2Tokenizer and produces an error. Is there a way to use GPT2Tokenizer for this task with the same script?

michaelbenayoun commented 9 months ago

Can you share the command line please?