huggingface / optimum-habana

Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
Apache License 2.0

GPT2: support BF16 for both training and inference #230

Closed ZhaiFeiyue closed 1 year ago

ZhaiFeiyue commented 1 year ago

Feature request

Enable HMP (Habana Mixed Precision) for GPT2

Motivation

BF16 has better performance than FP32

Your contribution

Submitting a PR
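For context, a minimal sketch of what enabling HMP for GPT2 through a Gaudi configuration could look like in optimum-habana. The field names follow the `gaudi_config.json` schema used for other models on the Hub, and the op lists are illustrative placeholders rather than the final GPT2 lists:

```python
from optimum.habana import GaudiConfig

# Illustrative HMP-enabled Gaudi config for GPT2 (op lists abridged, not final).
gaudi_config = GaudiConfig(
    use_habana_mixed_precision=True,  # turn on Habana Mixed Precision (HMP)
    hmp_bf16_ops=["add", "addmm", "bmm", "dropout", "gelu", "matmul", "mm", "softmax"],
    hmp_fp32_ops=["embedding", "nll_loss", "cross_entropy", "log_softmax"],
    use_fused_adam=True,
    use_fused_clip_norm=True,
)

# Saved as gaudi_config.json, this is what GaudiTrainer consumes (together with
# the usual model/args/datasets) so that the listed ops run in BF16 on the HPU.
gaudi_config.save_pretrained("gpt2-gaudi-config")
```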

ZhaiFeiyue commented 1 year ago

@regisss I have compared FP32 and BF16 performance on Gaudi2; the data is below.

| precision | train_samples_per_second | eval perplexity |
|-----------|--------------------------|-----------------|
| FP32      | 47.624                   | 21.0109         |
| BF16      | 55.511                   | 21.177          |

| precision | train_samples_per_second | eval perplexity |
|-----------|--------------------------|-----------------|
| FP32      | 306.631                  | 21.7935         |
| BF16      | 357.932                  | 22.1765         |

Is the above accuracy acceptable?

regisss commented 1 year ago

Yes it seems good to me!

For this, what you should actually do is open a PR on the HF Hub here: click on "edit" and submit your changes from there.
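(If you prefer doing this programmatically rather than through the web editor, `huggingface_hub` can open such a Hub PR; the repo id and file names below are assumptions for illustration, not taken from this thread.)

```python
from huggingface_hub import upload_file

# Propose the edited gaudi_config.json as a pull request on the Hub.
# "Habana/gpt2" and the file names are assumptions used for illustration.
upload_file(
    path_or_fileobj="gaudi_config.json",
    path_in_repo="gaudi_config.json",
    repo_id="Habana/gpt2",
    commit_message="Enable BF16 (HMP) for GPT2",
    create_pr=True,
)
```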

ZhaiFeiyue commented 1 year ago

@regisss I have opened a PR. Regarding your question: most of the BF16 ops are the same as the default ones, but mul should stay in FP32, since there is an accuracy bug (NaN issue) coming from here
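To make the change concrete, a sketch of the relevant part of the Gaudi config with mul pinned to FP32; field names assumed to match the `gaudi_config.json` schema, op lists abridged:

```python
from optimum.habana import GaudiConfig

# Sketch of the GPT2 Gaudi config described above: the usual BF16 op list,
# with "mul" kept in FP32 to work around the NaN issue (lists abridged).
gaudi_config = GaudiConfig(
    use_habana_mixed_precision=True,
    hmp_bf16_ops=["add", "addmm", "bmm", "div", "dropout", "gelu",
                  "layer_norm", "linear", "matmul", "mm", "softmax"],
    hmp_fp32_ops=["embedding", "nll_loss", "cross_entropy", "mul"],
)
```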

ZhaiFeiyue commented 1 year ago

Closed since the PR has been merged.

regisss commented 1 year ago

> @regisss I have opened a PR. Regarding your question: most of the BF16 ops are the same as the default ones, but mul should stay in FP32, since there is an accuracy bug (NaN issue) coming from here

Would it make sense to disable HMP just for this part of the code and not have mul among the FP32 ops?
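(For illustration, disabling HMP locally could look roughly like the sketch below, assuming the `hmp` helper from `habana_frameworks` with the `disable_casts()` context manager described in Habana's mixed-precision documentation; the tensors are placeholders standing in for the GPT2 values involved.)

```python
import torch
import habana_frameworks.torch.core as htcore  # registers the HPU device
from habana_frameworks.torch.hpex import hmp

# Placeholder tensors standing in for the GPT2 values whose product
# produces NaNs when the multiplication is cast to BF16 by HMP.
a = torch.randn(8, 8, device="hpu")
b = torch.randn(8, 8, device="hpu")

# HMP stays enabled for the rest of the model; only this multiplication
# runs without BF16 casts, i.e. in full precision.
with hmp.disable_casts():
    out = a * b
```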

ZhaiFeiyue commented 1 year ago

Yes, I will submit a PR to disable HMP here. Should I remove mul from the FP32 list here, or remove all the BF16 and FP32 ops and just set `"use_habana_mixed_precision": true`?

regisss commented 1 year ago

> Yes, I will submit a PR to disable HMP here. Should I remove mul from the FP32 list here, or remove all the BF16 and FP32 ops and just set `"use_habana_mixed_precision": true`?

I think you can just remove mul from the Gaudi config of GPT2; that will make it clearer which ops are computed in BF16.
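(Following that suggestion, the GPT2 Gaudi config would simply drop mul from the FP32 list once HMP is disabled around the problematic op in code; a sketch with assumed field names and abridged lists:)

```python
from optimum.habana import GaudiConfig

# Final shape of the GPT2 Gaudi config once the code-side fix lands:
# "mul" no longer needs to be pinned to FP32.
gaudi_config = GaudiConfig(
    use_habana_mixed_precision=True,
    hmp_bf16_ops=["add", "addmm", "bmm", "div", "dropout", "gelu",
                  "layer_norm", "linear", "matmul", "mm", "softmax"],
    hmp_fp32_ops=["embedding", "nll_loss", "cross_entropy"],
)
```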

ZhaiFeiyue commented 1 year ago

@regisss I will follow your comments after PR #232 is merged.

regisss commented 1 year ago

@ZhaiFeiyue We can close this one, right?

ZhaiFeiyue commented 1 year ago

@regisss yes