huggingface / optimum-habana

Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
Apache License 2.0

GPT2: support BF16 for both training and inference #230

Closed ZhaiFeiyue closed 1 year ago

ZhaiFeiyue commented 1 year ago

Feature request

Enable HMP (Habana Mixed Precision) for GPT2

Motivation

BF16 has better performance than FP32

Your contribution

Submitting a PR
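For context, a minimal sketch of what enabling HMP for GPT2 through a Gaudi configuration could look like in optimum-habana. The field names follow the `gaudi_config.json` schema used for other models on the Hub, and the op lists are illustrative placeholders rather than the final GPT2 lists:

```python
from optimum.habana import GaudiConfig

# Illustrative HMP-enabled Gaudi config for GPT2 (op lists abridged, not final).
gaudi_config = GaudiConfig(
    use_habana_mixed_precision=True,  # turn on Habana Mixed Precision (HMP)
    hmp_bf16_ops=["add", "addmm", "bmm", "dropout", "gelu", "matmul", "mm", "softmax"],
    hmp_fp32_ops=["embedding", "nll_loss", "cross_entropy", "log_softmax"],
    use_fused_adam=True,
    use_fused_clip_norm=True,
)

# Saved as gaudi_config.json, this is what GaudiTrainer consumes (together with
# the usual model/args/datasets) so that the listed ops run in BF16 on the HPU.
gaudi_config.save_pretrained("gpt2-gaudi-config")
```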

ZhaiFeiyue commented 1 year ago

@regisss I have compared FP32 and BF16 performance on Gaudi2; the data is below.

| precision | train_samples_per_second | eval perplexity |
|-----------|--------------------------|-----------------|
| FP32      | 47.624                   | 21.0109         |
| BF16      | 55.511                   | 21.177          |

| precision | train_samples_per_second | eval perplexity |
|-----------|--------------------------|-----------------|
| FP32      | 306.631                  | 21.7935         |
| BF16      | 357.932                  | 22.1765         |

Is the above accuracy acceptable?

regisss commented 1 year ago

Yes it seems good to me!

For this, what you should actually do is open a PR on the HF Hub here: click on "edit" and submit your changes from there.
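(If you prefer doing this programmatically rather than through the web editor, `huggingface_hub` can open such a Hub PR; the repo id and file names below are assumptions for illustration, not taken from this thread.)

```python
from huggingface_hub import upload_file

# Propose the edited gaudi_config.json as a pull request on the Hub.
# "Habana/gpt2" and the file names are assumptions used for illustration.
upload_file(
    path_or_fileobj="gaudi_config.json",
    path_in_repo="gaudi_config.json",
    repo_id="Habana/gpt2",
    commit_message="Enable BF16 (HMP) for GPT2",
    create_pr=True,
)
```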

ZhaiFeiyue commented 1 year ago

@regisss I have opened a PR. Regarding your question: most of the BF16 ops are the same as the default ones, but mul should stay in FP32, since there is an accuracy bug (NaN issue) coming from here
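To make the change concrete, a sketch of the relevant part of the Gaudi config with mul pinned to FP32; field names assumed to match the `gaudi_config.json` schema, op lists abridged:

```python
from optimum.habana import GaudiConfig

# Sketch of the GPT2 Gaudi config described above: the usual BF16 op list,
# with "mul" kept in FP32 to work around the NaN issue (lists abridged).
gaudi_config = GaudiConfig(
    use_habana_mixed_precision=True,
    hmp_bf16_ops=["add", "addmm", "bmm", "div", "dropout", "gelu",
                  "layer_norm", "linear", "matmul", "mm", "softmax"],
    hmp_fp32_ops=["embedding", "nll_loss", "cross_entropy", "mul"],
)
```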

ZhaiFeiyue commented 1 year ago

Closed since the PR has been merged.

regisss commented 1 year ago

> @regisss I have opened a PR. Regarding your question: most of the BF16 ops are the same as the default ones, but mul should stay in FP32, since there is an accuracy bug (NaN issue) coming from here

Would it make sense to disable HMP just for this part of the code and not have mul among the FP32 ops?
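(For illustration, disabling HMP locally could look roughly like the sketch below, assuming the `hmp` helper from `habana_frameworks` with the `disable_casts()` context manager described in Habana's mixed-precision documentation; the tensors are placeholders standing in for the GPT2 values involved.)

```python
import torch
import habana_frameworks.torch.core as htcore  # registers the HPU device
from habana_frameworks.torch.hpex import hmp

# Placeholder tensors standing in for the GPT2 values whose product
# produces NaNs when the multiplication is cast to BF16 by HMP.
a = torch.randn(8, 8, device="hpu")
b = torch.randn(8, 8, device="hpu")

# HMP stays enabled for the rest of the model; only this multiplication
# runs without BF16 casts, i.e. in full precision.
with hmp.disable_casts():
    out = a * b
```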

ZhaiFeiyue commented 1 year ago

Yes, I will submit a PR to disable HMP here. Should I remove mul from the FP32 list here, or remove all the BF16 and FP32 ops and just set `"use_habana_mixed_precision": true`?

regisss commented 1 year ago

> Yes, I will submit a PR to disable HMP here. Should I remove mul from the FP32 list here, or remove all the BF16 and FP32 ops and just set `"use_habana_mixed_precision": true`?

I think you can just remove mul from the Gaudi config of GPT2; that will make it clearer which ops are computed in BF16.
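(Following that suggestion, the GPT2 Gaudi config would simply drop mul from the FP32 list once HMP is disabled around the problematic op in code; a sketch with assumed field names and abridged lists:)

```python
from optimum.habana import GaudiConfig

# Final shape of the GPT2 Gaudi config once the code-side fix lands:
# "mul" no longer needs to be pinned to FP32.
gaudi_config = GaudiConfig(
    use_habana_mixed_precision=True,
    hmp_bf16_ops=["add", "addmm", "bmm", "div", "dropout", "gelu",
                  "layer_norm", "linear", "matmul", "mm", "softmax"],
    hmp_fp32_ops=["embedding", "nll_loss", "cross_entropy"],
)
```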

ZhaiFeiyue commented 1 year ago

@regisss I will follow your comments after PR #232 is merged.

regisss commented 1 year ago

@ZhaiFeiyue We can close this one, right?

ZhaiFeiyue commented 1 year ago

@regisss yes