huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Longformer FP16 training broken since transformers 4.21 #21449

Closed geniki closed 1 year ago

geniki commented 1 year ago

System Info

transformers 4.20 / transformers 4.21 Ubuntu 20, python 3.8

Who can help?

@ydshieh

Information

Tasks

Reproduction

Apologies, I'm using my own dataset but the problem should be easy to reproduce with any Longformer + FP16 example. Upgrading from transformers 4.20 to 4.21 causes Longformer training loss to stay stuck around its initial value. When using transformers 4.20 + FP16 and transformers >= 4.21 + FP32, training loss declines as expected.

https://github.com/huggingface/transformers/pull/17306 seems to be what caused this. You can see in that PR that it affected other models too, some of which have since been fixed one by one. Longformer is still affected as of transformers 4.26.
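For context, my understanding of the failure mode: that PR replaced the fixed `-10000` masking constant with `torch.finfo(dtype).min`, and the follow-up fixes for other models clamp attention scores so that applying the mask cannot overflow FP16's representable range. A minimal sketch using numpy's `float16` purely for illustration (the actual fixes operate on torch tensors):

```python
import numpy as np

# float16 can only represent magnitudes up to ~65504, so large negative
# attention-mask constants overflow to -inf when cast down from float32.
mask_value = np.float32(-1e9)                 # fine in float32
print(mask_value.astype(np.float16))          # -inf: overflowed

# Even finfo(float16).min overflows if two such mask terms are summed,
# e.g. when a mask is effectively applied twice:
fp16_min = np.finfo(np.float16).min           # -65504.0
print(np.float16(fp16_min) + np.float16(fp16_min))  # -inf

# The fix pattern in the PRs above: clamp the masked scores so they stay
# within float16 range before any downcast.
scores = np.array([-70000.0, 1.5], dtype=np.float32)
clamped = np.clip(scores, np.finfo(np.float16).min, None).astype(np.float16)
print(clamped)                                # finite values, no -inf/NaN
```

Once a `-inf` reaches the softmax, a fully masked row produces NaN, which then propagates through the loss, so training appears stuck.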

Expected behavior

Be able to train Longformer using FP16 precision on recent versions of transformers.

ydshieh commented 1 year ago

Hi @geniki Thank you for reporting the issue.

but the problem should be easy to reproduce with any Longformer + FP16 example

It would be really nice if you could provide an example script that reproduces the issue, especially since you mentioned it should be easy to reproduce 🙏 Looking forward to it!

some of which have been fixed one by one

Could you remind me which PRs or commits fixed this issue for those models? 🙏 That would help a lot, thank you.

geniki commented 1 year ago

Thanks for your response @ydshieh. Here are some examples where this issue has been addressed for other models: https://github.com/huggingface/transformers/pull/20605 https://github.com/huggingface/transformers/pull/18057 https://github.com/huggingface/transformers/pull/19229 https://github.com/huggingface/transformers/pull/17437

I'll try to put together a reproducible online example with Longformer. Do you have any model training tests with small dummy data?

ydshieh commented 1 year ago

Hi @geniki You can take any dataset on the HF Hub (for the specific task you are working on) and select a subset of it (say the first 1024 examples).

However, since you already know of some fixes (from your comment above), would you like to try experimenting with a fix for this model (using your own dataset, potentially a subset) and open a PR ❤️? If not, no worries, but in that case, as I mentioned, a script that reproduces the issue would be really nice 👍

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.