Unable to achieve the same accuracy between the Trainer API and TensorFlow models

KerenzaDoxolodeo commented 1 year ago

System Info

transformers version: 4.33.0
Platform: Linux-6.1.42+-x86_64-with-glibc2.35
Python version: 3.10.12
Huggingface_hub version: 0.16.4
Safetensors version: 0.3.3
Accelerate version: 0.22.0
Accelerate config: not found
PyTorch version (GPU?): 2.0.0 (True)
Tensorflow version (GPU?): 2.12.0 (True)
Flax version (CPU?/GPU?/TPU?): 0.7.2 (gpu)
Jax version: 0.4.13
JaxLib version: 0.4.13
Using GPU in script?: yes
Using distributed or parallel set-up in script?: no

Who can help?

No response

Information

[ ] The official example scripts
[X] My own modified scripts

Tasks

[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[X] My own task or dataset (give details below)

Reproduction

I ran xlm-roberta in three implementations:

1) Using TFAutoModelForSequenceClassification
2) Using TFAutoModel, with the classification layer kept as faithful as possible to Hugging Face's implementation.

Code : https://www.kaggle.com/code/realdeo/keras-code/settings?scriptVersionId=145530298

3) Using the Trainer API

Code : https://www.kaggle.com/code/realdeo/fork-of-notebookcb67cb4ef2/notebook?scriptVersionId=145540775
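
When comparing the three runs, one thing worth ruling out is a difference in optimizer settings. To my knowledge, the Trainer defaults to AdamW with a learning rate of 5e-5 that decays linearly to zero with no warmup, while `keras.optimizers.Adam` defaults to a constant 1e-3. Whether the linked notebooks actually differ in this way is an assumption and has not been checked here. The sketch below is a plain-Python illustration of that linear schedule; the function name `linear_decay_lr` is hypothetical, and the values it returns could be compared against whatever learning rate the Keras runs use.

```python
def linear_decay_lr(step, total_steps, base_lr=5e-5, warmup_steps=0):
    """Learning rate at `step` for linear warmup followed by linear decay to zero.

    Illustrative helper (hypothetical name). The defaults mirror what the
    Trainer is understood to use: base_lr=5e-5 and no warmup.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


if __name__ == "__main__":
    total = 1000  # assumed number of optimizer steps, for illustration only
    for s in (0, 250, 500, 750, 1000):
        print(s, linear_decay_lr(s, total))
```

If the Keras runs use a much larger constant learning rate than this schedule produces, matching the two would make the comparison fairer.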

Expected behavior

I expect the three implementations to reach roughly the same accuracy. Instead, the Trainer API run trains successfully after 1 epoch, while the TensorFlow implementation gets stuck predicting the same label.
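
To make "stuck at predicting the same label" concrete, a minimal sketch of a check on the model's output logits is shown here. It assumes the logits are available as a list of per-example rows (for instance after converting a prediction array with `.tolist()`); the helper names `predicted_label_counts` and `is_collapsed` and the 0.99 threshold are illustrative choices, not part of any library.

```python
from collections import Counter


def predicted_label_counts(logits):
    """Count how often each class index is the argmax across rows of logits."""
    preds = [max(range(len(row)), key=row.__getitem__) for row in logits]
    return Counter(preds)


def is_collapsed(logits, threshold=0.99):
    """Return True if a single class receives at least `threshold` of all predictions."""
    counts = predicted_label_counts(logits)
    total = sum(counts.values())
    if total == 0:
        return False
    top_count = counts.most_common(1)[0][1]
    return top_count / total >= threshold


if __name__ == "__main__":
    # Made-up logits for illustration: every row favours class 1.
    example = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7]]
    print(predicted_label_counts(example))
    print(is_collapsed(example))
```

Running the same check on the validation predictions of each implementation would show whether the collapse is specific to the Keras runs.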

github-actions[bot] commented 11 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

amyeroberts commented 11 months ago

Hi @KerenzaDoxolodeo, thanks for raising an issue!

This is a question best placed in our forums. We try to reserve GitHub issues for feature requests and bug reports.

