worldbank / REaLTabFormer

A suite of autoregressive and sequence-to-sequence (Seq2Seq) transformer models for tabular and relational synthetic data generation.
https://worldbank.github.io/REaLTabFormer/
MIT License

FutureWarnings for AdamW and encoder-decoder loss (v4.12.0). #37

Closed: echatzikyriakidis closed this issue 1 year ago

echatzikyriakidis commented 1 year ago

Hi everyone!

When training models with model.fit(), we get the following two FutureWarnings:

transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
transformers/models/encoder_decoder/modeling_encoder_decoder.py:634: FutureWarning: Version v4.12.0 introduces a better way to train encoder-decoder models by computing the loss inside the encoder-decoder framework rather than in the decoder itself. You may observe training discrepancies if fine-tuning a model trained with versions anterior to 4.12.0. The decoder_input_ids are now created based on the labels, no need to pass them yourself anymore.

Are these two warnings safe to ignore for now? Thanks.
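
Both messages are advisory rather than errors: the first flags a deprecated optimizer implementation, and the second describes a change in how encoder-decoder loss is computed from transformers v4.12.0 onward. To illustrate the second point, here is a minimal sketch of generic Hugging Face usage (not REaLTabFormer's internals; all dimensions are illustrative placeholders chosen so the snippet runs without downloads), showing that since v4.12 passing `labels` alone is enough:

```python
import torch
from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel

# Tiny, randomly initialized encoder-decoder model, just to show the API.
enc_cfg = BertConfig(vocab_size=32, hidden_size=16, num_hidden_layers=1,
                     num_attention_heads=2, intermediate_size=32)
dec_cfg = BertConfig(vocab_size=32, hidden_size=16, num_hidden_layers=1,
                     num_attention_heads=2, intermediate_size=32)
config = EncoderDecoderConfig.from_encoder_decoder_configs(enc_cfg, dec_cfg)
model = EncoderDecoderModel(config=config)
model.config.decoder_start_token_id = 0  # needed to build decoder inputs from labels
model.config.pad_token_id = 0

input_ids = torch.randint(0, 32, (1, 8))
labels = torch.randint(0, 32, (1, 8))

# Since transformers v4.12, decoder_input_ids are created from `labels`
# and the loss is computed inside the encoder-decoder framework.
outputs = model(input_ids=input_ids, labels=labels)
print(outputs.loss)
```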

echatzikyriakidis commented 1 year ago

Regarding the AdamW optimizer, I found this:

https://discuss.huggingface.co/t/huggingface-transformers-longformer-optimizer-warning-adamw/14711
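
For reference, the warning text itself names two remedies: switch to the PyTorch implementation, torch.optim.AdamW, or silence the message with `no_deprecation_warning=True`. A minimal sketch of both (the parameter list below is a placeholder, not a real model's):

```python
import torch
from transformers.optimization import AdamW  # deprecated implementation

params = [torch.nn.Parameter(torch.zeros(1))]  # placeholder parameters

# Option 1 (recommended by the warning): the PyTorch implementation.
optimizer = torch.optim.AdamW(params, lr=5e-5)

# Option 2: keep the transformers implementation but silence the warning.
optimizer = AdamW(params, lr=5e-5, no_deprecation_warning=True)
```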

avsolatorio commented 1 year ago

@echatzikyriakidis, thanks for this! I have made a PR to suppress this warning based on the link you shared. 😀
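
The PR itself isn't shown in this thread, so the snippet below is only a guess at the general shape of such a fix: a warnings filter keyed on the deprecation message, applied before training starts.

```python
import warnings

# Hypothetical suppression (the actual PR may differ): `message` is a regex
# matched against the start of the warning text.
warnings.filterwarnings(
    "ignore",
    message="This implementation of AdamW is deprecated",
    category=FutureWarning,
)
```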