huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

`TestMarian_MT_EN::test_batch_generation_mt_en` Failing due to randomly generated tokens #12647

Open LysandreJik opened 3 years ago

LysandreJik commented 3 years ago

The test fails with the following:

_________________ TestMarian_MT_EN.test_batch_generation_mt_en _________________
[gw0] linux -- Python 3.6.9 /usr/local/bin/python

self = <tests.test_modeling_tf_marian.TestMarian_MT_EN testMethod=test_batch_generation_mt_en>

    @slow
    def test_batch_generation_mt_en(self):
>       self._assert_generated_batch_equal_expected()

tests/test_modeling_tf_marian.py:390: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/test_modeling_tf_marian.py:366: in _assert_generated_batch_equal_expected
    self.assertListEqual(self.expected_text, generated_words)
E   AssertionError: Lists differ: ['Tou[19 chars] healed a man who was affected by the sad disease of leprosy.'] != ['Tou[19 chars] healed a man who was affected by▁kifkażUnjonik ill.']
E   
E   First differing element 0:
E   'Touc[17 chars]s healed a man who was affected by the sad disease of leprosy.'
E   'Touc[17 chars]s healed a man who was affected by▁kifkażUnjonik ill.'
E   
E   - ['Touching gently, Jesus healed a man who was affected by the sad disease of '
E   ?                                                          ^^^^^^ ^^^ ^^^^^^^^^
E   
E   + ['Touching gently, Jesus healed a man who was affected by▁kifkażUnjonik ill.']
E   ?                                                          ^^^^^ ^^^^^^ ^^^^^^ +
E   
E   -  'leprosy.']
LysandreJik commented 3 years ago

Traced back to this commit: https://github.com/huggingface/transformers/commit/184ef8ecd05ac783827b196e8d15403820efedf9

I suspect there is a difference between the uploaded TF and PT checkpoints.

LysandreJik commented 3 years ago

It seems there's a single difference in the final logits bias:

import torch
from transformers import MarianMTModel

# Load the same checkpoint once from the PT weights and once from the TF weights.
pt_model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-mt-en")
tf_model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-mt-en", from_tf=True)

pt, tf = pt_model.state_dict(), tf_model.state_dict()

# Pair up the two tensors for each parameter name.
ptf = {}

for key, value in pt.items():
    ptf[key] = [value]

for key, value in tf.items():
    if key not in ptf:
        print(key, "not in ptf")
    else:
        ptf[key].append(value)

# Report every parameter whose PT and TF values differ.
for key, value in ptf.items():
    _pt, _tf = value
    difference = torch.max(torch.abs(_pt - _tf)).item()
    if difference > 0:
        print(key, difference)

# final_logits_bias 10.176068305969238

The difference seems systematic: it is independent of runtime and seed.
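
If the divergence really is confined to `final_logits_bias`, one local workaround (a sketch only, not the project's fix) would be to copy the PT value over the TF-loaded one before saving. The snippet below demonstrates the idea on small synthetic tensors standing in for the real `pt_model.state_dict()` / `tf_model.state_dict()` entries, so it runs without downloading any checkpoint:

```python
import torch

# Stand-ins for the real state dicts; with the actual checkpoints these would
# be pt_model.state_dict() and tf_model.state_dict() from the snippet above.
pt_sd = {"final_logits_bias": torch.zeros(1, 8)}
tf_sd = {"final_logits_bias": torch.full((1, 8), 10.0)}

diff = torch.max(torch.abs(pt_sd["final_logits_bias"] - tf_sd["final_logits_bias"])).item()
print("before patch:", diff)

# Overwrite the TF-derived bias with the PT value, in place.
tf_sd["final_logits_bias"].copy_(pt_sd["final_logits_bias"])

diff_after = torch.max(torch.abs(pt_sd["final_logits_bias"] - tf_sd["final_logits_bias"])).item()
print("after patch:", diff_after)
```

With real models, the patched state dict would still need to be loaded back with `load_state_dict` and re-saved; fixing the checkpoint on the hub is of course the proper resolution.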

LysandreJik commented 3 years ago

I would say the error comes from the TF checkpoint on the hub; looking forward to your input, @patrickvonplaten and @patil-suraj.

I'll deactivate the test in the meantime.
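
Deactivating a flaky test is typically done with a skip marker. A minimal sketch with stdlib `unittest` (the actual change may use a `pytest` skip instead; the reason string here is mine):

```python
import unittest

class TestMarian_MT_EN(unittest.TestCase):
    @unittest.skip("TF checkpoint produces a divergent final_logits_bias; see #12647")
    def test_batch_generation_mt_en(self):
        self.fail("should never run while skipped")

# Running the suite reports the test as skipped rather than failed.
result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(TestMarian_MT_EN)
)
print("skipped:", len(result.skipped))
```

The skip reason shows up in the test report, which keeps a pointer back to this issue for whoever re-enables the test.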

LysandreJik commented 3 years ago

This is also the case for the Helsinki-NLP/opus-mt-en-zh checkpoint:

# final_logits_bias 8.724637031555176

LysandreJik commented 3 years ago

And for the Helsinki-NLP/opus-mt-en-ROMANCE checkpoint:

# final_logits_bias 11.757145881652832
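
Since the same symptom shows up across several checkpoints, the comparison loop can be factored into a reusable helper and run over a list of model names. A sketch (the helper name is mine; demonstrated on synthetic state dicts so it runs without downloading the real checkpoint pairs):

```python
import torch

def state_dict_max_diffs(pt_sd, tf_sd):
    """Return {parameter name: max absolute difference} for tensors that differ."""
    diffs = {}
    for key, pt_value in pt_sd.items():
        if key not in tf_sd:
            continue
        difference = torch.max(torch.abs(pt_value - tf_sd[key])).item()
        if difference > 0:
            diffs[key] = difference
    return diffs

# Synthetic stand-ins; with real models these would be the state dicts of the
# PT-loaded and TF-loaded variants of each checkpoint in turn.
pt_sd = {"final_logits_bias": torch.zeros(4), "shared.weight": torch.ones(2, 2)}
tf_sd = {"final_logits_bias": torch.full((4,), 8.72), "shared.weight": torch.ones(2, 2)}
print(state_dict_max_diffs(pt_sd, tf_sd))  # only final_logits_bias differs
```

Looping this helper over the affected names (`opus-mt-mt-en`, `opus-mt-en-zh`, `opus-mt-en-ROMANCE`, ...) would give a quick survey of which hub checkpoints carry the bad bias.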
github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.