jalammar / ecco

Explain, analyze, and visualize NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the behavior of Transformer-based language models (like GPT2, BERT, RoBERTa, T5, and T0).
https://ecco.readthedocs.io
BSD 3-Clause "New" or "Revised" License

lm.generate and HuggingFace's generate give different results with do_sample=False #92

noeliaferruz closed this issue 1 year ago

noeliaferruz commented 1 year ago

Hi, thanks for the great work.

I'm generating text with ecco's lm.generate function without sampling:

output = lm.generate(text, generate=200, max_length=1024,
        eos_token_id=1, pad_token_id=0,
        attribution=['grad_x_input', 'ig'])

Then I generate with the original HuggingFace library, using the same arguments as ecco (literally copied from the call in lm.py):

outputs=model.generate(
            input_ids=input_ids,
            attention_mask=attention_mask,
            num_beams=1,
            # FIXME: +1 in max_length to account for first start token in decoder, find a better way to do this
            max_length=1024,
            do_sample=False,
            top_p=None,
            top_k=None,
            temperature=1,
            eos_token_id=1, pad_token_id=0,
            return_dict_in_generate=True,
            output_scores=True)
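
For reference, input_ids and attention_mask come from the model's tokenizer, and the generated text is decoded from the sequences field of the returned dict. Roughly (a sketch using the standard HuggingFace API; my exact script may differ slightly):

encoded = tokenizer(text, return_tensors='pt')
input_ids = encoded['input_ids']
attention_mask = encoded['attention_mask']
# ... call model.generate as above ...
generated_text = tokenizer.decode(outputs.sequences[0], skip_special_tokens=True)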

In both cases we use the same seed sequence (MKDIDTLISNNAL), but the first method produces WSKMLVEEDPGFFERLSQAQKPRALFITCSDSRLVPEQ, while the second produces WSKMLVEEDPGFFEKLAQAQKPRFLWIGCSDSRVPAERL.
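
The two outputs share the prefix WSKMLVEEDPGFFE and diverge after that. A quick character-level check pinpoints the first mismatch (a minimal sketch over the two strings reported above):

ecco_seq = "WSKMLVEEDPGFFERLSQAQKPRALFITCSDSRLVPEQ"
hf_seq = "WSKMLVEEDPGFFEKLAQAQKPRFLWIGCSDSRVPAERL"
for i, (a, b) in enumerate(zip(ecco_seq, hf_seq)):
    if a != b:
        # prints: first mismatch at position 14: 'R' vs 'K'
        print(f"first mismatch at position {i}: {a!r} vs {b!r}")
        break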

The two functions should give the same sequence of tokens since we are not sampling. There must be a bug in how lm.generate produces the tokens iteratively (we know the second sequence is the right one).
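
For reference, greedy decoding (do_sample=False, num_beams=1) involves no randomness at all: at every step the next token is simply the argmax of the logits, so any two correct implementations must produce identical token sequences given the same model and prompt. A minimal reference loop (a sketch, assuming a standard HuggingFace causal LM and batch size 1) looks like this:

import torch

def greedy_generate(model, input_ids, max_new_tokens, eos_token_id=1):
    # With do_sample=False the next token is always the argmax over the
    # logits at the last position; no RNG is involved anywhere.
    for _ in range(max_new_tokens):
        with torch.no_grad():
            logits = model(input_ids).logits
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=-1)
        if next_token.item() == eos_token_id:  # assumes batch size 1
            break
    return input_ids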