woctezuma opened this issue 5 years ago
> The examples generated during training use `top_k=40`, while the generate functions use `top_k=0` by default (since fine-tuning reduces the inherent craziness). I suppose I can make `top_k=40` a default in generation too, to avoid the mode collapse issues @janelleshane was hitting.
Based on this documentation in the GPT-2 code, `top_k=0` means no restriction. So you suggest constraining the text generation a bit more by setting `top_k=40` to avoid my issue, right?
```
:top_k=0 : Integer value controlling diversity. 1 means only 1 word is
 considered for each step (token), resulting in deterministic completions,
 while 40 means 40 words are considered at each step. 0 (default) is a
 special setting meaning no restrictions. 40 generally is a good value.
```
Is my understanding correct?
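Concretely, I would then call something like this (a minimal sketch; `run1` is the library's default run name and is assumed here):

```python
import gpt_2_simple as gpt2

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name='run1')

# Restrict sampling to the 40 most likely tokens at each step,
# instead of the unrestricted top_k=0 default of gpt2.generate().
gpt2.generate(sess, run_name='run1', top_k=40)
```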
I have fine-tuned the `345M` model for 2000 iterations (`fresh`). The output was fine. Then I restarted my Colab session and fine-tuned it further, up to 5000 iterations (`latest`).
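For context, the two runs correspond roughly to the following calls (a minimal sketch; the dataset filename and the exact `steps` value for the resumed run are illustrative assumptions):

```python
import gpt_2_simple as gpt2

# First session: fine-tune from the pretrained 345M weights.
gpt2.download_gpt2(model_name='345M')
sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              dataset='dataset.txt',   # hypothetical training corpus
              model_name='345M',
              steps=2000,
              restore_from='fresh',    # start from the base model
              run_name='run1')

# Second session (after the Colab restart): resume from the saved
# checkpoint under checkpoint/run1 and continue training.
sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              dataset='dataset.txt',
              model_name='345M',
              steps=3000,              # additional steps; exact count illustrative
              restore_from='latest',
              run_name='run1')
```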
The samples shown during the fine-tuning process were fine:
However, some of the outputs of `gpt2.generate()` are nonsensical, e.g. the following excerpt:

It also happens with:
Edit: I have restarted my Colab session and fine-tuned it further, up to 6000 iterations (`latest`). This time, the issue appeared in some of the samples shown during the fine-tuning process. I guess: