Open je1lee opened 6 months ago
@pengchongjin any idea for this?
Thanks for the change. Could you please paste a few example outputs before and after this change?
Also please make sure to test both run.py and run_xla.py. Thanks!
@pengchongjin test done with both scripts
BEFORE
the model keeps generating tokens regardless of the EOS token, so time spent in generation increases quadratically as output_len increases
AFTER
the model stops generating once it samples the EOS token, so time spent in generation stays roughly constant as output_len increases
With model.generate(), generation takes too long even when every sequence has already finished with an EOS token, because it currently keeps generating until it reaches output_len.
This change fixes the generate method to stop once every sequence has generated an EOS token.
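The core idea can be sketched as a per-sequence "finished" mask with an early break once every sequence in the batch has emitted EOS. This is a minimal standalone illustration, not the actual repo code: `EOS_ID`, `fake_sample`, and `generate` here are hypothetical stand-ins for the real tokenizer id and model sampler.

```python
EOS_ID = 2  # hypothetical EOS token id, for illustration only


def fake_sample(step, batch_size):
    """Stand-in for the model's sampler: sequence i emits EOS at step 3 + i."""
    return [EOS_ID if step >= 3 + i else 100 + step for i in range(batch_size)]


def generate(batch_size, output_len):
    tokens = [[] for _ in range(batch_size)]
    finished = [False] * batch_size
    steps_run = 0
    for step in range(output_len):
        steps_run += 1
        next_tokens = fake_sample(step, batch_size)
        for i, tok in enumerate(next_tokens):
            if not finished[i]:
                tokens[i].append(tok)
                if tok == EOS_ID:
                    finished[i] = True
        # the fix: stop early once every sequence has sampled EOS,
        # instead of always looping until output_len
        if all(finished):
            break
    return tokens, steps_run


tokens, steps = generate(batch_size=2, output_len=100)
print(steps)   # far fewer than output_len once all sequences hit EOS
print(tokens)  # each sequence ends with EOS_ID
```

Without the `all(finished)` break, the loop would always run output_len iterations, which is the "BEFORE" behavior described above.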