ludwig-ai / ludwig

Low-code framework for building custom LLMs, neural networks, and other AI models
http://ludwig.ai
Apache License 2.0

fix: use eos token in target tensor for instruction-tuning #3945

Closed geoffreyangus closed 9 months ago

geoffreyangus commented 9 months ago

Prior to this change, we used the pad token at the end of the target tensor. This was acceptable because many recent LLMs were trained with pad token == eos token. Gemma, however, has a separate eos token, so a model fine-tuned this way never learns to produce an eos token and generation never stops. We now append the eos token to the target tensor during fine-tuning so that LLMs are guaranteed to learn how to stop during the generation step.
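A minimal sketch of the idea (not Ludwig's actual implementation; function name and token ids here are made up for illustration): the target tensor should end with the eos token before padding, otherwise a model whose pad and eos tokens differ never sees eos in its training targets.

```python
def build_target_tensor(target_ids, max_len, eos_token_id, pad_token_id):
    """Append eos to the target ids, then right-pad to max_len.

    If we padded without appending eos (the old behavior), a model whose
    pad and eos tokens differ (e.g. Gemma) would never see eos in its
    training targets, so at generation time it would never emit eos and
    generation would run until the length limit.
    """
    ids = list(target_ids) + [eos_token_id]       # teach the model to stop
    ids += [pad_token_id] * (max_len - len(ids))  # pad the remainder
    return ids

# Illustrative ids: eos=2, pad=0
print(build_target_tensor([5, 6, 7], max_len=6, eos_token_id=2, pad_token_id=0))
# -> [5, 6, 7, 2, 0, 0]
```

With pad == eos the two behaviors coincide, which is why the bug only surfaced with models like Gemma that define them separately.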

github-actions[bot] commented 9 months ago

Unit Test Results

 4 files ±0    4 suites ±0    9m 29s :stopwatch: -17m 37s
12 tests -2 972    9 :heavy_check_mark: -2 962    3 :zzz: -9    0 :x: -1
40 runs  -2 960   28 :heavy_check_mark: -2 953   12 :zzz: -6    0 :x: -1

Results for commit eaac1e41. ± Comparison against base commit d3470635.

:recycle: This comment has been updated with latest results.