ludwig-ai / ludwig

Low-code framework for building custom LLMs, neural networks, and other AI models
http://ludwig.ai
Apache License 2.0

fix: use eos token in target tensor for instruction-tuning #3945

Closed geoffreyangus closed 9 months ago

geoffreyangus commented 9 months ago

Prior to this change, we used the pad token at the end of the target tensor. This was acceptable because many recent LLMs were trained with pad token == eos token. Gemma, however, has a separate eos token, so a model fine-tuned this way never learns to produce an eos token and generation never stops. We now append the eos token to the target tensor during fine-tuning so that LLMs are guaranteed to learn how to stop during the generation step.
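A minimal sketch of the idea (not Ludwig's actual implementation; function name and token ids here are made up for illustration): the target tensor should end with the eos token before padding, otherwise a model whose pad and eos tokens differ never sees eos in its training targets.

```python
def build_target_tensor(target_ids, max_len, eos_token_id, pad_token_id):
    """Append eos to the target ids, then right-pad to max_len.

    If we padded without appending eos (the old behavior), a model whose
    pad and eos tokens differ (e.g. Gemma) would never see eos in its
    training targets, so at generation time it would never emit eos and
    generation would run until the length limit.
    """
    ids = list(target_ids) + [eos_token_id]       # teach the model to stop
    ids += [pad_token_id] * (max_len - len(ids))  # pad the remainder
    return ids

# Illustrative ids: eos=2, pad=0
print(build_target_tensor([5, 6, 7], max_len=6, eos_token_id=2, pad_token_id=0))
# -> [5, 6, 7, 2, 0, 0]
```

With pad == eos the two behaviors coincide, which is why the bug only surfaced with models like Gemma that define them separately.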

github-actions[bot] commented 9 months ago

Unit Test Results

 4 files ±0    4 suites ±0    9m 29s :stopwatch: -17m 37s
12 tests -2 972    9 :heavy_check_mark: -2 962    3 :zzz: -9    0 :x: -1
40 runs  -2 960   28 :heavy_check_mark: -2 953   12 :zzz: -6    0 :x: -1

Results for commit eaac1e41. ± Comparison against base commit d3470635.

:recycle: This comment has been updated with latest results.