predibase / lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
https://loraexchange.ai
Apache License 2.0

Use special tokens specific to the fine-tuned adapter during decoding #71

Open tgaddair opened 10 months ago

tgaddair commented 10 months ago

During fine-tuning, special tokens may be added that are specific to the adapter. During decoding, we should use those special tokens and ensure the correct stop tokens, padding, etc. are properly honored.
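
For illustration, here is a minimal sketch (not the LoRAX implementation) of how the adapter's special tokens could be discovered, assuming the adapter repo ships its own tokenizer files alongside the LoRA weights:

```python
# Sketch: diff the adapter's tokenizer against the base tokenizer and surface
# any added/changed special tokens as stop/pad candidates for decoding.
from transformers import AutoTokenizer

BASE_MODEL_ID = "mistralai/Mistral-7B-v0.1"   # assumed base model
ADAPTER_ID = "qblocks/mistral_7b_norobots"    # adapter from the repro above

base_tok = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
adapter_tok = AutoTokenizer.from_pretrained(ADAPTER_ID)

# Tokens present in the adapter tokenizer but missing from the base tokenizer.
added_tokens = set(adapter_tok.get_vocab()) - set(base_tok.get_vocab())

# Special tokens (eos/pad/etc.) as configured by the fine-tune.
print("adapter eos:", adapter_tok.eos_token)
print("adapter pad:", adapter_tok.pad_token)
print("tokens added by the adapter:", added_tokens)

# At decode time these would need to resolve to the adapter's token ids and be
# registered as stop tokens, instead of falling back to the base model defaults.
```
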

Repro from @runvnc, related: #68

Model ID: https://huggingface.co/qblocks/mistral_7b_norobots/tree/main

The QLoRA repo example uses this AutoTokenizer setup with special tokens:

https://github.com/artidoro/qlora/blob/7f4e95a68dc076bea9b3a413d2b512eca6d004e5/qlora.py#L347
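
The relevant fine-tuning pattern (paraphrased, not a verbatim copy of qlora.py; the model ID and pad token string below are placeholders) is roughly:

```python
# During fine-tuning, new special tokens are added to the tokenizer and the
# embedding matrix is resized to match, so the resulting adapter expects those
# token ids (and stop/pad semantics) at inference time.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "huggyllama/llama-7b"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="right", use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Add a pad token (and any other special tokens) the base vocab lacks,
# then resize the embeddings so the new ids are valid.
num_added = tokenizer.add_special_tokens({"pad_token": "[PAD]"})
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))

# Any generation against this adapter must then use this tokenizer's
# eos/pad/stop ids, not the stock base tokenizer's.
```
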

llama-shepard commented 4 months ago

Will this be completed? I'm planning to use adapters with special tokens, like the ones below:

https://huggingface.co/Dogge/llama-3-8B-instruct-Bluemoon-Freedom-lora/
https://huggingface.co/Dogge/llama-3-70B-instruct-uncensored-lora
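
In the meantime, a possible workaround sketch, assuming the lorax-client `generate` API accepts `adapter_id` and `stop_sequences` (TGI-style) and that `<|eot_id|>` is the turn-end token these Llama 3 adapters rely on:

```python
# Sketch: pass the adapter's special tokens explicitly as stop sequences until
# they are honored automatically. Endpoint URL, adapter id, and stop token are
# assumptions for illustration.
from lorax import Client

client = Client("http://127.0.0.1:8080")  # local LoRAX endpoint (assumption)

response = client.generate(
    "<|start_header_id|>user<|end_header_id|>\n\nHello!<|eot_id|>",
    adapter_id="Dogge/llama-3-8B-instruct-Bluemoon-Freedom-lora",
    max_new_tokens=128,
    stop_sequences=["<|eot_id|>"],  # Llama 3 end-of-turn token (assumption)
)
print(response.generated_text)
```
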