turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License

BOS and EOS #195

Closed SinanAkkoyun closed 1 year ago

SinanAkkoyun commented 1 year ago

This PR just adds optional functionality to the tokenizer to prepend BOS and append EOS token IDs (important for some chat formats, such as Llama 2 and OpenAssistant).

vadi2 commented 1 year ago

I'm curious, what does padding the token IDs accomplish?

turboderp commented 1 year ago

Padding with padding tokens allows you to run multiple sequences of different lengths in the same batch. "Padding" in this PR is really more about prepending and appending the BOS and EOS tokens, respectively. Maybe the arguments should be called add_bos and add_eos to avoid this confusion?
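To illustrate the distinction, here is a minimal sketch of what add_bos/add_eos flags on an encode call would look like. The token IDs 1 (`<s>`) and 2 (`</s>`) follow the usual Llama convention, and `base_encode` is a stand-in for the real sentencepiece encoder, not exllama's actual API:

```python
# Llama-convention control token IDs (assumed, not taken from exllama).
BOS_ID, EOS_ID = 1, 2

def base_encode(text):
    # Placeholder tokenizer: one fake ID per whitespace-separated word.
    # A real implementation would call sentencepiece here.
    return [hash(w) % 30000 + 3 for w in text.split()]

def encode(text, add_bos=False, add_eos=False):
    """Encode text, optionally prepending BOS and appending EOS."""
    ids = base_encode(text)
    if add_bos:
        ids = [BOS_ID] + ids
    if add_eos:
        ids = ids + [EOS_ID]
    return ids
```

This is distinct from batch padding, where a pad token is repeated to bring shorter sequences up to the length of the longest one in the batch.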

Also, this is more of a workaround and I really would prefer to fully support encoding of control symbols, like in AutoTokenizer. I.e. you should just be able to add <s> and </s> at arbitrary places in the input string, as well as any other custom special tokens any given model requires. Ideally it would work when decoding as well.
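One way to support control symbols in the input string is to split the text on known special tokens and map those pieces directly to their IDs, encoding everything else as ordinary text. The sketch below assumes Llama-style IDs for `<s>`/`</s>` and uses a placeholder text encoder; it is only an illustration of the approach, not exllama's or AutoTokenizer's implementation:

```python
import re

# Assumed special-token table (Llama convention).
SPECIALS = {"<s>": 1, "</s>": 2}

def fake_text_encode(text):
    # Stand-in for the real sentencepiece encode.
    return [hash(w) % 30000 + 3 for w in text.split()]

def encode_with_specials(text):
    """Encode text, mapping embedded control symbols to their token IDs."""
    # Capturing group keeps the special tokens in the split output.
    pattern = "(" + "|".join(re.escape(t) for t in SPECIALS) + ")"
    ids = []
    for piece in re.split(pattern, text):
        if piece in SPECIALS:
            ids.append(SPECIALS[piece])
        elif piece:
            ids.extend(fake_text_encode(piece))
    return ids
```

Decoding would do the reverse: map IDs 1 and 2 back to their string forms instead of dropping them.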

SinanAkkoyun commented 1 year ago

Maybe the arguments should be called add_bos and add_eos to avoid this confusion?

Sure, will change!

Also, this is more of a workaround and I really would prefer to fully support encoding of control symbols, like in AutoTokenizer. I.e. you should just be able to add <s> and </s> at arbitrary places in the input string, as well as any other custom special tokens any given model requires. Ideally it would work when decoding as well.

I totally get that (hence my question a couple of days ago about why sentencepiece), but Llama 2's repo does exactly this: it appends EOS and BOS, tokenizes each 'role' prompt, and concatenates all of the encoded IDs into a full prompt.

I wanted to implement exactly the same thing without modifying the tokenizer, so I believe this change should be merged alongside the tokenizer changes. (And doesn't decoding already output the EOS and BOS strings?)
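For context, the per-turn assembly described above can be sketched roughly like this: each user/assistant pair is encoded as its own sequence with BOS and EOS, and the ID lists are concatenated. The `[INST]` markers match the Llama 2 chat format; `encode` and the exact helper shape are placeholders, not the reference repo's code:

```python
# Assumed Llama-convention control token IDs.
BOS_ID, EOS_ID = 1, 2
B_INST, E_INST = "[INST]", "[/INST]"

def encode(text):
    # Stand-in for the real sentencepiece encode.
    return [hash(w) % 30000 + 3 for w in text.split()]

def build_chat_ids(dialog):
    """dialog: list of (user_msg, assistant_msg) pairs; the final
    assistant_msg may be None when the model is about to answer."""
    ids = []
    for user, assistant in dialog:
        turn = f"{B_INST} {user} {E_INST}"
        if assistant is not None:
            # Completed turn: wrap in BOS ... EOS.
            ids += [BOS_ID] + encode(turn + " " + assistant) + [EOS_ID]
        else:
            # Open turn: BOS only, so generation continues from here.
            ids += [BOS_ID] + encode(turn)
    return ids
```

The key point for this PR is that each turn gets its own BOS/EOS, which is why per-call add_bos/add_eos flags on the tokenizer are enough to reproduce the format.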

SinanAkkoyun commented 1 year ago

I will later also submit a PR with the exact Llama 2 chat completion implementation, since it itself uses sentencepiece with BOS/EOS appending, and I would like this tokenizer feature to be available for that. Would that be OK with you?