turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License

remove tokens that exceed the max_seq_len #274

Open p11188536 opened 10 months ago

p11188536 commented 10 months ago

I want to remove tokens that exceed the max_seq_len. How can I achieve this functionality?

Qubitium commented 10 months ago
token_in = tokenizer.encode(input)

# Truncate with a Python slice so the sequence fits in max_seq_len.
# exllama's tokenizer returns a (1, seq_len) tensor, so slice the last dim;
# use token_in[:, -max_seq_len:] instead to keep the most recent tokens.
token_in = token_in[:, :max_seq_len]
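A minimal, self-contained sketch of the slicing idea, using a plain Python list in place of the tokenizer's tensor output (the helper name `truncate_tokens` and the choice to keep the most recent tokens are illustrative, not part of exllama's API):

```python
def truncate_tokens(token_ids, max_seq_len):
    """Drop tokens beyond max_seq_len, keeping the most recent ones."""
    if len(token_ids) <= max_seq_len:
        return token_ids
    # Negative-start slice keeps the last max_seq_len entries,
    # so the end of the prompt (recent context) survives.
    return token_ids[-max_seq_len:]

tokens = list(range(10))           # stand-in for tokenizer output
print(truncate_tokens(tokens, 4))  # -> [6, 7, 8, 9]
```

The same slice works unchanged on a `(1, seq_len)` tensor as `token_in[:, -max_seq_len:]`; keeping the head instead is just `token_ids[:max_seq_len]`.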