turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

return last state in dynamic generator #501

Open nickpotafiy opened 2 weeks ago

nickpotafiy commented 2 weeks ago

This PR adds the ability to retrieve the last hidden state from an `ExLlamaV2DynamicJob`:

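```python
from exllamav2.generator import ExLlamaV2DynamicJob

# input_ids: tokenized prompt; gen_settings: an ExLlamaV2Sampler.Settings instance
job = ExLlamaV2DynamicJob(
    input_ids=input_ids,
    max_new_tokens=0,          # embedding use case: no new tokens are needed
    gen_settings=gen_settings,
    return_last_state=True,    # new option introduced by this PR
)
```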

This is a minimalist implementation that I'm using for my own project. I haven't run any extensive tests, but it would be nice to see a feature like this added, for example to generate embeddings for contextual search.
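To illustrate the contextual-search use case, here is a minimal sketch of what could be done with the returned state. It assumes the state arrives as per-token hidden vectors (seq_len x hidden_dim), converted here to plain Python lists. The PR does not specify the exact shape or the result key, so `mean_pool`, `cosine_similarity` and `rank_documents` are hypothetical helpers, and mean pooling is just one common choice. If the returned state is a single vector for the final token, it can be used directly as the embedding and the pooling step skipped.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean_pool(hidden_states: Sequence[Sequence[float]]) -> Vector:
    """Average per-token hidden vectors (assumed seq_len x hidden_dim) into one embedding."""
    if not hidden_states:
        raise ValueError("need at least one token state")
    dim = len(hidden_states[0])
    sums = [0.0] * dim
    for token_vec in hidden_states:
        if len(token_vec) != dim:
            raise ValueError("inconsistent hidden size")
        for i, x in enumerate(token_vec):
            sums[i] += x
    n = len(hidden_states)
    return [s / n for s in sums]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query: Vector, docs: Dict[str, Vector]) -> List[Tuple[str, float]]:
    """Return (doc_id, score) pairs sorted from most to least similar to the query."""
    scored = [(doc_id, cosine_similarity(query, emb)) for doc_id, emb in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins for hidden states returned by jobs (2 tokens x 3 dims each).
doc_states = {
    "a": [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    "b": [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]],
}
doc_embeddings = {doc_id: mean_pool(states) for doc_id, states in doc_states.items()}
query_embedding = mean_pool([[0.9, 0.1, 0.0]])
print(rank_documents(query_embedding, doc_embeddings)[0][0])  # a
```

In practice the document embeddings would be computed once, by running a job with `return_last_state=True` per document, and cached, so that only the query needs a forward pass at search time.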