hao-ai-lab / LookaheadDecoding

Apache License 2.0

Does the module support fine-tuned Llama2? #12

Closed spring1915 closed 7 months ago

spring1915 commented 8 months ago

A fine-tuned Llama2 model may be stored locally. Can it be integrated with lade?

Can lade be used when the model is served in streaming mode?

Viol2000 commented 8 months ago

Thanks for your interest.

Yes, any model based on LLaMA2 is supported, as long as you did not change the model architecture.
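To make this concrete, here is a minimal sketch of how lade is enabled before loading a model, following the two calls shown in this repo's README (`lade.augment_all()` and `lade.config_lade(...)`). The function name and the graceful fallback are my own additions for illustration; the configuration values mirror the README's example and are not tuned for any particular model.

```python
def enable_lade():
    """Activate Lookahead Decoding's monkey-patching of the model code.

    Must be called before the (possibly locally fine-tuned) Llama2
    checkpoint is loaded with transformers. Returns False if the lade
    package is not installed, so callers can fall back to vanilla decoding.
    """
    try:
        import lade
        # Patch transformers' LLaMA implementation in place.
        lade.augment_all()
        # LEVEL / WINDOW_SIZE / GUESS_SET_SIZE follow the README's example
        # configuration; they control the lookahead branch and n-gram pool.
        lade.config_lade(LEVEL=5, WINDOW_SIZE=7, GUESS_SET_SIZE=7, DEBUG=0)
        return True
    except ImportError:
        return False
```

After `enable_lade()` returns True, the fine-tuned checkpoint can be loaded from its local path with `AutoModelForCausalLM.from_pretrained(...)` as usual, since lade patches the model code rather than the weights.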

Regarding the streaming mode, I'm not entirely sure what you're referring to. Could you please provide a bit more context or elaborate on your question? This will help me understand your query better and provide a more accurate response.

knagrecha commented 8 months ago

@spring1915 by streaming do you mean an online inference endpoint receiving a stream of requests?

spring1915 commented 8 months ago

I meant serving the model so that it produces a stream of tokens for each request, like what we see with OpenAI chat.

Viol2000 commented 8 months ago

If I understand correctly, yes, that is supported. There are examples in chatbot.py.