microsoft / onnxruntime-genai

Generative AI extensions for onnxruntime
MIT License

API to modify generator input without recreating it #828

Closed: WA225 closed this issue 1 week ago

WA225 commented 3 weeks ago

Describe the bug: I am wondering if there is an API (C or Python, but preferably Python) that allows us to modify the generator input without needing to recreate the generator.

I do not see any API that can do that in the documentation, but it would be really helpful to have it.

yufenglee commented 3 weeks ago

Could you please share more details on this requirement? Do you want the generator object to serve multiple inputs instead of one? If so, we are adding it.

WA225 commented 2 weeks ago

@yufenglee Yes, if possible, I would like to modify the input of the generator before the next prediction without having to recreate the generator for every input modification. Would that be possible with what is currently being developed?

elephantpanda commented 2 weeks ago

I'm guessing the idea is that if you are having a chat with the LLM, you want to append the conversation so far to the input and then add a new question.

natke commented 2 weeks ago

@WA225 and @elephantpanda, yes, we are working on the continuous decoding feature, which will allow you to do this. The feature will be available in the next release.
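
For readers following the thread, the usage pattern being requested can be sketched roughly as below. This is only a toy illustration in plain Python, not the onnxruntime-genai API: the class `ToyGenerator` and its methods `append_tokens` and `generate_next_token` are hypothetical names made up for this example, and the "model" is a trivial deterministic rule. The point is just that one generator object keeps its state between turns, so a follow-up question only adds new tokens instead of rebuilding everything.

```python
from typing import List


class ToyGenerator:
    """Hypothetical stand-in for a stateful generator (NOT the onnxruntime-genai API)."""

    def __init__(self, vocab_size: int = 100) -> None:
        self.vocab_size = vocab_size
        self.tokens: List[int] = []   # full sequence seen so far
        self.running_sum = 0          # stand-in for cached model state (e.g. a KV cache)
        self.tokens_processed = 0     # how many tokens were ever "run through the model"

    def append_tokens(self, new_tokens: List[int]) -> None:
        """Extend the existing input; only the new tokens are processed."""
        for t in new_tokens:
            self.running_sum += t
            self.tokens_processed += 1
        self.tokens.extend(new_tokens)

    def generate_next_token(self) -> int:
        """Produce a deterministic toy 'prediction' and add it to the sequence."""
        next_token = self.running_sum % self.vocab_size
        self.append_tokens([next_token])
        return next_token


gen = ToyGenerator()
gen.append_tokens([1, 2, 3])          # first user turn
reply_1 = gen.generate_next_token()
gen.append_tokens([4, 5])             # follow-up turn, same generator object
reply_2 = gen.generate_next_token()
print(reply_1, reply_2, gen.tokens, gen.tokens_processed)
```

Each token is processed exactly once here, which is the saving a chat scenario would hope to get from reusing the generator rather than recreating it with the whole conversation on every turn.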