Closed — WA225 closed this issue 1 week ago
Could you please share more details on this requirement? Do you want the generator object to serve multiple inputs instead of one? If so, we are adding it.
@yufenglee Yes, if possible, I would like to modify the generator's input before the next prediction without having to recreate the generator for every input modification. Would that be possible with what is currently being developed?
I'm guessing the idea is that if you're having a chat with the LLM, you want to append the conversation so far to the end of the input and add a new question.
@WA225 and @elephantpanda, yes, we are working on the continuous decoding feature, which will allow you to do this. The feature will be available in the next release.
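To make the idea concrete, here is a minimal toy sketch of the pattern continuous decoding enables: one generator object created up front, whose input context is extended between predictions instead of rebuilding the generator each turn. The `ToyGenerator` class and its method names below are illustrative stand-ins written for this comment, not the actual onnxruntime-genai API.

```python
# Toy sketch of the "continuous decoding" usage pattern: the generator
# keeps its accumulated token state, so each new chat turn extends the
# existing context rather than recreating the generator from scratch.
# All names here are hypothetical stand-ins, not the real library API.

class ToyGenerator:
    """Keeps accumulated input tokens so new turns extend, not replace."""

    def __init__(self):
        self.tokens = []  # conversation so far, as token ids

    def append_tokens(self, new_tokens):
        # Extend the existing context instead of rebuilding the generator.
        self.tokens.extend(new_tokens)

    def generate_next_token(self):
        # Stand-in "model": emit a token derived from the context length.
        next_tok = len(self.tokens)
        self.tokens.append(next_tok)
        return next_tok


gen = ToyGenerator()             # created once, reused for every turn
gen.append_tokens([101, 102])    # turn 1: user prompt tokens
first = gen.generate_next_token()
gen.append_tokens([103])         # turn 2: follow-up, same generator object
second = gen.generate_next_token()
print(first, second)             # context grew across both turns
```

The point of the sketch is only the call pattern: `append_tokens` between predictions mutates the live context, so the KV cache (in a real implementation) can be reused across turns instead of being recomputed.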
Describe the bug I am wondering if there is an API (C or Python, but preferably Python) that allows us to modify the generator input without needing to recreate the generator.
I do not see any API in the documentation that can do this, but it would be really helpful to have.