microsoft / genaiscript

Generative AI Scripting
https://microsoft.github.io/genaiscript/
MIT License

Synthetic Fine-tuning Data Extraction: Multi-Turn Conversation #522

Closed ngoiyaeric closed 3 weeks ago

ngoiyaeric commented 3 weeks ago

Hi, would this repository allow me to scan a PDF and create multi-turn conversations for the purpose of fine-tuning?

pelikhan commented 3 weeks ago

We can already scan PDFs and declare tools/functions, but we don't currently have a mechanism to add messages to the chat and call the model again. I assume you want to generate more "user" messages and re-execute the chat query?

ngoiyaeric commented 3 weeks ago

Yes, perhaps even in parallel, to generate questions and verify their relevance to the contents of the document. The idea is to allow automated question-and-answer pair generation. https://github.com/e-p-armstrong/augmentoolkit made an attempt to solve this; it seems like a possible additional feature for this repo.

pelikhan commented 3 weeks ago

I am thinking about the ability to register a function that gets called in the chat loop and gets the opportunity to add new messages to start a new turn. That function has a generation context that also has runPrompt, so it can make its own LLM sub-requests.

pelikhan commented 3 weeks ago

What about a defChatParticipant function, which lets you register a callback for each chat "turn" that crafts a new message in the message list?

See https://github.com/microsoft/genaiscript/pull/526/files#diff-19017abfa179f310469b762f561a9bbbd3ad01297eeb51bec70f5244461cba6c
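To make the idea concrete, here is a minimal, self-contained sketch of a chat loop with per-turn participant callbacks. It is not the genaiscript implementation and does not use its API: `ChatMessage`, `ChatParticipant`, `Complete`, `runChat`, `followUps`, and `echoModel` are all hypothetical names invented for illustration, and the "model" is a stub that echoes the last message. The real signature of defChatParticipant is in the linked PR.

```typescript
// Hypothetical types for illustration only; not the genaiscript API.
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// A participant sees the conversation so far and may return new messages.
// Returning an empty array means it has nothing to add.
type ChatParticipant = (
  messages: readonly ChatMessage[]
) => Promise<ChatMessage[]> | ChatMessage[];

// Stand-in for an LLM request: takes the messages, returns the reply text.
type Complete = (messages: readonly ChatMessage[]) => Promise<string>;

// Runs the chat loop: query the model, then give each participant a chance
// to append messages. The loop stops when no participant adds anything or
// when maxTurns is reached.
async function runChat(
  initial: ChatMessage[],
  participants: ChatParticipant[],
  complete: Complete,
  maxTurns = 8
): Promise<ChatMessage[]> {
  const messages = [...initial];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await complete(messages);
    messages.push({ role: "assistant", content: reply });

    const added: ChatMessage[] = [];
    for (const participant of participants) {
      added.push(...(await participant(messages)));
    }
    if (added.length === 0) break; // nobody started a new turn
    messages.push(...added);
  }
  return messages;
}

// Example participant: asks a fixed list of follow-up questions, one per turn.
function followUps(questions: string[]): ChatParticipant {
  let next = 0;
  return (): ChatMessage[] =>
    next < questions.length
      ? [{ role: "user", content: questions[next++] }]
      : [];
}

// Stub model that echoes the last message, so the sketch runs offline.
const echoModel: Complete = async (messages) =>
  `answer to: ${messages[messages.length - 1].content}`;
```

With the stub model, `runChat([{ role: "user", content: "q0" }], [followUps(["q1", "q2"])], echoModel)` resolves to six messages that alternate user and assistant roles. For fine-tuning data, a participant could instead generate its next question from the previous answer, which is where a nested LLM request would come in.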

pelikhan commented 3 weeks ago

Released in 1.39. Reopen if you need more.