marcgreen closed this issue 1 year ago
In general, the Graph of Operations (GoO) is a static graph that is manually constructed by the user. The operations and edges in the GoO depend on the logic of the task one is trying to model. Most of the time, there is no single "right" way of constructing the GoO. However, some graphs are better constructed than others. It is up to the user to determine how to construct the "best" graph.
In the document merging example, the authors showcase two different ways to construct a graph to solve the task. At a certain level of abstraction, the graphs for the document merge example can be applied to any task involving "merging text-based entities" together. However, you would still have to alter the Prompter and Parser classes so that they are specific to your task.
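To make the "task-specific prompter and parser" point a bit more concrete, here is a minimal sketch of what such a pair could look like for a generic text-merging task. It deliberately does not use the library's actual base classes: the names `MergePrompter`, `MergeParser`, `aggregation_prompt`, `parse_aggregation_answer`, and the `<Merged>` tag convention are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MergePrompter:
    """Hypothetical stand-in for a task-specific prompter (not the library's class)."""

    entity_name: str  # what is being merged, e.g. "meeting notes" or "contracts"

    def aggregation_prompt(self, parts: List[str]) -> str:
        # Wrap every input in numbered tags so the model can tell them apart.
        numbered = "\n".join(
            f"<Doc{i}>\n{part}\n</Doc{i}>" for i, part in enumerate(parts, 1)
        )
        return (
            f"Merge the following {len(parts)} {self.entity_name} into one, "
            "keeping all information and removing redundancy. "
            "Put the result between <Merged> and </Merged>.\n" + numbered
        )


class MergeParser:
    """Hypothetical stand-in for a task-specific parser (not the library's class)."""

    def parse_aggregation_answer(self, text: str) -> Optional[str]:
        # Extract the text between the tags; return None if the model
        # ignored the requested output format.
        start = text.find("<Merged>")
        end = text.find("</Merged>")
        if start == -1 or end == -1 or end < start:
            return None
        return text[start + len("<Merged>"):end].strip()
```

The graph shape can stay the same across merging tasks; only the prompt wording and the answer extraction change, which is why those two pieces are the ones you would normally adapt.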
To expand a little on @FruVirus's great answer: to get the most out of the graph structure, it helps to think about how the task can be broken down into smaller, easier pieces that the LLM can solve directly and that can then be joined to form the final solution. This is very similar to traditional divide-and-conquer or dynamic-programming approaches. Once you have identified the general structure of how the task can be solved (for instance, splitting and aggregating, or just going step by step), you would identify the operations needed to express the GoO (note that you can always add additional operations as well). Another tip would be to investigate how good your LLM of choice is at performing the different steps, and to tune the number of responses and filters or use improvement operations to refine responses. Lastly, if you are working on tasks where context size is an issue, it often helps to aggregate at intermediate steps to keep the current state within the context.
I hope this helps. If you are still unsure how to proceed, let me know!
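The split, solve, and aggregate pattern described above can be sketched in plain Python as follows. This is only an illustration of the control flow, not the library's API: `split`, `generate_and_keep_best`, `solve`, the prompt prefixes, and the stub `fake_llm` are all hypothetical names, and the scoring function is something you would have to supply for your own task.

```python
from typing import Callable, List

LLM = Callable[[str], str]      # hypothetical interface: prompt in, response out
Scorer = Callable[[str], float]  # higher score means a better response


def split(text: str, n_parts: int) -> List[str]:
    """Split the input into at most n_parts chunks of whole words."""
    words = text.split()
    size = max(1, -(-len(words) // n_parts))  # ceiling division
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def generate_and_keep_best(llm: LLM, prompt: str, score: Scorer,
                           n_responses: int = 3) -> str:
    """Sample several responses and keep the highest-scoring one."""
    candidates = [llm(prompt) for _ in range(n_responses)]
    return max(candidates, key=score)


def solve(text: str, llm: LLM, score: Scorer,
          n_parts: int = 2, n_responses: int = 3) -> str:
    """Solve each chunk on its own, then aggregate the partial results."""
    partials = [
        generate_and_keep_best(llm, f"Summarize: {chunk}", score, n_responses)
        for chunk in split(text, n_parts)
    ]
    # Aggregating the partial results keeps the prompt small compared with
    # passing the whole input to the model at once.
    combined_prompt = "Combine: " + " | ".join(partials)
    return generate_and_keep_best(llm, combined_prompt, score, n_responses)


def fake_llm(prompt: str) -> str:
    """Stub model for trying out the flow: upper-cases the text after the prefix."""
    return prompt.split(": ", 1)[1].upper()
```

Raising `n_responses` corresponds to tuning the number of responses per step, and `score` plays the role of the filter. For inputs that are too large for a single aggregation, the same idea can be applied in several rounds, merging a few partial results at a time.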
Firstly, thanks a lot for your work and sharing this code!
I'd like to try applying this approach to my own cognitive tasks, but I'm having trouble grasping how one should design a Graph of Operations for a given task. Are there different categories of cognitive tasks that would each tend to call for a certain architecture of GoO?
Or am I maybe misunderstanding, and perhaps the implemented example (document merging) showcases what a typical GoO would be for a language-based cognitive task, and not much variation is necessarily needed or expected?
In the paper you say the following which are nice tips I think, but I don't understand how to apply them to building a GoO:
Any further tips or insight would be greatly appreciated!