Tips for architecting Graphs of Operations?

marcgreen commented 1 year ago

Firstly, thanks a lot for your work and sharing this code!

I'd like to try applying this approach to my own cognitive tasks, but I'm having trouble grasping how one should design a Graph of Operations for a given task. Are there different categories of cognitive tasks that would each tend to call for a certain architecture of GoO?

Or am I maybe misunderstanding, and perhaps the implemented example (document merging) showcases what a typical GoO would be for a language-based cognitive task, and not much variation is necessarily needed or expected?

In the paper you give the following tips, which I think are useful, but I don't understand how to apply them when building a GoO:

The overall goal when conducting graph decomposition is to break down a task to the point, where the LLM can solve it correctly for the majority of time using a single prompt (or with a few additional improvement steps). This significantly lowers the number of improvement/refinement steps needed during the later stages of the graph exploration. Furthermore, as indicated by our results, combining or concatenating sub-results is usually an easier task than solving large task instances from scratch. Hence, the LLM is often successful when aggregating the final solution.

Any further tips or insight would be greatly appreciated!

tonyzhao6 commented 1 year ago

In general, the Graph of Operations (GoO) is a static graph that is manually constructed by the user. The operations and edges in the GoO depend on the logic of the task one is trying to model. Most of the time, there is no single "right" way of constructing the GoO. However, some graphs are better constructed than others. It is up to the user to determine how to construct the "best" graph for the task.

In the document merging example, the authors showcase two different ways to construct a graph to solve the task. At a certain level of abstraction, the graphs for the document merge example can be applied to any task involving "merging text-based entities" together. However, you would still have to alter the Prompter and Parser classes so that they are specific to your task.
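To make the "static, manually constructed graph" idea concrete, here is a minimal sketch in plain Python. It is not the graph-of-thoughts API: `Op`, `run`, and the toy lambdas are hypothetical names invented for illustration, and the lambdas stand in for what would be LLM calls driven by your own Prompter and Parser.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical stand-in for an operation node; NOT the graph-of-thoughts API.
@dataclass
class Op:
    name: str
    fn: Callable[[List[str]], List[str]]  # thoughts in -> thoughts out
    preds: List["Op"] = field(default_factory=list)


def run(ops: List[Op], inputs: List[str]) -> Dict[str, List[str]]:
    """Execute ops in the given order, which must list predecessors first."""
    results: Dict[str, List[str]] = {}
    for op in ops:
        if op.preds:
            # Collect the outputs of all predecessor operations.
            incoming = [t for p in op.preds for t in results[p.name]]
        else:
            # Operations without predecessors read the task input.
            incoming = inputs
        results[op.name] = op.fn(incoming)
    return results


# Toy "merge text entities" graph: two branches each draft a merge,
# then a single operation aggregates the drafts.
draft_a = Op("draft_a", lambda docs: [" + ".join(docs)])
draft_b = Op("draft_b", lambda docs: [" + ".join(reversed(docs))])
aggregate = Op("aggregate", lambda drafts: [" | ".join(drafts)],
               preds=[draft_a, draft_b])

results = run([draft_a, draft_b, aggregate], ["doc1", "doc2"])
print(results["aggregate"])  # ['doc1 + doc2 | doc2 + doc1']
```

The graph shape (two drafting branches feeding one aggregation) is fixed before execution; only the functions attached to each node would change from task to task.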

nblach commented 1 year ago

To expand a little on @FruVirus's great answer: to get the most out of the graph structure, it helps to think about how the task can be broken down into smaller, easier pieces that the LLM can solve directly and that can then be joined to form the final solution. This is very similar to traditional divide-and-conquer or dynamic-programming approaches. Once you have identified the general structure of how the task can be solved (for instance, splitting and aggregating, or simply proceeding step by step), you would identify the operations needed to express the GoO. Note that you can always add further operations as well. Another tip is to investigate how good your LLM of choice is at performing the different steps, and then tune the number of responses and filters, or use improvement operations to refine responses. Lastly, if you are working on tasks where context size is an issue, it often helps to aggregate at intermediate steps to keep the current state within the context.
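As a rough illustration of the split, solve, filter, and aggregate pattern described above, here is a self-contained toy in plain Python. Everything in it is an assumption made for the sketch: `fake_llm_sort` is a deterministic stub standing in for a real LLM call, the function names are hypothetical and not part of the graph-of-thoughts API, and the scoring function is exact, which real tasks often cannot rely on.

```python
import heapq
from typing import List


def fake_llm_sort(chunk: List[int], attempt: int) -> List[int]:
    """Deterministic stub for an LLM call (hypothetical): early attempts are flawed."""
    answer = sorted(chunk)
    if attempt == 0:
        return list(chunk)  # "lazy" answer: the input returned unchanged
    if attempt == 1 and len(answer) > 1:
        answer[0], answer[1] = answer[1], answer[0]  # one misplaced pair
    return answer


def split(items: List[int], size: int) -> List[List[int]]:
    """Break the task into pieces small enough to solve with one prompt."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def score(candidate: List[int]) -> int:
    """Number of adjacent out-of-order pairs; 0 means sorted."""
    return sum(a > b for a, b in zip(candidate, candidate[1:]))


def solve_chunk(chunk: List[int], num_responses: int) -> List[int]:
    """Generate several responses for one piece and keep the best-scoring one."""
    candidates = [fake_llm_sort(chunk, k) for k in range(num_responses)]
    return min(candidates, key=score)


def aggregate_pairwise(parts: List[List[int]]) -> List[int]:
    """Merge neighbours round by round so each merge step only sees two parts."""
    while len(parts) > 1:
        parts = [list(heapq.merge(*parts[i:i + 2]))
                 for i in range(0, len(parts), 2)]
    return parts[0]


data = [9, 3, 7, 1, 8, 2, 6, 4, 5, 0]
chunks = split(data, 4)
best = [solve_chunk(c, num_responses=3) for c in chunks]
result = aggregate_pairwise(best)
print(result)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The two knobs mentioned above show up here as `num_responses` (how many candidates to generate before filtering) and the pairwise merge (aggregating at intermediate steps so no single step has to hold the whole state). With a real model, each of these functions would wrap a prompt and a parser rather than plain Python logic.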

I hope this helps, if you are still unsure on how to progress let me know!

spcl / graph-of-thoughts

Tips for architecting Graphs of Operations? #15