Critic recognizes "we're not a team" (yet)

kfsone commented 10 months ago

When I ran the arxiv-research example on an engineering-practices task ("research and try methods for building and packaging python modules as wheels"), I got a noteworthy response from the critic up front:

The plan has a solid foundation but seems to assume that a 'scientist' and 'engineer' are involved in the task, but this classification doesn't necessarily apply to a broad range of situations. I'd make considerations around the existent roles.

Let's take a moment to remember what the 'A' stands for: Artificial. The training data available to date biases the patterns these systems could have learned, and in particular biases them away from work-sharing collaboration.

Looking at the prompts used by, say, ChatDev, we can see that users typically do not appreciate the implications of this being text prediction, and they begin to herd cats.

This is analogous to micromanaging in humans, where one party floods the exchange with communication parameters and predicates: if we load the input with an excess of shaping elements, we dilute attention and decrease the likelihood of a good match against the positive patterns we're looking for.

Write a python program to count files over 4k in size in a given directory tree.

vs

Write a program that counts files in a directory tree. Do not write more than 8 methods. Do not use the 'requests' package. All methods must be complete, there cannot be any empty pass methods or methods that just return None. All methods should be fully commented. It is likely that there are methods we could use to do this that would be helpful to other programs using this code as a library offering different strategies or controls, and while those would be very useful, try to avoid writing too many of them if you must. I am mostly a go programmer although I also know some erlang, but I would like this program in python. You don't need to explain in detail how this program is different than the go or erlang implementations.

(^- attention here is diluted so badly that GPT-4, bison, and codey are near guaranteed to do the opposite of what the prompt appears to say to a human)
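
For scale, here is roughly how small the task in the first prompt is. This is a minimal sketch, not a reference answer: it assumes "4k" means 4096 bytes and "over" means strictly larger, and the function name `count_large_files` is made up for illustration.

```python
import os


def count_large_files(root: str, min_size: int = 4096) -> int:
    """Count regular files under `root` that are strictly larger than `min_size` bytes.

    Assumption: "4k" is read as 4096 bytes; symlinks are skipped rather than followed.
    """
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                # Skip symlinks so a link to a large file is not counted twice.
                continue
            try:
                if os.path.getsize(path) > min_size:
                    count += 1
            except OSError:
                # The file vanished or is unreadable; ignore it and keep going.
                continue
    return count
```

Calling `count_large_files(".")` walks the current directory. The whole task fits in one function, which is the point of keeping the prompt equally small.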

To this end, I want to suggest that the AutoGen team dedicate some aspect of AutoGen to team formation. For this, it may be desirable to run a local LLM as a hypervisor agent, capable of rejecting responses or prompts, or at least pushing back on them.
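
To make the push-back idea concrete, here is a rough sketch of such a gate. It does not use any AutoGen API, and a crude keyword heuristic stands in for the local LLM. The names `review_prompt` and `Verdict`, the phrase list, and both thresholds are assumptions for illustration only.

```python
import re
from dataclasses import dataclass

# Hypothetical thresholds; real values would need tuning against real prompts.
MAX_CONSTRAINTS = 3
MAX_WORDS = 60

# Phrases that usually introduce a shaping constraint rather than the task itself.
# This list is an assumption, not a validated taxonomy.
CONSTRAINT_PATTERN = re.compile(
    r"\b(do not|don't|cannot|must|should|avoid|never|try to)\b",
    re.IGNORECASE,
)


@dataclass
class Verdict:
    accepted: bool
    reason: str


def review_prompt(prompt: str) -> Verdict:
    """Accept a prompt, or push back when shaping elements crowd out the task."""
    constraints = len(CONSTRAINT_PATTERN.findall(prompt))
    words = len(prompt.split())
    if constraints > MAX_CONSTRAINTS:
        return Verdict(False, f"{constraints} constraint phrases; state the task and trim the rest")
    if words > MAX_WORDS:
        return Verdict(False, f"{words} words; shorten the prompt")
    return Verdict(True, "ok")
```

With these thresholds the one-line prompt above passes, while the long variant is rejected on its constraint count alone. A real hypervisor would replace the regex with a model call, but the accept-or-push-back interface could stay the same.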

Puntodrom commented 10 months ago

I just discovered AutoAgent, which seems to do exactly that, and it could be helpful as inspiration. Here is the paper: https://arxiv.org/pdf/2309.17288.pdf

sonichi commented 10 months ago

@JieyuZ2 FYI

thinkall commented 2 months ago

We are closing this issue due to inactivity; please reopen if the problem persists.

JieyuZ2 commented 2 months ago

Check out our new efforts in agent team formation: https://arxiv.org/abs/2405.19425

microsoft / autogen

Critic recognizes "we're not a team" (yet) #243