Closed hidaris closed 1 year ago
Hi @hidaris. Thanks for bringing this up. I agree that the schema needs more thought at minimum to support argument descriptions.
I have a few related ideas about changes to the schema and action definition that I think we can discuss together under this topic. I'm still putting my thoughts together right now but I'll post shortly on a proposal for some changes we can make to address this.
Alright I needed some time because I think this brings up a few things that could be improved and I wanted to gather my thoughts. I like both ideas that you brought up and I'm leaning towards supporting both along with some other changes.
To start here's how I'm thinking we could define a new decorator:
@action(access_policy=ACCESS_PERMITTED)
def say(self, content: str):
    ...
This would replace the access_policy decorator with a more general action decorator, and we can remove the _action__ prefix since it's redundant. This gives us more flexibility in naming methods and also creates a place to define general metadata about the action.
We can also use a default for the access_policy since 9/10 times it's just ACCESS_PERMITTED. So then the above becomes:
@action
def say(self, content: str):
    ...
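To make the two spellings above concrete, here is a minimal sketch of a decorator that works both bare (@action) and with arguments (@action(access_policy=...)). The constant values and the action_properties attribute name are assumptions for illustration, not the library's actual implementation.

```python
# Hypothetical constants standing in for the library's access policies.
ACCESS_PERMITTED = "permitted"
ACCESS_DENIED = "denied"


def action(func=None, *, access_policy=ACCESS_PERMITTED, **metadata):
    """Mark a method as an action; usable as @action or @action(...)."""

    def decorate(f):
        # Attach metadata to the function so a framework could discover it later.
        # The attribute name `action_properties` is an assumption.
        f.action_properties = {
            "name": f.__name__,
            "access_policy": access_policy,
            **metadata,
        }
        return f

    # Bare usage (@action): Python passes the decorated function directly.
    if func is not None:
        return decorate(func)
    # Parameterized usage (@action(...)): return the real decorator.
    return decorate


class Agent:
    @action
    def say(self, content: str):
        return content

    @action(access_policy=ACCESS_DENIED)
    def shutdown(self):
        ...
```

Making every option keyword-only is what lets the decorator tell the two call styles apart: a positional argument can only be the decorated function.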
Next I think we need to change the message format. So far we've reused the same simple format for both invoking an action and describing one.
This was done for nothing more than simplicity, but I think that probably needs to change. Calling a function and describing a function are different. So I'm going to propose changes for both.
Here's a change I'm considering for invoking an action. This would replace the current schema.
{
  "to": "John",
  "from": "Jane",
  "action": {
    "__meta__": {
      "thoughts": "I should greet John"
    },
    "say": {
      "content": "Hello, John!"
    }
  }
}
I converted the action to an object so that the action details are separated more clearly from the routing fields and changed the structure to try to be succinct.
The __meta__ parameter would be an optional field that one can use to attach metadata to any action. The thoughts field would become an optional field this way. Not all agents need the thoughts field, for example.
The __meta__ field would not be validated or provided to the underlying action method as arguments, but could be inspected using the _current_message variable or within message entries in the _message_log when needed.
There may be more reasons to record different metadata, perhaps to represent the "cost" of an action, or the time, etc. So one can experiment with custom fields this way.
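As a rough sketch of how a receiver might handle the proposed invocation format, the following separates __meta__ from the action and calls the named method with the remaining arguments. The dispatch function and the Greeter class are hypothetical names used only for illustration.

```python
def dispatch(agent, message: dict):
    """Invoke the action named in a message of the proposed shape.

    Returns the action's result together with the metadata, which is
    never passed to the action method itself.
    """
    action = dict(message["action"])       # shallow copy; the message is left untouched
    meta = action.pop("__meta__", {})      # optional field, defaults to empty
    if len(action) != 1:
        raise ValueError("expected exactly one action name in 'action'")
    name, args = next(iter(action.items()))
    method = getattr(agent, name)
    return method(**args), meta


class Greeter:
    """Hypothetical agent with a single action."""

    def say(self, content: str) -> str:
        return f"heard: {content}"


message = {
    "to": "John",
    "from": "Jane",
    "action": {
        "__meta__": {"thoughts": "I should greet John"},
        "say": {"content": "Hello, John!"},
    },
}
result, meta = dispatch(Greeter(), message)
```

Because the metadata is popped before the call, custom fields such as a cost or a timestamp can be added without changing any action signature.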
With that (possibly) becoming the format for invoking an action, let's now look at a format for describing an action. This is more complex.
For reference, the OpenAI function calling API uses the JSON Schema format, but note that OpenAI apparently converts the JSON Schema to a syntax that their models have been trained on, and they don't disclose that syntax.
I considered JSON Schema for a while but I don't think it's the best fit for this. Mostly because it's for describing data, not functions. Meaning, it doesn't represent typical function metadata like return values, raised errors, usage examples, etc. All of these could be useful to provide to a user or LLM.
So I think we could continue with our own format and use object validation with pydantic as you suggested. The following is how I think a help message could return a description of the say action. I added a few more fields for demonstration.
{
  "name": "say",
  "description": "Say something to this agent",
  "__meta__": {
    "thoughts": ["string", "Use this field to record your thoughts"]
  },
  "args": {
    "content": ["string", "The content to say"]
  },
  "returns": ["string", "The response from the agent"],
  "raises": {
    "ValueError": "If the content is invalid"
  },
  "examples": [
    {
      "to": "John",
      "action": {
        "say": {
          "content": "Hello, John!"
        }
      }
    },
    {
      "to": "John",
      "action": {
        "__meta__": {
          "thoughts": "I should greet John"
        },
        "say": {
          "content": "Hello, John!"
        }
      }
    }
  ]
}
I'm unsure of this exact format but it's a place to start. I'm using a convention of a two element array for describing fields with their types.
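One way the two-element [type, description] convention could be put to use is to check incoming arguments against the args section of a help object. The sketch below is only an illustration: the validate_args name and the mapping from type names to Python types are assumptions, and the thread suggests pydantic for real validation.

```python
# Assumed mapping from type names in the help object to Python types.
TYPE_NAMES = {"string": str, "number": (int, float), "boolean": bool}

say_help = {
    "name": "say",
    "description": "Say something to this agent",
    "args": {
        "content": ["string", "The content to say"],
    },
}


def validate_args(help_obj: dict, args: dict) -> list:
    """Return a list of problems found; an empty list means the args are valid."""
    errors = []
    spec = help_obj.get("args", {})
    for name, (type_name, _description) in spec.items():
        if name not in args:
            errors.append(f"missing argument: {name}")
        # Unknown type names fall back to `object`, which accepts anything.
        elif not isinstance(args[name], TYPE_NAMES.get(type_name, object)):
            errors.append(f"{name} should be of type {type_name}")
    for name in args:
        if name not in spec:
            errors.append(f"unexpected argument: {name}")
    return errors
```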
Finally we come back to looking at how we can construct a schema like this, and supply the argument descriptions when defining the action method.
I like the idea of supporting both docstrings and using a decorator. I think we could support both, with the decorator overriding the docstring. Here's an example of what I'm thinking:
@action(
    help={
        "name": "say",
        "description": "Say something to this agent",
        "args": {
            "content": ["string", "The content to say"],
        }
    },
    access_policy=ACCESS_PERMITTED,
)
def say(self, content: str) -> str:
    """
    Say something to this agent.

    Args:
        content (str): The content to say.

    Returns:
        str: The response from the agent.
    """
    ...
The docstring is using the Google format which allows for argument descriptions and other metadata like return values etc. There are libraries that we could use to parse that.
The decorator overrides all the fields for demonstration. Both syntaxes would result in the same help object.
The library would first parse the docstring and use the method signature to generate a help object, and then it would merge and override any values provided in the decorator. Note that the help field uses the same format from above.
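The parse-then-merge flow described above could look roughly like this. It is a sketch under stated assumptions: the docstring parser is a small hand-rolled one for the Args section only (a real implementation would likely use an existing docstring-parsing library), and build_help, parse_google_args, and the type-name mapping are hypothetical.

```python
import inspect
import re

# Assumed mapping from Python annotations to the type names used in help objects.
TYPE_NAMES = {str: "string", int: "number", float: "number", bool: "boolean"}


def parse_google_args(docstring: str) -> dict:
    """Extract {arg_name: description} from a Google-style 'Args:' section."""
    descriptions = {}
    in_args = False
    for line in (docstring or "").splitlines():
        stripped = line.strip()
        if stripped.endswith(":") and " " not in stripped:
            # A section header such as 'Args:' or 'Returns:'.
            in_args = stripped == "Args:"
            continue
        match = re.match(r"(\w+)\s*(?:\([^)]*\))?:\s*(.+)", stripped)
        if in_args and match:
            descriptions[match.group(1)] = match.group(2)
    return descriptions


def build_help(func, overrides=None) -> dict:
    """Derive a help object from signature and docstring, then apply overrides."""
    doc = inspect.getdoc(func) or ""
    arg_docs = parse_google_args(doc)
    params = inspect.signature(func).parameters
    help_obj = {
        "name": func.__name__,
        "description": doc.split("\n", 1)[0],
        "args": {
            name: [TYPE_NAMES.get(p.annotation, "any"), arg_docs.get(name, "")]
            for name, p in params.items()
            if name != "self"
        },
    }
    # Decorator-supplied values win over anything derived from the docstring.
    help_obj.update(overrides or {})
    return help_obj


def say(self, content: str) -> str:
    """Say something to this agent.

    Args:
        content (str): The content to say.

    Returns:
        str: The response from the agent.
    """
```

Note that the merge here is shallow: an override that supplies args replaces the whole derived args object rather than merging per argument. Whether a deep merge is preferable is an open design choice.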
So this is what I'm thinking. I'm eager to hear your thoughts. What do you think of it overall? Any suggestions on the choices of format or other details?
I think this is likely what I'll work on next, along with some of the other API improvements.
Overall, I quite like the design idea you provided. 👍
Regarding the description format, I see that you've added examples, and I'd like to add something. I believe that self-descriptive information is crucial for interoperability in multi-agent systems, but how to balance completeness and brevity has been a troubling question for me in the past.
I've done some work on WoT before, and recently, I noticed that @wwj718 implemented support for micropython MQTT. In such a context, a problem I once encountered is that if the description format is too long, it increases the buffer size requirements for small edge devices, potentially preventing these devices from declaring themselves as agents.
Another question related to actions is whether some actions can be declared as private, rendering them invisible to other agents, but I'm unsure if this idea is considered an anti-pattern in agency.
I noticed that in the previous design, when 'to' is None, the route key is set to 'broadcast'. This behavior might be counterintuitive. If I'm observing the flow of intermediate messages, there's always an extra mental translation involved. It seems better if the default value for 'to' is 'broadcast'.
Thanks for the feedback @hidaris.
Regarding the broadcast key, I'll change it to something like __broadcast__ by default to be clearer.
Regarding the description format. I like the idea of being able to customize it entirely. It feels like something that needs experimentation and customization for different uses.
I'm not 100% sure of the details but I'm thinking that instead of sticking to one format, the help parameter in the decorator could supply a custom format if needed. I'll think about how this could work.
Regarding private actions. Could you shed more light on what you'd like to accomplish?
I imagine that if an action is never visible, then it probably shouldn't be an action. But there may be cases where we want to selectively expose an action to some agents, or only under some conditions. I'm just not sure of a use case yet. I'm curious to hear what you're thinking.
I am currently building an autonomous agent on Agency. Ideally, as Marvin Minsky mentioned in 'The Society of Mind', I want to build many small agents to collectively create an illusion similar to human intelligence.
However, at this stage, many things are still not clear. Therefore, I temporarily place all prompt engineering and long-term memory based on vector databases in different utility classes instead of decomposing them all into agents.
Then, in a unified class called BrainAgent, I organize cycles like Plan-Execute through some actions. Therefore, there are some actions I do not want other agents to see, to avoid the cycle being triggered by mistake.
> I am currently building an autonomous agent on Agency. Ideally, as Marvin Minsky mentioned in 'The Society of Mind', I want to build many small agents to collectively create an illusion similar to human intelligence.
This is amazing. I really hope to enable this kind of exploration so hearing about it is very exciting. I'd love to hear more when you get further along and please keep letting me know what features you need for it!
> However, at this stage, many things are still not clear. Therefore, I temporarily place all prompt engineering and long-term memory based on vector databases in different utility classes instead of decomposing them all into agents.
> Then, in a unified class called BrainAgent, I organize cycles like Plan-Execute through some actions. Therefore, there are some actions I do not want other agents to see, to avoid the cycle being triggered by mistake.
I see what you mean. It seems that we need some more control over visibility, instead of all actions/agents being visible to everyone.
One solution that might work for you today would be to override the _action__help method to filter any actions you may want to hide from other agents. For example:
def _action__help(self, action_name: str = None) -> list:
    allowed_agents = ["Agent1", "Agent2"]
    if self._current_message["from"] in allowed_agents:
        return super()._action__help(action_name)
    else:
        return []
_(Note that I use the self._current_message variable which you can access within an action. I think I need to document this better.)_
I could also see this becoming a built-in feature and an improvement on the access_policy field. I think it could work on the basis of defining agent "groups" along with their access permissions. I haven't thought this through much yet but using the upcoming @action decorator it could look something like:
@action(
    access_policy={
        "Group1": ACCESS_DENIED,
        "Group2": ACCESS_PERMITTED,
        "Group3": ACCESS_REQUESTED,
    }
)
def my_action(self):
    ...
And then we would assign agent group IDs when creating them:
agent = MyAgent("Smith", "Group1")
Any thoughts on this? Would either of these ideas work for you?
The above might need more thought actually. I'm not sure how groups would be maintained. It might require a new message field or a change to how IDs work, which makes me question it a little...
Another possibility could be to base it on individual agent IDs. So forget groups; you'd indicate access depending on the requesting agent's ID. For example:
@action(
    access_policy={
        "Agent1": ACCESS_DENIED,
        "Agent2": ACCESS_PERMITTED,
        "Agent3": ACCESS_REQUESTED,
    }
)
def my_action(self):
    ...
This would be easy to implement and it might be a start.
Another idea that came up would be to allow you to define a callback for you to implement your own visibility logic, something like:
def my_action_access_policy(self):
    allowed_agents = ["Agent1", "Agent2"]
    return self._current_message["from"] in allowed_agents

@action(access_policy=my_action_access_policy)
def my_action(self):
    ...
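The per-ID mapping and the callback idea could be handled by one small resolution step. This is a sketch under assumptions: resolve_access is a hypothetical helper, the constant values are placeholders, and the callback receives the message explicitly instead of reading self._current_message as in the example above.

```python
# Hypothetical constants standing in for the library's access policies.
ACCESS_PERMITTED = "permitted"
ACCESS_DENIED = "denied"
ACCESS_REQUESTED = "requested"


def resolve_access(policy, agent, message, default=ACCESS_DENIED):
    """Turn an access_policy value into a concrete decision for one message."""
    if callable(policy):
        # Callback form: a truthy result permits the action.
        return ACCESS_PERMITTED if policy(agent, message) else ACCESS_DENIED
    if isinstance(policy, dict):
        # Mapping form: look up the sender's ID, falling back to the default.
        return policy.get(message["from"], default)
    # Constant form: the policy is already a decision.
    return policy


def only_trusted(agent, message):
    """Example callback: allow a fixed set of sender IDs."""
    return message["from"] in ["Agent1", "Agent2"]


per_agent = {"Agent1": ACCESS_DENIED, "Agent3": ACCESS_REQUESTED}
```

Denying by default for senders missing from the mapping is a conservative choice made for this sketch; the thread does not settle what the default should be.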
Let me know what you think of these ideas. If possible I'd like to include something along these lines in the next release.
I'm glad you're interested in my current exploration. I'm not sure if there needs to be a concept of a group here. I'm concerned that introducing group design just for visibility might affect your overall design decisions later on.
So, personally, I think if we can control visibility by overriding _action__help and using the callback in the early stages, let's keep this part simple for now. We can show this through documentation or case studies so that other people with visibility requirements can learn from it.
Sounds good. I agree that we should resist complicating things. I'll make a note to document how to implement visibility through the help method for now.
I just released 1.3 (https://github.com/operand/agency/pull/97) which adds support for argument descriptions. The documentation for the feature is located here.
I'm going to close this but feel free to reopen if needed.
Wow, this might be the happiest thing for me today! 💯
Hello, I found that when using the function call agent in the demo, it may be unclear whether the parameter should be "objective" or "make a todo list" when faced with "your objective is xxx, your current task is make a todo list". In this case, it would be helpful to have a description of the parameters. I saw a comment mentioning that this functionality may be added in the future. I have done some exploration and would like to share my thoughts: I imagine that the description of the parameters should be optional, as non-function call agents may not be concerned about this. Here are two possible approaches:
1. Simply add parameter descriptions in action doc, then extract them using regular expressions. This method requires agreeing on the format of the parameters.
2. Implement a decorator args_schema that takes a subclass of pydantic BaseModel.
class SayInput(BaseModel):
    """Inputs for say"""

@args_schema(SayInput)
@access_policy(ACCESS_PERMITTED)
def _action__say(self, content: str):
    pass