The tool argument is always a string instead of a list

brunoboto96 commented 2 weeks ago

It happens whenever this tool is called. The task generates a list of keywords, which are then passed as an argument to the function like this:

class KeyWordAnalyser(BaseTool):
    name: str = "Keyword Search Volume Analyser"
    description: str = """Get search volume data and search insights for keywords."""

    # def _run(self, keywords: dict[str, list[str]]) -> str:
    def _run(self, keywords: list[str]) -> str:
        """Useful to search the internet about a given topic and return relevant results. """
        print(f"keywords: {keywords}. Type: {type(keywords)}")
        return generate_historical_metrics(keywords)

keyword_analyser_tool = KeyWordAnalyser()

keyword_analyser_tool.run(['word1', 'word2', ...]) # this works.

I have tried multiple combinations of LLMs and chat templates, as well as using a dict with keywords as a field and a list as its value: def _run(self, keywords: dict[str, list[str]]) -> str:

Agent/Task

    # hermes2/llama3-8b (tried others)
    chat_ml_system = """<|im_start|>system
    {{ .System }}
    If the thought is to use a tool, do the following: for each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags.<|im_end|>"""
    chat_ml_prompt = """<|im_start|>
    {{ .Prompt }}<|im_end|>"""
    chat_ml_response = """<|im_start|>assistant
    {{ .Response }}<|im_end|>"""
...
    keywords_analyser = Agent(
        verbose=True,
        llm=llm,
        backstory="Skilled in extracting search volume data for keywords.",
        role="Search Volume Researcher",
        goal="Extract search volume data for the provided keywords.",
        tools=[keyword_analyser_tool],
        allow_delegation=False,
        max_iter=3,
        system_template=chat_ml_system,
        prompt_template=chat_ml_prompt,
        response_template=chat_ml_response,
    )

    analyse_keywords = Task(
        description=f"""Extract search volume data for the original Keywords and extracted keywords.""",
        expected_output=f"A list of Search volume data and additional info for the keywords from the previous step.",
        agent=keywords_analyser,
        context=[content_research],
    )

Output

Action Input: {"keywords": ['SMS marketing', 'Brands', 'Digital strategies', 'Immediacy', 'Visibility', 'Notifications', 'Consumers', 'Click-through rates (CTR)', 'Email marketing', 'Channel of communication', 'Marketing message', 'Customer relationships', 'Conversions', 'Thought leadership', 'Time-sensitive offers', 'New releases', 'Email marketing strategy', 'Abandoned cart reminders', 'Cost-effective', 'B2B']}
Observkeywords: ['SMS marketing'. Type: <class 'str'>
keywords: ['SMS marketing'. Type: <class 'str'>
keywords: ['SMS marketing'. Type: <class 'str'>
keywords: ['SMS marketing'. Type: <class 'str'>
keywords: ['SMS marketing'. Type: <class 'str'>
keywords: ['SMS marketing'. Type: <class 'str'>

I encountered an error while trying to use the tool. This was the error: 'str' object has no attribute 'keywords'.
 Tool Keyword Search Volume Analyser accepts these inputs: Keyword Search Volume Analyser(keywords: 'object') - Get search volume data and search insights for keywords.

Notice how ['word1' gets cut off and arrives as a string instead of a list. I have also tried switching between versions. Are lists simply not supported, or am I missing something?

danielgen commented 2 weeks ago

I'm not sure I follow the example, but my understanding is that the model only passes strings to tools (i.e. it never passes data types other than text), whereas when you call the tool manually you pass a list. You could try modifying the function to eval() the string representation of the list to turn it into an actual Python list.

To make this concrete: don't test it as keyword_analyser_tool.run(['word1', 'word2', ...]) (which works), but as keyword_analyser_tool.run("['word1', 'word2', ...]"), which is how the model will call it. That's why inside the tool function you should probably eval("['word1', 'word2', ...]") or similar.
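A minimal sketch of that suggestion, with one swap: it uses `ast.literal_eval` instead of `eval`, because `literal_eval` only evaluates Python literals and therefore cannot run arbitrary code coming from the model's output. `coerce_keywords` is a hypothetical helper, not part of crewAI; it accepts either a real list (manual calls) or the string form of a list (what the agent sends).

```python
import ast
from typing import List, Union


def coerce_keywords(keywords: Union[str, list]) -> List[str]:
    """Hypothetical helper: normalise a tool argument into a list of keywords.

    Accepts a real list (e.g. a manual .run([...]) call) or the string
    representation of a list, e.g. "['word1', 'word2']" from the LLM.
    """
    if isinstance(keywords, list):
        return [str(k) for k in keywords]
    # literal_eval only accepts literals (lists, strings, numbers, ...),
    # so unlike eval() it will not execute code embedded in the string.
    parsed = ast.literal_eval(keywords.strip())
    if isinstance(parsed, str):
        # A single quoted keyword such as "'SMS marketing'"
        return [parsed]
    if not isinstance(parsed, (list, tuple)):
        raise ValueError(f"expected a list of keywords, got {type(parsed).__name__}")
    return [str(k) for k in parsed]


print(coerce_keywords(['word1', 'word2']))    # ['word1', 'word2']
print(coerce_keywords("['word1', 'word2']"))  # ['word1', 'word2']
```

Inside the tool, `_run` could then call `generate_historical_metrics(coerce_keywords(keywords))`. This cannot help if the string has already been truncated to something like "['SMS marketing'": `literal_eval` raises `SyntaxError` on that, much like the error reported in the next comment.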

brunoboto96 commented 2 weeks ago

Good idea, but it won't work: I encountered an error while trying to use the tool. This was the error: '[' was never closed (<string>, line 1).

So it's one of two things:

Either I overlooked something, or the chat template is wrong and needs a special token (although I also added that to the system prompt).

Or there is something in the source code that tries to parse the action input but fails to keep the list items together, and therefore passes it on as a string. As you can see below, this is the correct format that the tool's pydantic model requests for the field 'keywords': it contains multiple keywords and the [] is closed:

Action Input: {"keywords": ['SMS marketing', 'Brands', 'Digital strategies', 'Immediacy', 'Visibility', 'Notifications', 'Consumers', 'Click-through rates (CTR)', 'Email marketing', 'Channel of communication', 'Marketing message', 'Customer relationships', 'Conversions', 'Thought leadership', 'Time-sensitive offers', 'New releases', 'Email marketing strategy', 'Abandoned cart reminders', 'Cost-effective', 'B2B']}

But then it passes the list on as a string containing only the first item, which gets cut off without the closing ']': "['word'"
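One possible explanation for the truncation, offered as a hypothesis rather than a confirmed reading of crewAI's code: the Action Input uses single-quoted list items, so it is a valid Python literal but not valid JSON, and if a fallback then "repairs" it by splitting on commas (as the patched `_validate_tool_input()` later in this thread still does), everything after the first list item is lost. The sketch reproduces that with a shortened version of the logged input; the comma-splitting loop is a simplified stand-in, not crewAI's actual code.

```python
import ast
import json

# Shortened Action Input from the log: a valid Python literal, but NOT
# valid JSON, because the list items use single quotes.
action_input = "{\"keywords\": ['SMS marketing', 'Brands', 'Digital strategies']}"

try:
    json.loads(action_input)
except json.JSONDecodeError as exc:
    print("json.loads failed:", exc.msg)

# ast.literal_eval accepts Python-style quoting and keeps the list intact.
print(ast.literal_eval(action_input))

# Simplified stand-in for a naive repair step: split the body on commas and
# keep only "key: value" fragments. List items after the first contain no
# ":" and are dropped, leaving the first item with an unclosed "[".
pairs = {}
for entry in action_input.strip("{} ").split(","):
    if ":" not in entry:
        continue
    key, value = entry.split(":", 1)
    pairs[key.strip().strip('"')] = value.strip()
print(pairs)  # {'keywords': "['SMS marketing'"}
```

The last line matches the logged `keywords: ['SMS marketing'. Type: <class 'str'>`, which is why the fix has to happen before or instead of any comma splitting.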

danielgen commented 2 weeks ago

@brunoboto96 it does work; take a look at this example from the tools library: https://github.com/joaomdmoura/crewAI-examples/blob/main/prep-for-a-meeting/tools/ExaSearchTool.py#L23

The newest issue you are facing seems to be related to the model you are using and the AgentParser code. I had a similar issue: my local model didn't play well with the parser. See: https://github.com/joaomdmoura/crewAI/issues/103#issuecomment-2102873906

(However, OpenAI GPT models worked correctly with the unmodified parser code.)

brunoboto96 commented 2 weeks ago

Again, I appreciate the intent, but that could never work in this instance: the input comes from the LLM, and since the rest of the list isn't there, I can't make the missing items out of thin air.

Instead, I've added some extra code to _validate_tool_input() in crewai/tools/tool_usage.py that detects a list (an opening [) while parsing the string:

    # Relies on `import ast` and `import json` at the top of tool_usage.py
    def _validate_tool_input(self, tool_input: str) -> str:
        try:
            ast.literal_eval(tool_input)
            return tool_input
        except Exception as e:
            print('Error:', e)
            # Clean and ensure the string is properly enclosed in braces
            tool_input = tool_input.strip()
            final_data = {}

            if '[' in tool_input:
                list_indexes_start = [i for i, ltr in enumerate(tool_input) if ltr == "["]
                list_indexes_end = [i for i, ltr in enumerate(tool_input) if ltr == "]"]
                print('list_indexes_start:', list_indexes_start)
                print('list_indexes_end:', list_indexes_end)

                list_values = []
                for i in range(len(list_indexes_start)):
                    list_value = tool_input[list_indexes_start[i]:list_indexes_end[i]+1].replace("'", '"')
                    list_values.append(list_value)
                    print('list_value:', list_value)
                print('list_values:', list_values)
                # return eval(list_values[0])

                # find keys for the lists in the tool_input
                keys = []
                for i in range(len(list_values)):
                    key = tool_input[:list_indexes_start[i]].strip().replace("'", '"').replace(":", "").replace("{", "").replace("}", "").replace('"', "")
                    keys.append(key)
                print('keys:', keys)
                for key in keys:
                    final_data[key] = json.loads(list_values[keys.index(key)])
                print('final_data:', final_data, type(final_data))

            print('tool input2:', tool_input, type(tool_input))
            if not tool_input.startswith("{"):
                tool_input = "{" + tool_input
            if not tool_input.endswith("}"):
                tool_input += "}"
            print('tool input3:', tool_input, type(tool_input))
            # Manually split the input into key-value pairs
            entries = tool_input.strip("{} ").split(",")
            print('entries:', entries)
            formatted_entries = []

            for entry in entries:
                print('entry:', entry)
                if ":" not in entry:
                    continue  # Skip malformed entries
                key, value = entry.split(":", 1)

                print('key:', key, 'value:', value)
                # Remove extraneous white spaces and quotes, replace single quotes
                key = key.strip().strip('"').replace("'", '"')

                print('key:', key, type(key), final_data.keys())
                if(key in final_data.keys()):
                    print('Key has been found:', key, final_data.keys())
                    continue
                else:
                    print('Key has not been found:', key)

                print('value:', value, type(value))
                value = value.strip().replace('"', "").replace("'", "")

                # Handle replacement of single quotes at the start and end of the value string
                # (only reachable if the quote-stripping line above is relaxed)
                if value.startswith("'") and value.endswith("'"):
                    value = value[1:-1]  # Remove single quotes
                    formatted_value = (
                        '"' + value.replace('"', '\\"') + '"'
                    )  # Re-encapsulate with double quotes
                    print('value1:', formatted_value)
                elif value.isdigit():  # Check if value is a digit, hence integer
                    formatted_value = value
                    print('value2:', formatted_value)
                elif value.lower() in [
                    "true",
                    "false",
                    "null",
                ]:  # Check for boolean and null values
                    formatted_value = value.lower()
                    print('value3:', formatted_value)
                else:
                    # Assume the value is a string and needs quotes
                    formatted_value = '"' + value.replace('"', '\\"') + '"'
                    print('value4:', formatted_value)

                # Rebuild the entry with proper quoting
                formatted_entry = f'"{key}": {formatted_value}'
                print('formatted_entry:', formatted_entry)
                formatted_entries.append(formatted_entry)

            print('formatted_entries:', formatted_entries)
            # Reconstruct the JSON string
            new_json_string = "{" + ", ".join(formatted_entries) + "}"

            print('final:',new_json_string, type(new_json_string))
            final_json_object = json.loads(new_json_string)
            print('final_json_object:', final_json_object, type(final_json_object))

            # Join final_data and final_json_object
            final_data.update(final_json_object)
            print('final_data:', final_data, type(final_data))
            return str(final_data)

I left the debug prints in, in case someone wants to give it a go and refactor it properly. You can easily test it like this:

tool_input = '{"keywords": ["word1", "word2", "word3"], "geo_target": "USA", "network": "GOOGLE_SEARCH", "language": "English"}'

tool_input = tool_input.strip()
...

I guess this would add support for lists as well. I'm not sure whether it was already working with GPT-4. But even if it's a bug specific to local/quantised LLMs, feel free to refactor it into something similar but better, and merge it if it helps 👍
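A possible refactor, sketched under the assumption that the goal is simply to turn the Action Input into a dict: instead of splitting on commas and rebuilding JSON by hand, try strict JSON first and fall back to `ast.literal_eval`, which accepts the single-quoted lists these models emit. `parse_tool_input` is a hypothetical name; it returns a dict rather than the string the patched method returns, and it has not been tested inside crewAI itself.

```python
import ast
import json
from typing import Any, Dict


def parse_tool_input(tool_input: str) -> Dict[str, Any]:
    """Hypothetical replacement for the patched _validate_tool_input().

    Tries strict JSON first, then Python-literal syntax (single quotes,
    True/False/None), and rejects anything that is not a mapping.
    """
    text = tool_input.strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Raises ValueError or SyntaxError if this is not a valid literal either
        parsed = ast.literal_eval(text)
    if not isinstance(parsed, dict):
        raise ValueError(f"expected an object/dict, got {type(parsed).__name__}")
    return parsed


tool_input = '{"keywords": ["word1", "word2", "word3"], "geo_target": "USA", "network": "GOOGLE_SEARCH", "language": "English"}'
print(parse_tool_input(tool_input)["keywords"])  # ['word1', 'word2', 'word3']
print(parse_tool_input("{'keywords': ['SMS marketing', 'Brands']}"))
```

The trade-off is that genuinely malformed input (unquoted keys, a missing closing bracket) now raises instead of being patched up, which surfaces the problem rather than silently handing a truncated string to the tool. If the caller needs a string, `json.dumps` on the result gives valid JSON, unlike `str(final_data)`.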

danielgen commented 2 weeks ago

@brunoboto96 Yeah, I'm just a user, not a repo maintainer.

What I'm trying to help with is this:

1. I had exactly the same behaviour as you with a local model, and changing the parser logic fixed it (you would probably need to try it with ]}; I'm not sure whether this is a crewAI bug, an issue with our local LLMs, or something else).
2. With OpenAI models through the API, the code worked fine as is.
3. Judging by the functions in the related tools and examples repos, there is currently no functionality in crewAI to eval() strings into Python types, so people have to do that themselves (assuming the LLM manages to return a correctly closed string representation of the list, i.e. with the ]).

Good luck!

P.S. By the way, your tool doesn't actually need lists: you could ask the model to generate "word1,word2,word3" and then use keywords.split(",") inside the tool.
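A sketch of that comma-separated alternative, where `split_keywords` is a hypothetical helper that `_run` would call: the tool's `keywords` argument stays a plain string and is only split inside the tool.

```python
from typing import List


def split_keywords(keywords: str) -> List[str]:
    """Hypothetical helper for a `keywords` argument like "word1,word2,word3"."""
    # Strip whitespace around each item and drop empties left by stray commas
    return [k.strip() for k in keywords.split(",") if k.strip()]


print(split_keywords("SMS marketing, Brands,Digital strategies,"))
# ['SMS marketing', 'Brands', 'Digital strategies']
```

This assumes no keyword itself contains a comma; if one might, asking the model for another delimiter such as ";" and splitting on that is the same idea.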

joaomdmoura/crewAI, issue #611