microsoft / autogen

A programming framework for agentic AI 🤖
https://microsoft.github.io/autogen/

[Bug]: Tool Overuse in Assistant Agent #1824

Open gagb opened 7 months ago

gagb commented 7 months ago

Describe the bug

Whenever a tool is registered, the agents are overly eager to use it, even when doing so is wasteful. In the example below, the chat history already contains the information needed to complete the task without using the tool, but the assistant agent still calls the tool again to fetch it. This is just one reproducible example; the bug is not a one-off -- it happens consistently across tool types and scenarios.

@afourney has also seen the same behavior multiple times.

Steps to reproduce

Use this script to reproduce:

```python
# filename: test/twoagent_tool_overuse.py
from autogen import AssistantAgent, UserProxyAgent, config_list_from_json


def search_arxiv(query: str, max_results: int = 10) -> str:
    """
    Searches arXiv for the given query using the arXiv API, then returns the search results.

    This is a helper function. In most cases, callers will want to use
    'find_relevant_papers( query, max_results )' instead.

    Args:
        query (str): The search query.
        max_results (int, optional): The maximum number of search results to return. Defaults to 10.

    Returns:
        response (str): A JSON string containing the search results.

    Example:
        >>> results = search_arxiv("attention is all you need")
        >>> print(results)
    """
    import json
    import re

    import arxiv

    # Normalize the query, removing operator keywords
    query = re.sub(r"[^\s\w]", " ", query.lower())
    query = re.sub(r"\s(and|or|not)\s", " ", " " + query + " ")
    query = re.sub(r"[^\s\w]", " ", query.lower())
    query = re.sub(r"\s+", " ", query).strip()

    search = arxiv.Search(query=query, max_results=max_results, sort_by=arxiv.SortCriterion.Relevance)

    jresults = list()
    for result in search.results():
        r = dict()
        r["entry_id"] = result.entry_id
        r["updated"] = str(result.updated)
        r["published"] = str(result.published)
        r["title"] = result.title
        r["authors"] = [str(a) for a in result.authors]
        r["summary"] = result.summary
        r["comment"] = result.comment
        r["journal_ref"] = result.journal_ref
        r["doi"] = result.doi
        r["primary_category"] = result.primary_category
        r["categories"] = result.categories
        r["links"] = [str(link) for link in result.links]
        r["pdf_url"] = result.pdf_url
        jresults.append(r)

    if len(jresults) > max_results:
        jresults = jresults[0:max_results]

    response = json.dumps(jresults, indent=2)
    return response


config_list = config_list_from_json(env_or_file="OAI_CONFIG_LIST", filter_dict={"model": ["gpt-4"]})

assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
user_proxy = UserProxyAgent(
    "user_proxy",
    code_execution_config={"work_dir": "coding", "use_docker": False},
    human_input_mode="NEVER",
    is_termination_msg=lambda msg: msg.get("content") and "TERMINATE" in msg.get("content", ""),
)

assistant.register_for_llm(description="Find papers on arxiv.")(search_arxiv)
user_proxy.register_for_execution()(search_arxiv)

history = [
    {'content': 'Find recent papers on xai by me', 'role': 'user'},
    {'content': 'Certainly, here are the URLs of your recent papers on Copilot:\n\n1. "Reading Between the Lines: Modeling User Behavior and Costs in AI-Assisted Programming"\n - [PDF Link](http://arxiv.org/pdf/2210.14306v4)\n\n2. "When to Show a Suggestion? Integrating Human Feedback in AI-Assisted Programming"\n - [PDF Link](http://arxiv.org/pdf/2306.04930v2)\n\nThese papers are available on arXiv. If you require the papers to be saved, please use the naming convention specified in your preferences.', 'role': 'assistant'},
]

for msg in history:
    if msg["role"] == "user":
        user_proxy.send(msg["content"], assistant, request_reply=False)
    else:
        assistant.send(msg["content"], user_proxy, request_reply=False)

user_proxy.initiate_chat(assistant, message="Save these pdfs to disk", clear_history=False)
```

Expected Behavior

The agents should write Python or shell code to download the two PDF files already linked in the chat history to the working directory.
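
For reference, a minimal sketch of the kind of code the assistant is expected to produce instead of calling the tool (Python, for instance; the two URLs come from the chat history in the script above):

```python
# Download the two PDFs already linked in the chat history into the working directory.
import requests

pdf_urls = [
    "http://arxiv.org/pdf/2210.14306v4",
    "http://arxiv.org/pdf/2306.04930v2",
]

for url in pdf_urls:
    response = requests.get(url)
    response.raise_for_status()
    filename = url.split("/")[-1] + ".pdf"
    with open(filename, "wb") as f:
        f.write(response.content)
```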

Screenshots and logs

Instead, the agents call the tool again. This is not a one-off: it happens frequently across tool types and applications.

Console log:

````
@gagb ➜ /workspaces/autogen (main) $ python test/twoagent_tool_overuse.py
user_proxy (to assistant):

Find recent papers on xai by me

--------------------------------------------------------------------------------
assistant (to user_proxy):

Certainly, here are the URLs of your recent papers on Copilot:

1. "Reading Between the Lines: Modeling User Behavior and Costs in AI-Assisted Programming"
 - [PDF Link](http://arxiv.org/pdf/2210.14306v4)

2. "When to Show a Suggestion? Integrating Human Feedback in AI-Assisted Programming"
 - [PDF Link](http://arxiv.org/pdf/2306.04930v2)

These papers are available on arXiv. If you require the papers to be saved, please use the naming convention specified in your preferences.

--------------------------------------------------------------------------------
user_proxy (to assistant):

Save these pdfs to disk

--------------------------------------------------------------------------------
assistant (to user_proxy):

***** Suggested tool Call (call_ju0xcNhwZGhPpv4tpqXMmsCL): search_arxiv *****
Arguments:
{
  "query": "xai",
  "max_results": 10
}
*****************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION search_arxiv...
/workspaces/autogen/test/twoagent_tool_overuse.py:31: DeprecationWarning: The 'Search.results' method is deprecated, use 'Client.results' instead
  for result in search.results():
user_proxy (to assistant):

user_proxy (to assistant):

***** Response from calling tool "call_ju0xcNhwZGhPpv4tpqXMmsCL" *****
[
  ... truncated for brevity: full JSON metadata (entry_id, title, authors,
  summary, pdf_url, ...) for 10 arXiv search results matching "xai" ...
]
**********************************************************************

--------------------------------------------------------------------------------
assistant (to user_proxy):

Although I found multiple recent papers on explainable AI (XAI) from Arxiv, I couldn't find any papers authored by you. The search results may not be exhaustive and it's possible that your papers might be included in different databases or not indexed under the search criteria I used.

However, if you would like to download any of the papers I found, you may use the following Python code:

```python
import requests

paper_urls = [
    "http://arxiv.org/pdf/2207.14160v2",
    "http://arxiv.org/pdf/2206.10847v3",
    # add more paper URLs here
]

for url in paper_urls:
    response = requests.get(url)
    filename = url.split("/")[-1]
    with open(filename, 'wb') as f:
        f.write(response.content)
```

Replace `paper_urls` with the actual URLs of the PDFs you want to download. This script will download each paper and save it in your current working directory. You may not be allowed to execute this code due to security restrictions. If that's the case, please manually download papers by visiting each URL in your web browser.

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...
user_proxy (to assistant):

exitcode: 0 (execution succeeded)
Code output:

--------------------------------------------------------------------------------
assistant (to user_proxy):

Great, it seems execution of Python code is allowed. Here's the Python code for downloading recent papers related to explainable AI (XAI) from the search results.

```python
import requests

paper_urls = [
    "http://arxiv.org/pdf/2207.14160v2",
    "http://arxiv.org/pdf/2206.10847v3",
    "http://arxiv.org/pdf/2111.06420v1",
    "http://arxiv.org/pdf/2307.05104v1",
    "http://arxiv.org/pdf/2310.13007v5",
    "http://arxiv.org/pdf/2001.02478v3",
    "http://arxiv.org/pdf/2101.03613v1",
    "http://arxiv.org/pdf/2104.03483v3",
    "http://arxiv.org/pdf/2105.07804v1",
    "http://arxiv.org/pdf/2311.01961v1"
]

for url in paper_urls:
    response = requests.get(url)
    filename = url.split("/")[-1]
    with open(filename, 'wb') as f:
        f.write(response.content)
```

This code will download the first 10 papers from the search results related to explainable AI (XAI) to the current directory. When you run this code, you'll see PDF files appearing in your current directory, each corresponding to a paper.

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...
user_proxy (to assistant):

exitcode: 0 (execution succeeded)
Code output:

--------------------------------------------------------------------------------
assistant (to user_proxy):

Great! The code was executed successfully. This indicates that all the PDFs are downloaded and saved to your current directory. Each PDF corresponds to a research paper on Explainable AI (XAI) from the search results. You can open these files with any PDF reader.

If you need any more assistance regarding this task or any other, feel free to ask. If all is done, we can conclude here.

--------------------------------------------------------------------------------
user_proxy (to assistant):

--------------------------------------------------------------------------------
assistant (to user_proxy):

TERMINATE

--------------------------------------------------------------------------------
````
afourney commented 7 months ago

Thanks for producing a minimal example.

Indeed, this behavior is the prime reason that I tend to wrap tools in separate agents (e.g., web surfer) rather than add them as functions/tools to a generalist agent. The difference in performance is stark.
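
A rough sketch of that kind of isolation using a group chat (this is not the actual web surfer implementation; agent names, system messages, and `max_round` are illustrative, and `config_list` and `search_arxiv` are reused from the reproduction script above). The generalist assistant only writes code, while the tool lives on a dedicated agent:

```python
# Sketch: keep the tool on a dedicated searcher agent instead of the generalist assistant.
from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager

searcher = AssistantAgent(
    "paper_searcher",
    system_message="You search arXiv only when explicitly asked to find new papers.",
    llm_config={"config_list": config_list},
)
coder = AssistantAgent("coder", llm_config={"config_list": config_list})
user_proxy = UserProxyAgent(
    "user_proxy",
    code_execution_config={"work_dir": "coding", "use_docker": False},
    human_input_mode="NEVER",
)

# Only the searcher can suggest the tool; only the user proxy executes it.
searcher.register_for_llm(description="Find papers on arxiv.")(search_arxiv)
user_proxy.register_for_execution()(search_arxiv)

groupchat = GroupChat(agents=[user_proxy, searcher, coder], messages=[], max_round=10)
manager = GroupChatManager(groupchat=groupchat, llm_config={"config_list": config_list})
user_proxy.initiate_chat(manager, message="Save these pdfs to disk", clear_history=False)
```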

gagb commented 7 months ago

@jackgerrits @ekzhu

ekzhu commented 7 months ago

I think this could be covered in the documentation on tool use. @davorrunje, have you encountered similar problems?

davorrunje commented 6 months ago

@gagb @ekzhu @afourney This behaviour is not so unexpected. Although the information was present in the history, there is no guarantee that it is up to date. If this is not the desired behaviour, it can probably be changed via the system prompt.
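
For example, extra guidance could be appended to the assistant's system message. This is an untested sketch: the added instruction text is illustrative, `config_list` is reused from the reproduction script, and `AssistantAgent.DEFAULT_SYSTEM_MESSAGE` is assumed to hold the stock prompt:

```python
# Sketch: discourage redundant tool calls via the system message.
from autogen import AssistantAgent

assistant = AssistantAgent(
    "assistant",
    system_message=AssistantAgent.DEFAULT_SYSTEM_MESSAGE
    + "\nBefore calling any tool, check the conversation history. "
    "If the information needed to complete the task is already present, "
    "use it directly instead of calling the tool again.",
    llm_config={"config_list": config_list},
)
```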

In many practical applications, I prefer function calls because they are much more constrained and easier to control. E.g., functions can check the history to make sure the user explicitly approved the call with the given parameters in a human-readable form. Code execution is more powerful but much harder to control.
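
A rough sketch of that idea, reusing `search_arxiv`, `assistant`, and `user_proxy` from the reproduction script above. The guard here is deliberately naive and only illustrates the pattern of checking the history before the tool actually runs:

```python
# Illustrative guard: only hit the arXiv API if the query was mentioned in the conversation.
def guarded_search_arxiv(query: str, max_results: int = 10) -> str:
    """Searches arXiv only if the conversation history mentions the query."""
    history_text = " ".join(
        str(m.get("content") or "")
        for msgs in user_proxy.chat_messages.values()
        for m in msgs
    ).lower()
    if query.lower() not in history_text:
        return (
            f"Refused to search arXiv for {query!r}: this search was not explicitly "
            "requested. Reuse information already present in the conversation."
        )
    return search_arxiv(query, max_results)


assistant.register_for_llm(description="Find papers on arxiv.")(guarded_search_arxiv)
user_proxy.register_for_execution()(guarded_search_arxiv)
```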

gagb commented 6 months ago

Great point @davorrunje!

But that's part of the frustration -- even when I change the task to "Using the above links, save the pdfs to disk", which is unambiguous, the agents still proceed to use the tool again!

```python
# in the original example, replace the final call with this task
user_proxy.initiate_chat(assistant, message="Using the above links, save the pdfs to disk", clear_history=False)
```

davorrunje commented 6 months ago

I think this is a consequence of how this particular model was fine-tuned. Depending on how people use LLMs in applications, LLMs will be tuned to prefer one type of behavior or the other. They will also get better at following instructions, which they are not particularly good at today. Right now it is a bit of a struggle, but it is improving very rapidly.

jtoy commented 4 months ago

Would adding something like the concept of a budget or cooldown help? Every time a function is called, it subtracts from an allocated number of calls the agent is allowed to make. Once the budget is exhausted, the agent is no longer allowed to call the function.
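
Something along those lines can already be approximated by wrapping the tool before registering it. A minimal sketch, reusing `search_arxiv`, `assistant`, and `user_proxy` from the reproduction script above (the budget size and refusal message are illustrative choices):

```python
# Sketch: cap how many times the registered tool may actually call the arXiv API.
MAX_TOOL_CALLS = 2
_calls_used = 0


def budgeted_search_arxiv(query: str, max_results: int = 10) -> str:
    """Searches arXiv, but refuses once the call budget is spent."""
    global _calls_used
    if _calls_used >= MAX_TOOL_CALLS:
        return (
            f"Tool budget of {MAX_TOOL_CALLS} calls exhausted. "
            "Reuse the results already present in the conversation."
        )
    _calls_used += 1
    return search_arxiv(query, max_results)


assistant.register_for_llm(description="Find papers on arxiv (at most 2 calls).")(budgeted_search_arxiv)
user_proxy.register_for_execution()(budgeted_search_arxiv)
```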