microsoft / autogen

A programming framework for agentic AI 🤖
https://microsoft.github.io/autogen/

TypeError: unhashable type: 'list' #983

Closed: pranavvr-lumiq closed this issue 5 months ago

pranavvr-lumiq commented 9 months ago

Describe the issue

I am trying to use RetrieveUserProxyAgent, and I am getting the following error: TypeError: unhashable type: 'list'

My goal is to use multiple CSV files in RAG and generate a report based on their contents.

Steps to reproduce

Basically, copy-paste the cells from the following notebook: https://github.com/microsoft/autogen/blob/main/notebook/agentchat_groupchat_RAG.ipynb

Change the "docs_path" to either a list of the csv files I am trying to use or the filepath the the folder containing the csv files.

Run rag_chat(); this is where I run into the problem.
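For reference, the two "docs_path" variants described in step 2 would look roughly like this (a sketch only; the paths and the other retrieve_config values are placeholders patterned on the referenced notebook):

retrieve_config = {
    "task": "code",
    # Variant 1: an explicit list of CSV files
    "docs_path": ["data/file1.csv", "data/file2.csv"],
    # Variant 2: the folder containing the CSV files
    # "docs_path": "data/csv_folder",
    "chunk_token_size": 2000,
}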

Screenshots and logs

(Three screenshots dated 2023-12-14 were attached, showing the log output and traceback reproduced below.)

Additional Information

rag_chat()

max_tokens is too small to fit a single line of text. Breaking this line:
    POL_ID,POL_BILL_MODE_CD,PLAN_ID,POL_MPREM_AMT,CVG_FACE_AMT,POLICY_TERM,PPT,PREMIUM_FREQUENCY ...
Failed to split docs with must_break_at_empty_line being True, set to False.
Trying to create collection.
max_tokens is too small to fit a single line of text. Breaking this line:
    _id,workItemID ...
Failed to split docs with must_break_at_empty_line being True, set to False.
max_tokens is too small to fit a single line of text. Breaking this line:
    "POL_ID","IIB_SCORE" ...
Failed to split docs with must_break_at_empty_line being True, set to False.
max_tokens is too small to fit a single line of text. Breaking this line:
    "POL_ID","BNFY1_REL_INSRD_CD","BNFY2_REL_INSRD_CD","BNFY3_REL_INSRD_CD" ...
Failed to split docs with must_break_at_empty_line being True, set to False.
max_tokens is too small to fit a single line of text. Breaking this line:
    "POL_ID","TRC_PROPOSAL" ...
Failed to split docs with must_break_at_empty_line being True, set to False.
max_tokens is too small to fit a single line of text. Breaking this line:
    "POL_ID","UW_DECISION" ...
Failed to split docs with must_break_at_empty_line being True, set to False.
max_tokens is too small to fit a single line of text. Breaking this line:
    POL_ID,AUREOUS_RISK_SCORE1,AUREOUS_RISK_BAND1,AUREOUS_RISK_SCORE2,AUREOUS_RISK_BAND2,AUREOUS_RISK_SC ...
Failed to split docs with must_break_at_empty_line being True, set to False.
max_tokens is too small to fit a single line of text. Breaking this line:
    "POL_ID","PROPOSER_RELATIONSHIP","PROPOSER_EARN_INCM_AMT" ...
Failed to split docs with must_break_at_empty_line being True, set to False.
max_tokens is too small to fit a single line of text. Breaking this line:
    "POL_ID","IIB_QUEST_IS_NEGATIVE" ...
Failed to split docs with must_break_at_empty_line being True, set to False.
max_tokens is too small to fit a single line of text. Breaking this line:
    "POL_ID","FCRR_RATING" ...
Failed to split docs with must_break_at_empty_line being True, set to False.
max_tokens is too small to fit a single line of text. Breaking this line:
    "CLI_ID","POL_ID","LA_EXST_CLI_IND","CLI_BTH_DT","AGE_PROOF_TYP_CD","CLI_SEX_CD","CLI_MARIT_STAT_CD" ...
Failed to split docs with must_break_at_empty_line being True, set to False.

(The same sequence of warnings then repeats verbatim a second time.)

doc_ids: [['doc_142', 'doc_787', 'doc_137']]

TypeError                                 Traceback (most recent call last)
Cell In[17], line 1
----> 1 rag_chat()

Cell In[14], line 9, in rag_chat()
      6 manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)
      8 # Start chatting with boss_aid as this is the user proxy agent.
----> 9 boss_aid.initiate_chat(
     10     manager,
     11     problem=PROBLEM,
     12     n_results=3,
     13 )

File ~/anaconda3/envs/pyautogen/lib/python3.10/site-packages/autogen/agentchat/conversable_agent.py:550, in ConversableAgent.initiate_chat(self, recipient, clear_history, silent, context)
    536 """Initiate a chat with the recipient agent.
    537
    538 Reset the consecutive auto reply counter.
    (...)
    547 "message" needs to be provided if the generate_init_message method is not overridden.
    548 """
    549 self._prepare_chat(recipient, clear_history)
--> 550 self.send(self.generate_init_message(context), recipient, silent=silent)

File ~/anaconda3/envs/pyautogen/lib/python3.10/site-packages/autogen/agentchat/contrib/retrieve_user_proxy_agent.py:420, in RetrieveUserProxyAgent.generate_init_message(self, problem, n_results, search_string)
    418 self.problem = problem
    419 self.n_results = n_results
--> 420 doc_contents = self._get_context(self._results)
    421 message = self._generate_message(doc_contents, self._task)
    422 return message

File ~/anaconda3/envs/pyautogen/lib/python3.10/site-packages/autogen/agentchat/contrib/retrieve_user_proxy_agent.py:252, in RetrieveUserProxyAgent._get_context(self, results)
    250 if results["ids"][0][idx] in self._doc_ids:
    251     continue
--> 252 _doc_tokens = self.custom_token_count_function(doc, self._model)
    253 if _doc_tokens > self._context_max_tokens:
    254     func_print = f"Skip doc_id {results['ids'][0][idx]} as it is too long to fit in the context."

File ~/anaconda3/envs/pyautogen/lib/python3.10/site-packages/autogen/token_count_utils.py:57, in count_token(input, model)
     48 """Count number of tokens used by an OpenAI model.
     49 Args:
     50     input: (str, list, dict): Input to the model.
     (...)
     54     int: Number of tokens from the input.
     55 """
     56 if isinstance(input, str):
---> 57     return _num_token_from_text(input, model=model)
     58 elif isinstance(input, list) or isinstance(input, dict):
     59     return _num_token_from_messages(input, model=model)

File ~/anaconda3/envs/pyautogen/lib/python3.10/site-packages/autogen/token_count_utils.py:67, in _num_token_from_text(text, model)
     65 """Return the number of tokens used by a string."""
     66 try:
---> 67     encoding = tiktoken.encoding_for_model(model)
     68 except KeyError:
     69     logger.warning(f"Model {model} not found. Using cl100k_base encoding.")

File ~/anaconda3/envs/pyautogen/lib/python3.10/site-packages/tiktoken/model.py:97, in encoding_for_model(model_name)
     92 def encoding_for_model(model_name: str) -> Encoding:
     93     """Returns the encoding used by a model.
     94
     95     Raises a KeyError if the model name is not recognised.
     96     """
---> 97     return get_encoding(encoding_name_for_model(model_name))

File ~/anaconda3/envs/pyautogen/lib/python3.10/site-packages/tiktoken/model.py:73, in encoding_name_for_model(model_name)
     68 """Returns the name of the encoding used by a model.
     69
     70 Raises a KeyError if the model name is not recognised.
     71 """
     72 encoding_name = None
---> 73 if model_name in MODEL_TO_ENCODING:
     74     encoding_name = MODEL_TO_ENCODING[model_name]
     75 else:
     76     # Check if the model matches a known prefix
     77     # Prefix matching avoids needing library updates for every model version release
     78     # Note that this can match on non-existent models (e.g., gpt-3.5-turbo-FAKE)

TypeError: unhashable type: 'list'
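The last frame fails because a dict membership test requires a hashable key. A minimal sketch of the same failure mode, assuming the "model" value handed to tiktoken ended up as a list rather than a string (the dict below is a stand-in for tiktoken's real table):

# Stand-in for tiktoken's MODEL_TO_ENCODING table.
MODEL_TO_ENCODING = {"gpt-4": "cl100k_base"}

model_name = ["gpt-4"]           # a list where a string was expected
model_name in MODEL_TO_ENCODING  # raises TypeError: unhashable type: 'list'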

pranavvr-lumiq commented 9 months ago

Also getting the following error: TypeError: stat: path should be string, bytes, os.PathLike or integer, not list
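This second error points at the same kind of type mix-up. A minimal sketch, assuming the list given as docs_path reached a call such as os.stat() that expects a single path:

import os

# os.stat() accepts one path, not a list of paths:
os.stat(["a.csv", "b.csv"])
# TypeError: stat: path should be string, bytes, os.PathLike or integer, not list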

thinkall commented 9 months ago

I tried with two csv files, and it worked:

Trying to create collection.
max_tokens is too small to fit a single line of text. Breaking this line:
    instant,dteday,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,cnt ...
Failed to split docs with must_break_at_empty_line being True, set to False.
max_tokens is too small to fit a single line of text. Breaking this line:
    Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8,Col9,Col10 ...
Failed to split docs with must_break_at_empty_line being True, set to False.
doc_ids:  [['doc_324', 'doc_9', 'doc_34']]
Adding doc_id doc_324 to context.
Adding doc_id doc_9 to context.
Adding doc_id doc_34 to context.
Boss_Assistant (to chat_manager):

You're a retrieve augmented coding assistant. You answer user's questions based on your own knowledge and the
context provided by the user.
If you can't answer the question with or without the current context, you should reply exactly `UPDATE CONTEXT`.
For code generation, you must obey the following rules:
Rule 1. You MUST NOT install any packages because all the packages needed are already installed.
Rule 2. You must follow the formats below to write your code:
```language
# your code
```

User's question is: Plot a figure based on the given data.

Context is: (the retrieved CSV rows; they are identical to the data reproduced line-by-line in the data string of the Senior_Python_Engineer reply below)


Senior_Python_Engineer (to chat_manager):

To plot a figure based on the given data, we need to first import the necessary libraries. Here, we will use matplotlib to plot the figure.

import matplotlib.pyplot as plt

Next, we need to extract the data from the given context and store it in separate lists. We can use the split() method to split the data at each comma and store it in a list.

data = '''7128,10/29/2011,4,0,10,14,0,6,0,3,0.24,0.197,0.87,0.4478,29
7129,10/29/2011,4,0,10,15,0,6,0,3,0.22,0.2121,0.93,0.2537,41
7130,10/29/2011,4,0,10,16,0,6,0,3,0.22,0.197,0.93,0.3284,22
7131,10/29/2011,4,0,10,17,0,6,0,3,0.22,0.197,0.93,0.3284,31
7132,10/29/2011,4,0,10,18,0,6,0,3,0.22,0.197,0.93,0.3284,43
7133,10/29/2011,4,0,10,19,0,6,0,1,0.24,0.2121,0.87,0.3582,39
7134,10/29/2011,4,0,10,20,0,6,0,1,0.24,0.2121,0.87,0.3582,47
7135,10/29/2011,4,0,10,21,0,6,0,1,0.24,0.2121,0.87,0.3582,50
7136,10/29/2011,4,0,10,22,0,6,0,1,0.22,0.2121,0.87,0.2239,54
7137,10/29/2011,4,0,10,23,0,6,0,1,0.22,0.2273,0.87,0.194,36
7138,10/30/2011,4,0,10,0,0,0,0,1,0.22,0.2121,0.87,0.2239,54
7139,10/30/2011,4,0,10,1,0,0,0,1,0.22,0.2121,0.87,0.2537,43
7140,10/30/2011,4,0,10,2,0,0,0,1,0.22,0.2121,0.87,0.2836,50
7141,10/30/2011,4,0,10,3,0,0,0,1,0.24,0.2121,0.75,0.3582,33
7142,10/30/2011,4,0,10,4,0,0,0,1,0.22,0.197,0.8,0.3284,11
7143,10/30/2011,4,0,10,5,0,0,0,1,0.24,0.2121,0.75,0.2985,4
7144,10/30/2011,4,0,10,6,0,0,0,1,0.24,0.2273,0.75,0.2537,10
7145,10/30/2011,4,0,10,7,0,0,0,1,0.24,0.2879,0.75,0,22
7146,10/30/2011,4,0,10,8,0,0,0,1,0.26,0.2576,0.7,0.2239,80
7147,10/30/2011,4,0,10,9,0,0,0,1,0.3,0.2879,0.65,0.2537,147
7148,10/30/2011,4,0,10,10,0,0,0,1,0.32,0.3333,0.61,0.0896,178
7149,10/30/2011,4,0,10,11,0,0,0,1,0.36,0.3485,0.53,0.2239,240
198,1/9/2011,1,0,1,12,0,0,0,1,0.18,0.1364,0.37,0.4478,83
199,1/9/2011,1,0,1,13,0,0,0,1,0.2,0.1667,0.34,0.4478,75
200,1/9/2011,1,0,1,14,0,0,0,1,0.22,0.1818,0.32,0.4627,72
201,1/9/2011,1,0,1,15,0,0,0,1,0.22,0.197,0.35,0.3582,82
202,1/9/2011,1,0,1,16,0,0,0,1,0.2,0.1667,0.34,0.4478,92
203,1/9/2011,1,0,1,17,0,0,0,1,0.18,0.1515,0.37,0.3881,62
204,1/9/2011,1,0,1,18,0,0,0,1,0.16,0.1364,0.4,0.3284,48
205,1/9/2011,1,0,1,19,0,0,0,1,0.16,0.1364,0.43,0.3284,41
206,1/9/2011,1,0,1,20,0,0,0,1,0.14,0.1212,0.46,0.2537,38
207,1/9/2011,1,0,1,21,0,0,0,1,0.14,0.1061,0.46,0.4179,20
208,1/9/2011,1,0,1,22,0,0,0,1,0.14,0.1212,0.46,0.2985,15
209,1/9/2011,1,0,1,23,0,0,0,1,0.12,0.1364,0.5,0.194,6
210,1/10/2011,1,0,1,0,0,1,1,1,0.12,0.1212,0.5,0.2836,5
211,1/10/2011,1,0,1,1,0,1,1,1,0.12,0.1212,0.5,0.2836,1
212,1/10/2011,1,0,1,2,0,1,1,1,0.12,0.1212,0.5,0.2239,3
213,1/10/2011,1,0,1,3,0,1,1,1,0.12,0.1212,0.5,0.2239,1
214,1/10/2011,1,0,1,4,0,1,1,1,0.1,0.1212,0.54,0.1343,3
215,1/10/2011,1,0,1,5,0,1,1,1,0.1,0.1061,0.54,0.2537,3
216,1/10/2011,1,0,1,6,0,1,1,1,0.12,0.1212,0.5,0.2836,31
217,1/10/2011,1,0,1,7,0,1,1,1,0.12,0.1212,0.5,0.2239,77
218,1/10/2011,1,0,1,8,0,1,1,2,0.12,0.1212,0.5,0.2836,188
219,1/10/2011,1,0,1,9,0,1,1,2,0.14,0.1212,0.5,0.2537,94
748,2/3/2011,1,0,2,13,0,4,1,1,0.2,0.1667,0.4,0.4179,51
749,2/3/2011,1,0,2,14,0,4,1,1,0.22,0.197,0.37,0.3881,47
750,2/3/2011,1,0,2,15,0,4,1,1,0.22,0.197,0.37,0.3284,60
751,2/3/2011,1,0,2,16,0,4,1,1,0.22,0.2121,0.37,0.2537,78
752,2/3/2011,1,0,2,17,0,4,1,1,0.2,0.197,0.4,0.194,175
753,2/3/2011,1,0,2,18,0,4,1,1,0.2,0.2121,0.4,0.1642,147
754,2/3/2011,1,0,2,19,0,4,1,1,0.2,0.2576,0.4,0,96
755,2/3/2011,1,0,2,20,0,4,1,1,0.2,0.2273,0.47,0.0896,109
756,2/3/2011,1,0,2,21,0,4,1,1,0.18,0.2121,0.55,0.1045,54
757,2/3/2011,1,0,2,22,0,4,1,1,0.18,0.2121,0.51,0.0896,41
758,2/3/2011,1,0,2,23,0,4,1,1,0.2,0.2273,0.47,0.1045,38
759,2/4/2011,1,0,2,0,0,5,1,2,0.2,0.2576,0.44,0,13
760,2/4/2011,1,0,2,1,0,5,1,2,0.16,0.2273,0.59,0,7
761,2/4/2011,1,0,2,2,0,5,1,2,0.14,0.1667,0.63,0.1045,1
762,2/4/2011,1,0,2,3,0,5,1,2,0.14,0.1667,0.63,0.1045,1
763,2/4/2011,1,0,2,5,0,5,1,2,0.14,0.1515,0.63,0.1343,7
764,2/4/2011,1,0,2,6,0,5,1,2,0.16,0.2273,0.55,0,28
765,2/4/2011,1,0,2,7,0,5,1,1,0.14,0.2121,0.59,0,87
766,2/4/2011,1,0,2,8,0,5,1,1,0.14,0.1515,0.74,0.1343,220
767,2/4/2011,1,0,2,9,0,5,1,2,0.16,0.1818,0.8,0.1343,127
768,2/4/2011,1,0,2,10,0,5,1,2,0.2,0.2121,0.51,0.1343,51
769,2/4/2011,1,0,2,11,0,5,1,1,0.22,0.2273,0.51,0.1642,64'''

x = []
y = []

for line in data.split('\n'):
    line = line.split(',')
    x.append(line[0])
    y.append(float(line[-1]))

Finally, we can plot the figure using the plot() method of matplotlib.

plt.plot(x, y)
plt.show()

This will display the figure on the screen.

Note: The given data does not seem to have any clear pattern or trend, so the resulting plot may not be very informative.


Product_Manager (to chat_manager):

UPDATE CONTEXT


Code_Reviewer (to chat_manager):

I apologize, I made a mistake in my previous response. The given data is not suitable for plotting a figure as it does not have any clear pattern or trend. Can I assist you with anything else?


Boss_Assistant (to chat_manager):


Senior_Python_Engineer (to chat_manager):

Sure, I have another question. How can I read a CSV file in Python?


Product_Manager (to chat_manager):

To read a CSV file in Python, you can use the csv module. Here's an example:

import csv

with open('filename.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)

In this example, we first import the csv module. Then, we open the CSV file using the open() function and pass the file name and the mode ('r' for read mode). We use a with statement to ensure that the file is properly closed after we're done with it.

Next, we create a csv.reader object by passing the file object to the csv.reader() function. We can then iterate over the rows in the CSV file using a for loop and print each row.

Note that the csv.reader() function returns an iterator, so we can only iterate over the rows once. If we need to access the rows multiple times, we should read them into a list first:

import csv

with open('filename.csv', 'r') as file:
    reader = csv.reader(file)
    rows = list(reader)

# Now we can access the rows multiple times
for row in rows:
    print(row)

This reads all the rows into a list called rows, which we can access multiple times.


pranavvr-lumiq commented 9 months ago

The CSV files I am using are 11 massive files with over 5000 columns each. Do you think that might contribute to the errors I am facing? Unfortunately, I am not sure I am authorised to share the files themselves. Is there anything I should look out for?

thinkall commented 9 months ago

I would suggest that you feed the metadata rather than the original csv file to the RAG agent.

pranavvr-lumiq commented 9 months ago

I would suggest that you feed the metadata rather than the original csv file to the RAG agent.

This may seem like a stupid question, but how do I do that on Ubuntu?

thinkall commented 9 months ago

You may need to do it manually: copy the column names out and write them into a new text file.
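A minimal sketch of that manual step (dump_csv_metadata is a hypothetical helper, and the file names are examples from this thread):

import csv
import pathlib

def dump_csv_metadata(csv_paths, out_path="csv_metadata.txt"):
    """Write each CSV's file name and header row into one small text file."""
    with open(out_path, "w") as out:
        for p in csv_paths:
            with open(p, newline="") as f:
                header = next(csv.reader(f))  # first row = column names
            out.write(f"{pathlib.Path(p).name}: {', '.join(header)}\n")

dump_csv_metadata(["term_crux.csv", "term_fcrr.csv"])
# Then point docs_path at csv_metadata.txt instead of the raw CSV files.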

pranavvr-lumiq commented 9 months ago

term_appNo_dump.csv
_id, workItemID
9859 columns of data

term_crux.csv
POL_ID, AUREOUS_RISK_SCORE1, AUREOUS_RISK_BAND1, AUREOUS_RISK_SCORE2, AUREOUS_RISK_BAND2, AUREOUS_RISK_SCORE3, AUREOUS_RISK_BAND3
9825 columns of data

term_fcrr.csv
POL_ID, FCRR_RATING
9834 columns of data

term_iibquest.csv
POL_ID, IIB_QUEST_IS_NEGATIVE
7104 columns of data

term_iibscore.csv
POL_ID, IIB_SCORE
8658 columns of data

term_la_details.csv
CLI_ID, POL_ID, LA_EXST_CLI_IND, CLI_BTH_DT, AGE_PROOF_TYP_CD, CLI_SEX_CD, CLI_MARIT_STAT_CD, ID_PROOF_TYP_CD, CLI_EDUC_TYP_CD, OCCP_ID, CLI_PTL_ACTV_IND, CLI_CRIM_OFFNS_IND, CLI_HT, CLI_HT_INCH, CLI_HT_CMS, CLI_WGT, CLI_SMKR_CD, CLI_ADDR_TYP_CD, CLI_PSTL_CD, CLI_EARN_INCM_AMT, CLI_HZRD_AVOC_IND, CLI_SMK_CIG_IND, TBCO_CNSM_TYP_CD, CLI_LIQR_DRINK_IND, ALCHL_CNSM_TYP_CD, NARC_CNSM_IND, GYNCLG_PRBM_IND, CLI_FEMALE_HLTH_CD, CLI_ABSNT_WRK_IND, CLI_DISAB_BNFT_IND, CLI_PHYS_DISAB_CD, CLI_DISAB_IND, CLI_DIAGNS_TST_IND, CLI_CARDIO_SYS_IND, CLI_NERV_SYS_IND, TUMR_CANCER_IND, CLI_EENT_DISORD_CD, CLI_RESPTY_IND, CLI_DIGEST_SYS_IND, CLI_GLAND_DISORD_CD, URIN_REPRO_SYS_IND, MUSCL_SKEL_SYS_IND, OTHR_ILL_SURGY_IND, rel_la_prop
9859 columns of data

term_nominee_details.csv
POL_ID, BNFY1_REL_INSRD_CD, BNFY2_REL_INSRD_CD, BNFY3_REL_INSRD_CD
9858 columns of data

term_product_details.csv
POL_ID, POL_BILL_MODE_CD, PLAN_ID, POL_MPREM_AMT, CVG_FACE_AMT, POLICY_TERM, PPT, PREMIUM_FREQUENCY
9858 columns of data

term_proposer_details.csv
POL_ID, PROPOSER_RELATIONSHIP, PROPOSER_EARN_INCM_AMT
15291 columns of data

term_suc.csv
POL_ID, TRC_PROPOSAL
9834 columns of data

term_uwDecision.csv
POL_ID, UW_DECISION
7652 columns of data

thinkall commented 9 months ago

This looks good. You can feed these texts as context and let the agents write code to read the files and generate plots for you. I suggest starting with human_input_mode set to ALWAYS, so you can give feedback at each step.

One more thing: I guess you mean "rows of data" instead of "columns of data".
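A sketch of that suggestion (constructor arguments follow the referenced notebook; the retrieve_config values here are placeholders):

from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent

boss_aid = RetrieveUserProxyAgent(
    name="Boss_Assistant",
    human_input_mode="ALWAYS",  # ask for human feedback at each step
    retrieve_config={
        "task": "code",
        "docs_path": "csv_metadata.txt",  # the metadata file, not the raw CSVs
    },
)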

pranavvr-lumiq commented 8 months ago

This looks good. You can feed these texts as context and let the agents write code to read the files and generate plots for you. I suggest starting with human_input_mode set to ALWAYS, so you can give feedback at each step.

One more thing: I guess you mean "rows of data" instead of "columns of data".

I think I got it to work, at least with some smaller CSV files. For some reason, scrapping everything and copy-pasting from scratch from the example notebook seemed to do the trick.

Now, the only issue is that to meet my end goal, I need to feed in the CSV files I originally intended to. Unfortunately, I keep getting a timeout error, most likely because the files are so massive and I am feeding in 11 of them:

APITimeoutError: Request timed out.

Is there any way to work around this?

thinkall commented 8 months ago

This looks good. You can feed these texts as context and let the agents write code to read the files and generate plots for you. I suggest starting with human_input_mode set to ALWAYS, so you can give feedback at each step. One more thing: I guess you mean "rows of data" instead of "columns of data".

I think I got it to work, at least with some smaller CSV files. For some reason, scrapping everything and copy-pasting from scratch from the example notebook seemed to do the trick.

Now, the only issue is that to meet my end goal, I need to feed in the CSV files I originally intended to. Unfortunately, I keep getting a timeout error, most likely because the files are so massive and I am feeding in 11 of them:

APITimeoutError: Request timed out.

Is there any way to work around this?

Maybe you can try increasing the timeout threshold.
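A sketch of that, assuming an llm_config dict as in the notebook; note that the key is "timeout" in newer autogen releases, while older releases used "request_timeout":

llm_config = {
    "config_list": config_list,  # your existing Azure OpenAI config
    "timeout": 600,              # seconds; raise this for very large requests
}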

pranavvr-lumiq commented 8 months ago

doc_ids: [['doc_236', 'doc_286', 'doc_235', 'doc_244', 'doc_243', 'doc_281', 'doc_237', 'doc_234', 'doc_232', 'doc_285', 'doc_258', 'doc_256', 'doc_280', 'doc_268', 'doc_290', 'doc_282', 'doc_261', 'doc_284', 'doc_254', 'doc_1255']]
Adding doc_id doc_236 to context.

It only adds one doc_id to the context. How do I get it to add all doc_ids to the context from the start?

pranavvr-lumiq commented 8 months ago

Also, I am getting the following message despite having entered all of the API information correctly:

Model my_model_name not found. Using cl100k_base encoding.

For context, I am using a GPT-4 Azure OpenAI key.
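One likely explanation, offered as a guess: tiktoken only recognises standard OpenAI model names, so an Azure deployment name such as my_model_name falls back to the cl100k_base encoding. If the deployment serves gpt-4, passing that base model name for token counting should avoid the warning; a sketch:

retrieve_config = {
    "task": "code",
    "docs_path": "csv_metadata.txt",  # placeholder
    "model": "gpt-4",  # the base model behind the Azure deployment
}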

thinkall commented 8 months ago

What's the model/engine name in your Azure OpenAI deployment?

thinkall commented 5 months ago

Hi @pranavvr-lumiq, I see that you've struggled with the CSV issue. It looks like you've since resolved it, as you've opened #1639.