Closed mohammad-yousuf closed 8 months ago
Thanks for reporting, fixed in https://github.com/snexus/llm-search/pull/104. To be honest, I don't think you will get great results with CSVs; it is not the best format for RAG.
Hi @snexus. Thank you for the fix. I converted the CSV data to a docx table and used a custom parser. The data is being converted to JSON correctly, as I can see after the re-ranking step. Beyond that step, though, it doesn't work for closely related data points.
Any idea how I should approach this?
The most likely reason is that the LLM can't interpret the data correctly. That is a limitation of the LLM rather than of the RAG system as a whole: LLMs are not very good with tabular data. Maybe a specialised LLM exists for that.
Another (more complicated) approach is to store the data in a database, provide the schema and other metadata to the LLM, and let it generate SQL that produces the necessary aggregations, etc.
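To make the database idea a bit more concrete, here is a minimal sketch using Python's built-in `sqlite3`. Everything in it is an assumption for illustration: the `sales` table, its columns, and the sample rows are hypothetical, and the LLM call itself is left out, with a hand-written query standing in for whatever SQL the model would return.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for the real CSV file.
CSV_TEXT = """product,region,units
widget,north,10
widget,south,5
gadget,north,7
"""

conn = sqlite3.connect(":memory:")

# Explicit column types so that aggregations operate on integers.
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, units INTEGER)")
rows = [
    (r["product"], r["region"], int(r["units"]))
    for r in csv.DictReader(io.StringIO(CSV_TEXT))
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# Read the schema back from SQLite so it can be shown to the LLM.
schema = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
    ("sales",),
).fetchone()[0]

question = "How many units were sold per product?"
prompt = (
    "You are given the following SQLite schema:\n"
    f"{schema}\n\n"
    f"Write one SQL query that answers: {question}"
)

# Placeholder for the model's answer: in a real setup this string would
# come from the LLM and should be validated before it is executed.
generated_sql = (
    "SELECT product, SUM(units) AS total "
    "FROM sales GROUP BY product ORDER BY product"
)
result = conn.execute(generated_sql).fetchall()
print(result)
```

The point of this design is that the database does the exact arithmetic and filtering, so the LLM only has to translate the question into SQL instead of reading the table rows itself.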
Hi @snexus,
Is it possible to work with CSV/SQL data? You have mentioned that the unstructured supported formats include CSV as well. I am trying to parse a CSV file but am getting errors:
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/langchain_community/vectorstores/chroma.py", line 297, in add_texts
    self._collection.upsert(
  File "/usr/local/lib/python3.10/dist-packages/chromadb/api/models/Collection.py", line 477, in upsert
    ) = self._validate_embedding_set(
  File "/usr/local/lib/python3.10/dist-packages/chromadb/api/models/Collection.py", line 554, in _validate_embedding_set
    validate_metadatas(maybe_cast_one_to_many_metadata(metadatas))
  File "/usr/local/lib/python3.10/dist-packages/chromadb/api/types.py", line 310, in validate_metadatas
    validate_metadata(metadata)
  File "/usr/local/lib/python3.10/dist-packages/chromadb/api/types.py", line 278, in validate_metadata
    raise ValueError(
ValueError: Expected metadata value to be a str, int, float or bool, got None which is a <class 'NoneType'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/llmsearch", line 8, in <module>
    sys.exit(main_cli())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/llmsearch/cli.py", line 44, in generate_index
    create_embeddings(config, vs)
  File "/usr/local/lib/python3.10/dist-packages/llmsearch/embeddings.py", line 80, in create_embeddings
    vs.create_index_from_documents(all_docs=all_docs)
  File "/usr/local/lib/python3.10/dist-packages/llmsearch/chroma.py", line 66, in create_index_from_documents
    vectordb = Chroma.from_documents(
  File "/usr/local/lib/python3.10/dist-packages/langchain_community/vectorstores/chroma.py", line 778, in from_documents
    return cls.from_texts(
  File "/usr/local/lib/python3.10/dist-packages/langchain_community/vectorstores/chroma.py", line 736, in from_texts
    chroma_collection.add_texts(
  File "/usr/local/lib/python3.10/dist-packages/langchain_community/vectorstores/chroma.py", line 309, in add_texts
    raise ValueError(e.args[0] + "\n\n" + msg)
ValueError: Expected metadata value to be a str, int, float or bool, got None which is a <class 'NoneType'>

Try filtering complex metadata from the document using langchain_community.vectorstores.utils.filter_complex_metadata.
```
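The error says Chroma only accepts `str`, `int`, `float` or `bool` metadata values, and one of the parsed CSV chunks carries a `None`. The message itself points at `filter_complex_metadata` from LangChain; the snippet below is only a dependency-free sketch of the same idea, not necessarily what the fix in the PR does. The helper name `drop_unsupported_metadata` and the sample metadata keys are made up for illustration.

```python
from typing import Any, Dict, List

# Value types that the error message above lists as acceptable.
ALLOWED_TYPES = (str, int, float, bool)


def drop_unsupported_metadata(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the metadata without None or other unsupported values."""
    return {
        key: value
        for key, value in metadata.items()
        if isinstance(value, ALLOWED_TYPES)
    }


# Hypothetical metadata, similar in shape to what a CSV parser might emit.
metadatas: List[Dict[str, Any]] = [
    {"source": "data.csv", "page": None, "row": 0},
    {"source": "data.csv", "page": 1, "tags": ["a", "b"]},
]

cleaned = [drop_unsupported_metadata(m) for m in metadatas]
print(cleaned)
```

Applying a filter like this to each document's metadata before the documents reach the vector store should avoid the `ValueError`, at the cost of silently dropping the offending keys.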