langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

SelfQueryRetriever with an OpenSearch vector store doesn't work. #20562

Closed Aekansh-Ak closed 3 months ago

Aekansh-Ak commented 6 months ago


Example Code

```python
from langchain_community.vectorstores import OpenSearchVectorSearch
from langchain_community.document_loaders import TextLoader
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.documents import Document
from langchain.chains.query_constructor.base import AttributeInfo
import torch

embeddings = HuggingFaceEmbeddings()
docs = [
    Document(
        page_content="A bunch of scientists bring back dinosaurs and mayhem breaks loose",
        metadata={"year": 1993, "rating": 7.7, "genre": "science fiction"},
    ),
    Document(
        page_content="Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
        metadata={"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    ),
    Document(
        page_content="A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
        metadata={"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    ),
    Document(
        page_content="A bunch of normal-sized women are supremely wholesome and some men pine after them",
        metadata={"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    ),
    Document(
        page_content="Toys come alive and have a blast doing so",
        metadata={"year": 1995, "genre": "animated"},
    ),
    Document(
        page_content="Three men walk into the Zone, three men walk out of the Zone",
        metadata={
            "year": 1979,
            "rating": 9.9,
            "director": "Andrei Tarkovsky",
            "genre": "science fiction",
        },
    ),
]

vectorstore = OpenSearchVectorSearch.from_documents(
    docs,
    embeddings,
    index_name="opensearch-self-query-demo",
    opensearch_url="https://admin:admin@localhost:9200",
    use_ssl=False,
    verify_certs=False,
)

model_id = "lmsys/vicuna-13b-v1.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=200,
    device_map="auto",
    torch_dtype=torch.float16,
)
llm = HuggingFacePipeline(pipeline=pipe)

metadata_field_info = [
    AttributeInfo(
        name="genre",
        description="The genre of the movie",
        type="string or list[string]",
    ),
    AttributeInfo(
        name="year",
        description="The year the movie was released",
        type="integer",
    ),
    AttributeInfo(
        name="director",
        description="The name of the movie director",
        type="string",
    ),
    AttributeInfo(
        name="rating", description="A 1-10 rating for the movie", type="float"
    ),
]
document_content_description = "Brief summary of a movie"

retriever = SelfQueryRetriever.from_llm(
    llm, vectorstore, document_content_description, metadata_field_info, verbose=True
)

pol = retriever.get_relevant_documents("What are some movies about dinosaurs")
print(pol)
```

Error Message and Stack Trace (if applicable)

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/langchain_core/output_parsers/json.py", line 175, in parse_and_check_json_markdown
    json_obj = parse_json_markdown(text)
  File "/usr/local/lib/python3.10/site-packages/langchain_core/output_parsers/json.py", line 157, in parse_json_markdown
    parsed = parser(json_str)
  File "/usr/local/lib/python3.10/site-packages/langchain_core/output_parsers/json.py", line 125, in parse_partial_json
    return json.loads(s, strict=strict)
  File "/usr/local/lib/python3.10/json/__init__.py", line 359, in loads
    return cls(**kw).decode(s)
  File "/usr/local/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 14 (char 15)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/langchain/chains/query_constructor/base.py", line 50, in parse
    parsed = parse_and_check_json_markdown(text, expected_keys)
  File "/usr/local/lib/python3.10/site-packages/langchain_core/output_parsers/json.py", line 177, in parse_and_check_json_markdown
    raise OutputParserException(f"Got invalid JSON object. Error: {e}")
langchain_core.exceptions.OutputParserException: Got invalid JSON object. Error: Expecting value: line 2 column 14 (char 15)
```
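The reported position (line 2, column 14, char 15) gives a hint about where decoding stopped. As a minimal sketch using only the standard library, the snippet below shows one input that yields the same message. The `sample` string and the `describe_json_error` helper are invented for illustration; the model's actual output is not included in the report, so this is only one possible cause, not a diagnosis.

```python
import json

# Hypothetical model output (an assumption, not taken from the report):
# the value after "query" is unquoted, so it is not a valid JSON value.
sample = '{\n    "query": dinosaurs,\n    "filter": "NO_FILTER"\n}'


def describe_json_error(text: str, width: int = 20) -> str:
    """Return the decoder message plus the text starting at the failing offset."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        snippet = text[err.pos:err.pos + width]
        return (
            f"{err.msg}: line {err.lineno} column {err.colno} "
            f"(char {err.pos}) near {snippet!r}"
        )
    return "valid JSON"


report = describe_json_error(sample)
print(report)
```

Running the same helper on the text the parser actually received would show which characters sit at the failing offset.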

Description

I am following the documentation below, but the script fails with the `OutputParserException` shown above:

https://python.langchain.com/docs/integrations/retrievers/self_query/opensearch_self_query/

System Info

System Information

OS: Linux
OS Version: #1 SMP Wed Jan 10 22:58:54 UTC 2024
Python Version: 3.10.7 (main, Feb 29 2024, 10:06:00) [GCC 8.5.0 20210514 (Red Hat 8.5.0-20)]

Package Information

langchain_core: 0.1.33
langchain: 0.1.13
langchain_community: 0.0.29
langsmith: 0.1.31
langchain_text_splitters: 0.0.1

Packages not installed (Not Necessarily a Problem)

The following packages were not found:

langgraph
langserve

spike-spiegel-21 commented 6 months ago

Check whether the LLM itself is working correctly.

```python
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=200, device_map="auto",torch_dtype=torch.float16)
llm = HuggingFacePipeline(pipeline=pipe)
```

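One way to run that check without the retriever in the loop is to inspect a raw completion directly. The sketch below is a standard-library approximation and not LangChain's parser: `check_completion` is a hypothetical helper, and the expected keys `query` and `filter` are an assumption based on the self-query prompt format.

```python
import json
import re

# Built from single backticks so this example nests safely inside markdown.
FENCE = "`" * 3


def check_completion(text: str, expected_keys=("query", "filter")) -> list[str]:
    """Return a list of problems found in a raw LLM completion (empty if none)."""
    # Use the contents of the first fenced block if present, else the whole text.
    match = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    payload = match.group(1) if match else text
    try:
        obj = json.loads(payload)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err}"]
    if not isinstance(obj, dict):
        return ["top-level value is not a JSON object"]
    return [f"missing key: {key}" for key in expected_keys if key not in obj]


# Invented completions for illustration only.
good = FENCE + 'json\n{"query": "dinosaur", "filter": "NO_FILTER"}\n' + FENCE
unquoted = '{"query": dinosaur}'
incomplete = '{"query": "dinosaur"}'

print(check_completion(good))
print(check_completion(unquoted))
print(check_completion(incomplete))
```

With the setup from the report, the input would be whatever string the model returns for the query-constructor prompt, for example `check_completion(llm.invoke(prompt))`, where `prompt` stands for the text you send. A non-empty result would suggest the model is not following the JSON format the parser expects, even if it produces sensible free text elsewhere.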
Aekansh-Ak commented 6 months ago

It does. The same technique is used in other code.