langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License
92.53k stars 14.82k forks

Issue with SemanticSimilarityExampleSelector.from_examples #17846

Closed NikhilKosare closed 2 months ago

NikhilKosare commented 7 months ago

Example Code

SemanticSimilarityExampleSelector.from_examples is throwing an error:

ServiceRequestError: Invalid URL "/indexes('langchain-index')?api-version=2023-10-01-Preview": No scheme supplied. Perhaps you meant https:///indexes('langchain-index')?api-version=2023-10-01-Preview?

I am creating a vector store and passing it to SemanticSimilarityExampleSelector.from_examples as shown below:

vector_store = AzureSearch(
    azure_search_endpoint=vector_store_address,
    azure_search_key=vector_store_password,
    index_name=index_name,
    embedding_function=embeddings.embed_query,
)

example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples,
    embeddings,
    vector_store,
    k=5,
    input_keys=["input"],
)

Error Message and Stack Trace (if applicable)

ServiceRequestError                       Traceback (most recent call last)
Cell In[11], line 1
----> 1 example_selector = SemanticSimilarityExampleSelector.from_examples(
      2     examples,
      3     embeddings,
      4     vector_store,
      5     k=5,
      6     input_keys=["input"]
      7 )

File ~\anaconda3\Lib\site-packages\langchain_core\example_selectors\semantic_similarity.py:105, in SemanticSimilarityExampleSelector.from_examples(cls, examples, embeddings, vectorstore_cls, k, input_keys, example_keys, vectorstore_kwargs, **vectorstore_cls_kwargs)
    103 else:
    104     string_examples = [" ".join(sorted_values(eg)) for eg in examples]
--> 105 vectorstore = vectorstore_cls.from_texts(
    106     string_examples, embeddings, metadatas=examples, **vectorstore_cls_kwargs
    107 )
    108 return cls(
    109     vectorstore=vectorstore,
    110     k=k,
   (...)
    113     vectorstore_kwargs=vectorstore_kwargs,
    114 )

File ~\anaconda3\Lib\site-packages\langchain_community\vectorstores\azuresearch.py:632, in AzureSearch.from_texts(cls, texts, embedding, metadatas, azure_search_endpoint, azure_search_key, index_name, **kwargs)
    620 @classmethod
    621 def from_texts(
    622     cls: Type[AzureSearch],
   (...)
    630 ) -> AzureSearch:
    631     # Creating a new Azure Search instance
--> 632     azure_search = cls(
    633         azure_search_endpoint,
    634         azure_search_key,
    635         index_name,
    636         embedding,
    637     )
    638     azure_search.add_texts(texts, metadatas, **kwargs)
    639     return azure_search

File ~\anaconda3\Lib\site-packages\langchain_community\vectorstores\azuresearch.py:269, in AzureSearch.__init__(self, azure_search_endpoint, azure_search_key, index_name, embedding_function, search_type, semantic_configuration_name, fields, vector_search, semantic_configurations, scoring_profiles, default_scoring_profile, cors_options, **kwargs)
    267 if "user_agent" in kwargs and kwargs["user_agent"]:
    268     user_agent += " " + kwargs["user_agent"]
--> 269 self.client = _get_search_client(
    270     azure_search_endpoint,
    271     azure_search_key,
    272     index_name,
    273     semantic_configuration_name=semantic_configuration_name,
    274     fields=fields,
    275     vector_search=vector_search,
    276     semantic_configurations=semantic_configurations,
    277     scoring_profiles=scoring_profiles,
    278     default_scoring_profile=default_scoring_profile,
    279     default_fields=default_fields,
    280     user_agent=user_agent,
    281     cors_options=cors_options,
    282 )
    283 self.search_type = search_type
    284 self.semantic_configuration_name = semantic_configuration_name

File ~\anaconda3\Lib\site-packages\langchain_community\vectorstores\azuresearch.py:112, in _get_search_client(endpoint, key, index_name, semantic_configuration_name, fields, vector_search, semantic_configurations, scoring_profiles, default_scoring_profile, default_fields, user_agent, cors_options)
    108 index_client: SearchIndexClient = SearchIndexClient(
    109     endpoint=endpoint, credential=credential, user_agent=user_agent
    110 )
    111 try:
--> 112     index_client.get_index(name=index_name)
    113 except ResourceNotFoundError:
    114     # Fields configuration
    115     if fields is not None:
    116         # Check mandatory fields

File ~\anaconda3\Lib\site-packages\azure\core\tracing\decorator.py:78, in distributed_trace.<locals>.decorator.<locals>.wrapper_use_tracer(*args, **kwargs)
     76 span_impl_type = settings.tracing_implementation()
     77 if span_impl_type is None:
---> 78     return func(*args, **kwargs)
     80 # Merge span is parameter is set, but only if no explicit parent are passed
     81 if merge_span and not passed_in_parent:

File ~\anaconda3\Lib\site-packages\azure\search\documents\indexes_search_index_client.py:149, in SearchIndexClient.get_index(self, name, **kwargs)
    131 """
    132
    133 :param name: The name of the index to retrieve.
   (...)
    146     :caption: Get an index.
    147 """
    148 kwargs["headers"] = self._merge_client_headers(kwargs.get("headers"))
--> 149 result = self._client.indexes.get(name, **kwargs)
    150 return SearchIndex._from_generated(result)

File ~\anaconda3\Lib\site-packages\azure\core\tracing\decorator.py:78, in distributed_trace.<locals>.decorator.<locals>.wrapper_use_tracer(*args, **kwargs)
     76 span_impl_type = settings.tracing_implementation()
     77 if span_impl_type is None:
---> 78     return func(*args, **kwargs)
     80 # Merge span is parameter is set, but only if no explicit parent are passed
     81 if merge_span and not passed_in_parent:

File ~\anaconda3\Lib\site-packages\azure\search\documents\indexes_generated\operations_indexes_operations.py:857, in IndexesOperations.get(self, index_name, request_options, **kwargs)
    854 _request.url = self._client.format_url(_request.url, **path_format_arguments)
    856 _stream = False
--> 857 pipeline_response: PipelineResponse = self._client._pipeline.run(  # pylint: disable=protected-access
    858     _request, stream=_stream, **kwargs
    859 )
    861 response = pipeline_response.http_response
    863 if response.status_code not in [200]:

File ~\anaconda3\Lib\site-packages\azure\core\pipeline_base.py:230, in Pipeline.run(self, request, **kwargs)
    228 pipeline_request: PipelineRequest[HTTPRequestType] = PipelineRequest(request, context)
    229 first_node = self._impl_policies[0] if self._impl_policies else _TransportRunner(self._transport)
--> 230 return first_node.send(pipeline_request)

File ~\anaconda3\Lib\site-packages\azure\core\pipeline_base.py:86, in _SansIOHTTPPolicyRunner.send(self, request)
     84 _await_result(self._policy.on_request, request)
     85 try:
---> 86     response = self.next.send(request)
     87 except Exception:  # pylint: disable=broad-except
     88     _await_result(self._policy.on_exception, request)

File ~\anaconda3\Lib\site-packages\azure\core\pipeline_base.py:86, in _SansIOHTTPPolicyRunner.send(self, request)
     84 _await_result(self._policy.on_request, request)
     85 try:
---> 86     response = self.next.send(request)
     87 except Exception:  # pylint: disable=broad-except
     88     _await_result(self._policy.on_exception, request)

[... skipping similar frames: _SansIOHTTPPolicyRunner.send at line 86 (2 times)]

File ~\anaconda3\Lib\site-packages\azure\core\pipeline_base.py:86, in _SansIOHTTPPolicyRunner.send(self, request)
     84 _await_result(self._policy.on_request, request)
     85 try:
---> 86     response = self.next.send(request)
     87 except Exception:  # pylint: disable=broad-except
     88     _await_result(self._policy.on_exception, request)

File ~\anaconda3\Lib\site-packages\azure\core\pipeline\policies_redirect.py:197, in RedirectPolicy.send(self, request)
    195 original_domain = get_domain(request.http_request.url) if redirect_settings["allow"] else None
    196 while retryable:
--> 197     response = self.next.send(request)
    198     redirect_location = self.get_redirect_location(response)
    199     if redirect_location and redirect_settings["allow"]:

File ~\anaconda3\Lib\site-packages\azure\core\pipeline\policies_retry.py:553, in RetryPolicy.send(self, request)
    551         is_response_error = True
    552         continue
--> 553     raise err
    554 finally:
    555     end_time = time.time()

File ~\anaconda3\Lib\site-packages\azure\core\pipeline\policies_retry.py:531, in RetryPolicy.send(self, request)
    529 try:
    530     self._configure_timeout(request, absolute_timeout, is_response_error)
--> 531     response = self.next.send(request)
    532     if self.is_retry(retry_settings, response):
    533         retry_active = self.increment(retry_settings, response=response)

File ~\anaconda3\Lib\site-packages\azure\core\pipeline_base.py:86, in _SansIOHTTPPolicyRunner.send(self, request)
     84 _await_result(self._policy.on_request, request)
     85 try:
---> 86     response = self.next.send(request)
     87 except Exception:  # pylint: disable=broad-except
     88     _await_result(self._policy.on_exception, request)

File ~\anaconda3\Lib\site-packages\azure\core\pipeline_base.py:86, in _SansIOHTTPPolicyRunner.send(self, request)
     84 _await_result(self._policy.on_request, request)
     85 try:
---> 86     response = self.next.send(request)
     87 except Exception:  # pylint: disable=broad-except
     88     _await_result(self._policy.on_exception, request)

[... skipping similar frames: _SansIOHTTPPolicyRunner.send at line 86 (2 times)]

File ~\anaconda3\Lib\site-packages\azure\core\pipeline_base.py:86, in _SansIOHTTPPolicyRunner.send(self, request)
     84 _await_result(self._policy.on_request, request)
     85 try:
---> 86     response = self.next.send(request)
     87 except Exception:  # pylint: disable=broad-except
     88     _await_result(self._policy.on_exception, request)

File ~\anaconda3\Lib\site-packages\azure\core\pipeline_base.py:119, in _TransportRunner.send(self, request)
    109 """HTTP transport send method.
    110
    111 :param request: The PipelineRequest object.
   (...)
    114 :rtype: ~azure.core.pipeline.PipelineResponse
    115 """
    116 cleanup_kwargs_for_transport(request.context.options)
    117 return PipelineResponse(
    118     request.http_request,
--> 119     self._sender.send(request.http_request, **request.context.options),
    120     context=request.context,
    121 )

File ~\anaconda3\Lib\site-packages\azure\core\pipeline\transport_requests_basic.py:381, in RequestsTransport.send(self, request, **kwargs)
    378     error = ServiceRequestError(err, error=err)
    380 if error:
--> 381     raise error
    382 if _is_rest(request):
    383     from azure.core.rest._requests_basic import RestRequestsTransportResponse

ServiceRequestError: Invalid URL "/indexes('langchain-index')?api-version=2023-10-01-Preview": No scheme supplied. Perhaps you meant https:///indexes('langchain-index')?api-version=2023-10-01-Preview?
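
One possible reading of the traceback, offered as a hedged sketch rather than a confirmed diagnosis: from_examples calls vectorstore_cls.from_texts(...), and the AzureSearch.from_texts frame builds a fresh instance from its own azure_search_endpoint, azure_search_key and index_name arguments. If an already-configured instance is passed where a class is expected, the classmethod would still run, but the new object would not inherit the instance's settings. The toy class below illustrates that behaviour in plain Python. ToyStore is hypothetical, and its default values are assumptions chosen to mirror the error message, not copied from the library.

```python
class ToyStore:
    """Hypothetical stand-in for a vector store; not the real AzureSearch."""

    def __init__(self, endpoint: str, index_name: str):
        self.endpoint = endpoint
        self.index_name = index_name
        self.texts = []

    @classmethod
    def from_texts(cls, texts, endpoint: str = "", index_name: str = "langchain-index"):
        # A classmethod always receives the class, even when it is called
        # through an instance, so the instance's own settings are ignored.
        store = cls(endpoint, index_name)
        store.texts = list(texts)
        return store


configured = ToyStore("https://example.search.windows.net", "my-index")

# Calling the classmethod through the instance builds a new object
# from the default arguments (empty endpoint, default index name).
rebuilt = configured.from_texts(["hello"])

# Passing the settings explicitly keeps them.
explicit = ToyStore.from_texts(
    ["hello"], endpoint=configured.endpoint, index_name=configured.index_name
)
```

If this reading applies, the thing to try would be passing the AzureSearch class itself as vectorstore_cls and supplying the connection settings as extra keyword arguments, which the from_examples frame above forwards to from_texts. This has not been verified in this thread.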

Description

I am trying to use the example selector (SemanticSimilarityExampleSelector.from_examples) feature of langchain.

System Info

langchain 0.1.8

platform - windows

python version - 3.11.5

dosubot[bot] commented 7 months ago

🤖

Hey @NikhilKosare, nice to see you back! Running into a bit of a snag, I see? Let's dive into it.

Based on the error message you provided, it seems like the issue might be related to the azure_search_endpoint parameter you're passing to the AzureSearch class. The error message ServiceRequestError: Invalid URL "/indexes('langchain-index')?api-version=2023-10-01-Preview": No scheme supplied suggests that the URL you're providing might be incomplete or incorrectly formatted.

In the AzureSearch class, the azure_search_endpoint parameter is expected to be a string representing the URL of the Azure Search service endpoint. This URL should include the correct scheme (e.g., "http://" or "https://"). Here's the relevant code from the AzureSearch class where the azure_search_endpoint is used:

index_client: SearchIndexClient = SearchIndexClient(
    endpoint=endpoint, credential=credential, user_agent=user_agent
)

In this code, endpoint is the URL of the Azure Search service. If the URL you're providing doesn't include the correct scheme or is otherwise incorrectly formatted, it could lead to the error you're seeing.

Please ensure that the azure_search_endpoint parameter you're passing to the AzureSearch class includes the correct scheme and is otherwise correctly formatted. If the issue persists after checking this, please provide more information about the azure_search_endpoint parameter you're using.
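
As a minimal sketch of the check described above, the endpoint string can be parsed with the standard library before it is handed to any client. The helper name has_http_scheme is hypothetical, and the sample URLs are taken from this thread or made up for illustration.

```python
from urllib.parse import urlparse


def has_http_scheme(endpoint: str) -> bool:
    """Return True if endpoint parses with an http(s) scheme and a host."""
    parsed = urlparse(endpoint.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


full_url_ok = has_http_scheme("https://ss-xxxxx.search.windows.net")
bare_host_ok = has_http_scheme("ss-xxxxx.search.windows.net")
empty_ok = has_http_scheme("")
```

An empty string fails this check in the same way a bare host name does, which matches the shape of the URL in the error message (a path with nothing in front of it).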

Additionally, there have been similar issues reported in the LangChain repository, such as this one. The suggested solution in these cases was to upgrade the version of LangChain being used. If you're not already using the latest version of LangChain, you might consider upgrading to see if that resolves the issue.

I hope this helps! If you have any further questions or need more assistance, feel free to ask.

#### About Dosu

This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.

NikhilKosare commented 7 months ago

@dosubot - the Azure Search service endpoint URL looks correct. I am using the URL "https://ss-xxxxx.search.windows.net". I am also using the latest version of langchain:

langchain - 0.1.8

dosubot[bot] commented 7 months ago

🤖

Based on the error message you're encountering, it seems like the issue might be related to the azure_search_endpoint parameter not being correctly recognized as a valid URL. This could be due to the AZURE_SEARCH_ENDPOINT environment variable not being correctly set or not being a valid URL.

In the LangChain codebase, some Azure integrations infer their endpoint from an environment variable when it is not passed explicitly. The snippet that follows comes from the Azure OpenAI wrapper, where azure_endpoint falls back to AZURE_OPENAI_ENDPOINT; it illustrates that pattern and is not AzureSearch code.

Here's a snippet of how it's used:

values["azure_endpoint"] = values["azure_endpoint"] or os.getenv("AZURE_OPENAI_ENDPOINT")

And later, this value is used to create an instance of the openai.AzureOpenAI and openai.AsyncAzureOpenAI classes:

client_params = {
    ...
    "azure_endpoint": values["azure_endpoint"],
    ...
}
values["client"] = openai.AzureOpenAI(**client_params).completions
values["async_client"] = openai.AsyncAzureOpenAI(**client_params).completions

To resolve the issue, please ensure that the AZURE_SEARCH_ENDPOINT environment variable is set to a valid URL. It should look something like this: https://mysearchservice.search.windows.net.

If you've already checked this and the issue persists, it might be helpful to print out the value of azure_search_endpoint right before it's used to create the AzureSearch instance to ensure it's being correctly set.
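
A hedged sketch of that debugging step: read the variable, print its repr so that stray quotes or whitespace become visible, and fail early if it is missing or empty. The function name read_endpoint and the clean-up rules are assumptions for illustration, not part of LangChain.

```python
import os


def read_endpoint(var_name: str = "AZURE_SEARCH_ENDPOINT") -> str:
    """Read an endpoint from the environment and fail loudly if it is unusable."""
    raw = os.getenv(var_name)
    # repr() makes stray quotes, spaces or newlines visible in the output.
    print(f"{var_name} = {raw!r}")
    if raw is None:
        raise ValueError(f"{var_name} is not set")
    # Drop surrounding whitespace and one layer of accidental quoting.
    cleaned = raw.strip().strip('"').strip("'")
    if not cleaned:
        raise ValueError(f"{var_name} is empty")
    return cleaned
```

The returned value can then be passed as azure_search_endpoint, so the string that was printed is the same string the client receives.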

If you're still encountering issues after trying these steps, please provide more information about your setup and I'll be happy to help further.

NikhilKosare commented 7 months ago

@dosubot - I am still getting the same error. The value of os.getenv('AZURE_SEARCH_ENDPOINT') looks correct:

vector_store = AzureSearch(
    azure_search_endpoint=os.getenv('AZURE_SEARCH_ENDPOINT'),
    azure_search_key=os.getenv('AZURE_SEARCH_PASSWORD'),
    index_name=index_name,
    embedding_function=embeddings.embed_query,
)

Library versions I am using:

langchain 0.1.8
langchain-community 0.0.21
langchain-core 0.1.24
langchain-experimental 0.0.52
langchain-openai 0.0.6
langchainhub 0.1.14
azure-search-documents 11.4.0

dosubot[bot] commented 7 months ago

🤖

Based on the error message you're encountering, it seems like the URL you're supplying to the AzureSearch object doesn't have a scheme (like http:// or https://). The URL for the AzureSearch object is fetched from an environment variable using os.getenv("AZURE_SEARCH_ENDPOINT", ""). If this environment variable is not set correctly, or if it doesn't contain a valid URL with a scheme, you'll encounter this error.

To resolve this issue, ensure that the AZURE_SEARCH_ENDPOINT environment variable is set to a valid URL that includes a scheme. For example, it should look something like this: https://my-azure-search-service.search.windows.net.

If you're setting the environment variable in your operating system, make sure to restart your Python environment (or your entire system) to ensure the changes take effect.

If you're setting the environment variable in a .env file, make sure the file is in the correct location and the URL is correctly formatted. The line in the .env file should look something like this:

AZURE_SEARCH_ENDPOINT=https://my-azure-search-service.search.windows.net

After ensuring the environment variable is correctly set, your code for creating the AzureSearch object should work as expected:

vector_store = AzureSearch(
    azure_search_endpoint=os.getenv('AZURE_SEARCH_ENDPOINT'),
    azure_search_key=os.getenv('AZURE_SEARCH_PASSWORD'),
    index_name=index_name,
    embedding_function=embeddings.embed_query,
)

In addition to this, I found two similar issues in the LangChain repository that might be relevant to your problem:

  1. regressions with AzureSearch vectorstore update to v11.4.0
  2. Error running vector search in Azure Cognitive Search - The 'value' property of the vector query can't be null or an empty array.

The first issue suggests adding back the import statement for VectorSearch in the azuresearch.py file and updating the code to use vector_search_profile_name="myHnswProfile" instead of vector_search_configuration to specify the algorithm to use when searching the vector field.

The second issue suggests upgrading the version of LangChain being used.

Please try these solutions and let me know if they help resolve your issue.

khursheed33 commented 6 months ago

I am getting the same error, even without using Azure

My code:

def select_example(self, question: str = None):
    question = "What is the total Assessed Contributions for Polio segment"

    # FewShotExamples and VectorStoreManager are my own helper classes (not shown).
    semantic_examples = FewShotExamples.generate_query_few_shots

    prompt_example = PromptTemplate(
        input_variables=["question", "category", "query_number", "sql_query", "explanation"],
        template="""
        question: {question}
        category: {category}
        query_number: {query_number}
        sql_query: {sql_query}
        explanation: {explanation}
        """,
    )

    vector_store = VectorStoreManager()

    example_selector = SemanticSimilarityExampleSelector.from_examples(
        examples=semantic_examples,
        embeddings=vector_store.embedding,
        vectorstore_cls=Chroma,
        k=2,
    )

Error:

File "d:\project.venv\lib\site-packages\langchain_core\example_selectors\semantic_similarity.py", line 104, in from_examples
    string_examples = [" ".join(sorted_values(eg)) for eg in examples]
File "d:\project.venv\lib\site-packages\langchain_core\example_selectors\semantic_similarity.py", line 104, in <listcomp>
    string_examples = [" ".join(sorted_values(eg)) for eg in examples]
TypeError: sequence item 2: expected str instance, int found
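
The failing line joins the values of each example with " ".join, and str.join only accepts strings. A small sketch, assuming the examples are plain dicts, that reproduces the error and lists which fields are not strings. The helper non_string_fields and the sample data are hypothetical.

```python
def non_string_fields(examples):
    """Return (example index, key, type name) for every value that is not a str."""
    problems = []
    for i, example in enumerate(examples):
        for key, value in example.items():
            if not isinstance(value, str):
                problems.append((i, key, type(value).__name__))
    return problems


examples = [
    {"category": "Funding Related", "query_number": 1, "question": "sample question"},
]

# str.join only accepts strings, so an int among the values raises TypeError.
try:
    " ".join(examples[0].values())
except TypeError as exc:
    join_error = str(exc)

problems = non_string_fields(examples)
```

Running a check like this before calling from_examples points at the offending key directly, instead of at a position inside the joined sequence.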

Dependencies:

openai==1.12.0 langchain-community==0.0.22 langchain==0.1.9 langchain_experimental==0.0.52 sentence_transformers==2.3.1 chromadb==0.4.23 langchain-openai==0.0.7

OS: windows

khursheed33 commented 6 months ago

The issue was actually with the few_shot_examples. Every value must be a string, but some of my example objects contained numbers.

Before (with a numeric value):

[
    {
        "category": "Funding Related",
        "query_number": 1,
        "question": "Provide a breakdown of Flexible Fund for the quarters Q1 and Q2 for Polio.",
    }
]

After (the int value converted to a string):

[
    {
        "category": "Funding Related",
        "query_number": "1",
        "question": "Provide a breakdown of Flexible Fund for the quarters Q1 and Q2 for Polio.",
    }
]

Now it's working fine for me.
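
For larger example sets, the same conversion can be applied programmatically instead of by hand. This is a minimal sketch, assuming every value has a sensible str() form; the helper name stringify_examples is hypothetical.

```python
def stringify_examples(examples):
    """Return copies of the examples with every value converted to str."""
    return [
        {key: str(value) for key, value in example.items()}
        for example in examples
    ]


few_shot_examples = [
    {
        "category": "Funding Related",
        "query_number": 1,
        "question": "Provide a breakdown of Flexible Fund for the quarters Q1 and Q2 for Polio.",
    }
]

clean_examples = stringify_examples(few_shot_examples)

# Joining the values now succeeds because they are all strings.
joined = " ".join(clean_examples[0].values())
```

Note that str(None) yields the text "None", so missing values may be better handled explicitly before conversion.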