Closed meet1919 closed 1 year ago
π€
Thank you for raising this issue. I understand that you're looking for a way to perform case-insensitive searches using the $contains
operator in the get()
method.
Currently, the LangChain framework does not support case-insensitive searches out of the box. However, I'm considering your suggestion for a $regex
operator or similar functionality to handle this.
To better understand your needs and to propose a suitable solution, could you please provide more details about your use case? Specifically, it would be helpful to know:
$contains
queries case-insensitive, or do you need to switch between case-sensitive and case-insensitive searches?Your feedback will be valuable in enhancing the functionality of LangChain. Looking forward to your response.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a π if this is helpful and π if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
@dosu-bot About the feedback you asked: To better understand your needs and to propose a suitable solution, could you please provide more details about your use case? Specifically, it would be helpful to know:
I want switch between $regex and $contains if there is a trade-off in performance when using $regex. If there is no compromise in the search speed $contains can search for all the case-insensitive queries by changing global settings.
Types of Queries can be, "OpenCV", "NumPy", "FedEx", "iPhone", "WordPad", "JavaScript", "LinkedIn", "WordPress", "YouTube", etc.
π€
Hi there,
Thanks for reaching out with your question. To better assist you, I need a bit more information about your current setup and usage of the LangChain framework.
Could you please provide more details about the data structure you're working with? Specifically, what are the fields in your documents and what types of data do they contain?
Are you currently using any specific indexing or search optimization techniques in your application?
Could you provide an example of how you're currently using the $contains
operator in your get()
method calls?
Are there any constraints or requirements that we should be aware of when considering modifications to the get()
method or the framework?
The more context you can provide, the better I can assist you in finding a solution that fits your needs.
Best, Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a π if this is helpful and π if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
@dosu-bot for the questions you asked as below:
Thanks for reaching out with your question. To better assist you, I need a bit more information about your current setup and usage of the LangChain framework.
1. Could you please provide more details about the data structure you're working with? Specifically, what are the fields in your documents and what types of data do they contain?
2. Are you currently using any specific indexing or search optimization techniques in your application?
3. Could you provide an example of how you're currently using the $contains operator in your get() method calls?
4. Are there any constraints or requirements that we should be aware of when considering modifications to the get() method or the framework?
The more context you can provide, the better I can assist you in finding a solution that fits your needs.
Answers:
get()
method as follow: vector_store.get(where_document={"$contains": "langchain"}))
π€
Hello,
To perform case-insensitive searches using the $contains
operator or implement the $regex
operator, you would need to modify the RedisText
class in the filters.py
file.
Currently, the RedisText
class supports the ==
, !=
, and %
(LIKE) operators. To add support for case-insensitive searches and regex, you would need to add new methods to this class.
However, it's important to note that Redis itself does not natively support case-insensitive searches or regex in its query language. Therefore, to implement these features, you would need to either preprocess your data to store it in a case-insensitive manner or use a Redis module like RediSearch that extends the capabilities of Redis.
Here's an example of how you might add a case-insensitive search feature by preprocessing your data:
class RedisText(RedisFilterField):
# ...
@check_operator_misuse
def __eq__(self, other: str, case_insensitive: bool = False) -> "RedisFilterExpression":
"""Create a RedisText equality filter expression
Args:
other (str): The text value to filter on.
case_insensitive (bool): Whether to perform a case-insensitive search.
Example:
>>> from langchain.vectorstores.redis import RedisText
>>> filter = RedisText("job") == "engineer", case_insensitive=True
"""
if case_insensitive:
self._field = self._field.lower()
other = other.lower()
self._set_value(other, str, RedisFilterOperator.EQ)
return RedisFilterExpression(str(self))
In this example, if case_insensitive
is True
, the field name and the search value are both converted to lowercase before the search is performed. This means that you would need to store your data in lowercase to make this work.
As for regex, you would need to use a Redis module like RediSearch, which supports regex as part of its query language. If you decide to use RediSearch, you would need to modify your code to use the RediSearch query language instead of the standard Redis query language.
Please note that these changes would require modifications to the LangChain framework and should be thoroughly tested before being used in a production environment.
I hope this helps! Let me know if you have any other questions.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a π if this is helpful and π if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
@dosu-bot I am using Chromadb, Redis is irrelevant for me.
π€
Hi there,
Thanks for reaching out. I understand you're looking to implement case-insensitive searches in your application using the LangChain framework with ChromaDB.
To better assist you, I need a bit more information about your current setup. Could you please provide the following details:
This information will help me understand your situation better and provide a more accurate solution.
Looking forward to your response.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a π if this is helpful and π if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
@dosu-bot
Name: langchain Version: 0.0.305
Name: chromadb Version: 0.4.6
No other library in conjunction with Langchain and Chromadb
There is no such specific document size, total number of documents and frequency of queries. You can assume the maximum values possible.
π€
Hi there,
Thanks for reaching out and providing a detailed explanation of your issue. I understand that you're trying to implement case-insensitive searches in your application using the LangChain framework and ChromaDB.
To help you better, I need a bit more information about your current setup. Specifically, I'd like to know:
Your answers to these questions will help me understand your situation better and provide a more accurate solution. Looking forward to your response.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a π if this is helpful and π if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
@dosu-bot you know what, what I want to do is very simple, but you keep on asking many questions. I am done with it.
π€
Hi there,
Thanks for reaching out. To help you better, I need a bit more information about your setup.
Could you please provide more details about the structure of your documents? Specifically, I'm interested in knowing the fields that you're trying to search within.
Are you open to using other methods or operators for querying, or do you specifically want to modify the $contains
operator or use the $regex
operator?
Could you also provide some examples of the queries you're trying to perform? This will help me understand the complexity and requirements of your search functionality.
Lastly, could you please confirm if you're using the latest version of LangChain (v0.0.311)? If not, are you open to updating to the latest version?
Looking forward to your response.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a π if this is helpful and π if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
Hey @meet1919 Sorry Dosu couldn't help you more. If Dosu get "stuck" asking clarifying questions, it means they aren't able to find anything with the repository to concretely answer your question.
We're working on improving this! As for your question, WhereDocument
type comes from ChromaDB, so I'd go their Discord or Github to find an answer.
Let me know if you have any other ideas on ways we can make Dosu more useful!
Great, okay I see about the WhereDocument
type.
Are there any other db that supports more filtering options than ChromaDB?
About Dosu-bot, does it search with internet and bringing the information into its context while generating answer? Along with information from the source code in the context too. Follow-up questions are asked properly but is there a way to converge the planning process rather than keeping on asking questions?
π€
Hello,
Thank you for reaching out and providing a detailed explanation of your issue.
Based on the context provided, the LangChain framework (version 0.0.305) does not support the $regex
operator for case-insensitive searches in the get()
method of the Collection
class. The $regex
operator does not exist in the LangChain framework (version 0.0.305). The framework uses Python's built-in re
module for regex matching, but this is not exposed as a $regex
operator for use in the get()
method.
If you want to implement case-insensitive searches, you might need to implement a custom solution. One possible approach could be to convert both the search term and the document content to lowercase before performing the $contains
operation. This would effectively make the search case-insensitive. However, this might not be the most efficient solution, especially for large documents.
As for your question about other databases that support more filtering options than ChromaDB, I would need more information about your specific requirements to provide a suitable recommendation. There are many databases available that offer advanced filtering options, but the best choice would depend on your specific use case and requirements.
Regarding your question about Dosu-bot, it does not search the internet for information. It generates responses based on the context provided by the user and the information available in the source code. It does not have the ability to bring in external information or context. The follow-up questions are part of the bot's design to ensure it understands the user's request accurately and provides the most relevant response.
I hope this answers your questions. If you need further clarification or have more questions, feel free to ask.
Best regards, Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a π if this is helpful and π if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
@meet1919 I'm not familiar with all VectorDBs, but I use Supabase/Postgres with pgvector, which you can definitely do insensitive queries using SQL.
About Dosu-bot, does it search with internet and bringing the information into its context while generating answer? Along with information from the source code in the context too. Follow-up questions are asked properly but is there a way to converge the planning process rather than keeping on asking questions?
Not yet. Internet information can be noisy and not always relevant to the issue, which is why we haven't incorporated it yet. For converging the planning process, yes this is actively being worked on so stay tuned.
Issue you'd like to raise.
To get documents from collections (vector db) there is a method:
To query documents by searching for a particular term in the document
where_document = {"$contains": "langchain"}
can be passed to theget()
method. This value for the key/operator$contains
is case sensitive. How to search for keywords irrespective of case?Suggestion:
I want to extract entities from a sentence and pass it get documents from the chromadb that contains that words. But this entities need to be case sensitive. If the LLM outputs a different case like for example, the document contains the keyword: "Langchain", but asking llm to extract the entity from the sentence it isnt always sure that it will generate "Langchain", it can output: "langchain". This can be handled if the first letter of the entities generated is capitalized. But this may not work for keywords like OpenCV.