ohbeep opened this issue 5 months ago
Hey @ohbeep! I'm Dosu, a friendly bot here to assist while we wait for a human maintainer. I can help you solve bugs, answer questions, and guide you through contributing. Let's make LangChain even better together!
Thank you for your suggestion, ohbeep. Adding support for the Dynamic Schema feature of the Milvus vector database to the LangChain framework could indeed provide more flexibility and efficiency in handling different types of data.
Your proposed changes to the `__init__` and `from_texts` methods of the Milvus class seem to be a good starting point. However, to fully implement this feature, we would also need to change how these methods determine the schema and create the collection.
Currently, the schema for the Milvus collection is determined from the first item being inserted (the first document's metadata). If the `dynamic_schema` flag is set to True, we would need to determine the schema from all the data being inserted, not just the first item. This could be done in the `from_texts` method, where the data is inserted into the collection.
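A minimal pure-Python sketch of the difference between the two approaches. `fields_from_first`, `fields_from_all`, and `uncovered_fields` are hypothetical helpers (not LangChain or pymilvus APIs) that only compare metadata key sets; they do not talk to Milvus.

```python
from typing import Any, Dict, List, Set


def fields_from_first(metadatas: List[Dict[str, Any]]) -> Set[str]:
    # Mimics deriving the schema from the first metadata dict only.
    return set(metadatas[0]) if metadatas else set()


def fields_from_all(metadatas: List[Dict[str, Any]]) -> Set[str]:
    # Union of keys across every metadata dict, as a dynamic_schema path would need.
    return set().union(*metadatas)


def uncovered_fields(metadatas: List[Dict[str, Any]]) -> Set[str]:
    # Keys that a first-item-only schema would fail to declare.
    return fields_from_all(metadatas) - fields_from_first(metadatas)


metadatas = [{"source": "a.txt"}, {"source": "b.pdf", "page": 3}]
print(sorted(uncovered_fields(metadatas)))  # ['page']
```

Any key listed by `uncovered_fields` is one that a schema built from the first item alone would not know about.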
Here is a rough example of how you could modify the `from_texts` method to support dynamic schemas:
```python
# Excerpt: classmethod of the Milvus class (langchain_community/vectorstores/milvus.py)
@classmethod
def from_texts(
    cls,
    texts: List[str],
    embedding: Embeddings,
    metadatas: Optional[List[dict]] = None,
    collection_name: str = "LangChainCollection",
    connection_args: dict[str, Any] = DEFAULT_MILVUS_CONNECTION,
    consistency_level: str = "Session",
    index_params: Optional[dict] = None,
    search_params: Optional[dict] = None,
    drop_old: bool = False,
    dynamic_schema: bool = False,  # new flag for dynamic schema
    **kwargs: Any,
) -> Milvus:
    """Create a Milvus collection, index it with HNSW, and insert data.

    If dynamic_schema is True, the schema of the collection will be determined
    based on the data being inserted.
    """
    # existing code...
    if dynamic_schema:
        # Determine the schema from the data, then create the collection.
        # determine_schema is a hypothetical helper you would implement.
        schema = determine_schema(texts, metadatas)
        collection = pymilvus.Collection(name=collection_name, schema=schema)
    else:
        # existing code to create the collection...
        ...
```
In this example, `determine_schema` would be a function that derives the schema from the data. You would need to implement it based on the specific requirements of your application and on how the Dynamic Schema feature of Milvus works.
Please note that this is only a rough example and may need further adjustment.
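One possible shape for the hypothetical `determine_schema` helper, sketched under stated assumptions: it scans every metadata dict, maps Python value types to Milvus data type names using an assumed mapping, and returns plain `(field name, type name)` pairs instead of a pymilvus `CollectionSchema`, so it runs without pymilvus or a Milvus server. The `text` field name is assumed to match LangChain's default text field; the primary key and vector fields are left out.

```python
from typing import Any, Dict, List, Optional, Tuple

# Assumed mapping from Python value types to Milvus DataType names.
PY_TO_MILVUS = {
    str: "VARCHAR",
    int: "INT64",
    float: "DOUBLE",
    bool: "BOOL",
    dict: "JSON",
    list: "JSON",
}


def determine_schema(
    texts: List[str], metadatas: Optional[List[Dict[str, Any]]] = None
) -> List[Tuple[str, str]]:
    """Hypothetical helper: derive (field name, Milvus type name) pairs from all data."""
    if metadatas is not None and len(metadatas) != len(texts):
        raise ValueError("metadatas must have one entry per text")
    schema: Dict[str, str] = {"text": "VARCHAR"}  # text field name is an assumption
    for metadata in metadatas or []:
        for key, value in metadata.items():
            # Exact type lookup, so bool is not mistaken for int.
            milvus_type = PY_TO_MILVUS.get(type(value))
            if milvus_type is None:
                raise TypeError(f"unsupported type for field {key!r}: {type(value).__name__}")
            # The first type seen for a key wins; a different type later is an error.
            if schema.setdefault(key, milvus_type) != milvus_type:
                raise ValueError(f"field {key!r} has conflicting types")
    return list(schema.items())


print(determine_schema(["a", "b"], [{"source": "a.txt"}, {"page": 2, "score": 0.5}]))
# [('text', 'VARCHAR'), ('source', 'VARCHAR'), ('page', 'INT64'), ('score', 'DOUBLE')]
```

Raising on conflicting types is a design choice for this sketch; an implementation could instead fall back to a JSON field for such keys.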
I look forward to your PR. If you have any questions or need further clarification, feel free to ask.
Best regards, Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
@ohbeep What is the status on this? I need this for my Milvus collections and would rather not fork.
I have opened a PR here: https://github.com/langchain-ai/langchain/pull/21172. Hopefully, it gets merged soon.
Feature request
I suggest supporting the Milvus vector database's new Dynamic Schema feature.
Motivation
According to Milvus:
I think LangChain should support this feature for cases where documents with different types or schemas are added to the same database.
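To make the feature concrete: according to the Milvus documentation, when dynamic fields are enabled (in pymilvus via `CollectionSchema(..., enable_dynamic_field=True)`), values for keys that are not declared in the schema are kept in a reserved JSON field named `$meta` instead of being rejected. The sketch below only simulates that split in plain Python; `split_row` is a hypothetical helper, and the declared field names `pk`, `text`, and `vector` are assumptions modeled on LangChain's defaults.

```python
from typing import Any, Dict, Set

DYNAMIC_FIELD = "$meta"  # reserved dynamic field name in Milvus


def split_row(row: Dict[str, Any], declared: Set[str]) -> Dict[str, Any]:
    """Simulate how a row is stored when dynamic fields are enabled."""
    # Keys declared in the schema are stored as regular fields.
    stored = {k: v for k, v in row.items() if k in declared}
    # Everything else is grouped into the JSON dynamic field.
    extra = {k: v for k, v in row.items() if k not in declared}
    if extra:
        stored[DYNAMIC_FIELD] = extra
    return stored


declared = {"pk", "text", "vector"}
print(split_row({"text": "hi", "vector": [0.1, 0.2], "author": "ann", "year": 2023}, declared))
# {'text': 'hi', 'vector': [0.1, 0.2], '$meta': {'author': 'ann', 'year': 2023}}
```

Documents with different metadata keys can then share one collection without the schema listing every key up front.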
Your contribution
I propose adding a `dynamic_schema` flag to the `__init__` and `from_texts` methods of the Milvus class:
`__init__` method: https://github.com/langchain-ai/langchain/blob/4c47f39fcb539fdeff6dd6d9b1f483cd9a1af69b/libs/community/langchain_community/vectorstores/milvus.py#L107-L125
Change to:
`from_texts` method: https://github.com/langchain-ai/langchain/blob/4c47f39fcb539fdeff6dd6d9b1f483cd9a1af69b/libs/community/langchain_community/vectorstores/milvus.py#L839-L887
Change to:
I may later submit a PR for this suggestion.