[Enhancement] Support for specifying vectorizer parameter

langchain-ai / langchain-weaviate

MIT License

35 stars 15 forks source link

[Enhancement] Support for specifying vectorizer parameter #179

Closed pashva closed 3 days ago

pashva commented 6 months ago

feat: https://github.com/langchain-ai/langchain-weaviate/issues/95

I have added a optional parameter of vectorizer to the Vectorstore class The hierarchy it follows is :-

If a vectorizer is passed for example text2vec-transformers, the class will be created with the passed value.
If nothing is passed to it then it will adopt the DEFAULT_VECTORIZER from the Weaviate configuration
If the user does not want any vectorizers they can pass none as a string in the vectorizer param as well

hsm207 commented 6 months ago

@pashva thanks for your contribution!

Unfortunately, I can't accept it in the current state because:

As discussed in #95, the solution we're looking to implement is, instead of introducing another param in the init method, users will create their schema in weaviate themselves, and then when instantiating the weaviate vector store class in langchain, specify the index name.
The CI pipeline is failing as there aren't tests
This feature definitely will need a documentation update

We have a contributing doc to explain how to set up a dev environment to run tests, lint, etc.

pashva commented 6 months ago

So if I understand it correctly, In the init class you want to check whether the index exists previously with some configuration else go ahead and create a default one? But this seems to be the current implementation right? Where we only create a collection if it does not exist?@hsm207

hsm207 commented 6 months ago

@pashva yes, your understanding is correct.

Although the current implementation in the init method is implemented the way want, it was not written with this feature in mind. The purpose was to not overwrite the schema that was created within langchain when the user provides the index_name to reuse an index langchain created earlier, as weaviate will throw an error if the new schema name already exists.

Therefore, we should verify that the use case we are currently considering does work as expected and not break things elsewhere e.g. when doing search or ingesting documents.

dudanogueira commented 3 days ago

Ok, closing this as we defined that:

When one wants to select a vectorizer for you Weaviate Langchain VectorStore, they should create the collection and define it before using Langchain.

I have written a recipe here that demo this: https://github.com/weaviate/recipes/tree/main/integrations/llm-frameworks/langchain/loading-data

thanks @pashva for your contribution!!