weaviate / weaviate-python-client

A python native client for easy interaction with a Weaviate instance.
https://weaviate.io/developers/weaviate/current/client-libraries/python.html
BSD 3-Clause "New" or "Revised" License

Vector is empty even after adding embeddings to the object being added #1173

Closed omarsinno-oreyeon closed 4 months ago

omarsinno-oreyeon commented 4 months ago

Hello, I'm working on a book information retrieval system. I have books as PDFs, and I am using Ollama to generate embeddings and Weaviate as the vector store. In the vector store I created a multi-tenant collection called Books, in which each book is a tenant.

For brevity I will use only one tenant here: Shakespeare's Othello as a PDF.

Below are the two functions I use to create the vector store and to populate it. The vector store is populated with the text as expected. However, the vector itself comes back empty, even though I set it in the batch.add_object method. See the example output at the end.

Here are the two functions I use:

import weaviate
import weaviate.classes as wvc
import ollama
from typing import List

def create_vector_store(client, collection_name: str, tenant_name: str):
    '''
    Creates the vector store using Weaviate.
    Uses multi-tenancy with each book as a tenant.

    Args:
        client (WeaviateClient): Weaviate client.
        collection_name (str): Name of the collection in which to store vectors.
        tenant_name (str): Independent allocation in the collection for each book.

    Returns:
        books_collection: The multi-tenant collection.
        books_tenant: The collection scoped to the given tenant.
    '''
    try:
        # Create the collection with multi-tenancy enabled.
        books_collection = client.collections.create(
            name = collection_name,
            multi_tenancy_config = wvc.config.Configure.multi_tenancy(
                enabled = True,
                auto_tenant_creation = True,
                auto_tenant_activation = True
            )
        )

        books_collection.tenants.create(tenants = [wvc.tenants.Tenant(name=tenant_name)])
        books_tenant = books_collection.with_tenant(tenant_name)
        print('[i] Created collection with tenant.')

    except Exception:
        # Creation fails if the collection already exists, so fetch it instead.
        # Note that this also catches unrelated errors (e.g. connection failures).
        books_collection = client.collections.get(collection_name)
        books_tenant = books_collection.with_tenant(tenant_name)
        print('[i] Fetched collection with tenant.')

    return books_collection, books_tenant

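def populate_vector_store(collection, documents: List[Document], tenant_name: str):
    '''
    Populates the vector store with objects.

    Args:
        collection: Tenant-scoped collection to write to.
        documents (List[Document]): Book passages to embed and store.
        tenant_name (str): Name of the tenant (unused here; the collection is already tenant-scoped).
    '''
    with collection.batch.dynamic() as batch:
        for document in documents:
            # Embed each passage with Ollama before storing it.
            response = ollama.embeddings(
                model = 'mxbai-embed-large',
                prompt = f'''
                        Represent this text for information retrieval
                        from the book passage:
                        {document}
                '''
            )

            # Store the passage text together with its precomputed vector.
            batch.add_object(
                properties = {'text': str(document)},
                vector = response['embedding']
            )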

And I'm calling it as follows:

client = weaviate.connect_to_local()

collection_name = 'Books'
tenant_name = 'Othello'

books_collection, books_tenant = create_vector_store(client, collection_name, tenant_name)
populate_vector_store(books_tenant, cleaned_documents, tenant_name)

result = books_tenant.query.fetch_objects(
    limit=2
)

An example output:

QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('0131ef1c-eb3a-4d40-91f8-751de84ffc11'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'text': 'OTHELLO How comes it, Michael, you are thus forgot? CASSIO I pray you pardon me; I cannot speak. OTHELLO Worthy Montano, you were wont be civil. The gravity and stillness of your youth The world hath noted. And your name is great In mouths of wisest censure. What’s the matter That you unlace your reputation thus, And spend your rich opinion for the name Of a night-brawler? Give me answer to it. MONTANO Worthy Othello, I am hurt to danger. Your officer Iago can inform you, While I spare speech, which something now offends me, Of all that I do know; nor know I aught By me that’s said or done amiss this night, Unless self-charity be sometimes a vice, And to defend ourselves it be a sin When violence assails us. OTHELLO Now, by heaven, My blood begins my safer guides to rule, And passion, having my best judgment collied, Assays to lead the way. Zounds, if I stir, Or do but lift this arm, the best of you Shall sink in my rebuke. Give me to know How this foul rout began, who set it on; And he that is approved in this offense, Though he had twinned with me, both at a birth, Shall lose me. What, in a town of war Yet wild, the people’s hearts brimful of fear, To manage private and domestic quarrel, In night, and on the court and guard of safety? ’Tis monstrous. Iago, who began ’t?'}, references=None, vector={}, collection='Books')]

dirkkul commented 4 months ago

You need to call collection.query.fetch_objects(include_vector=True); vectors are not returned unless you request them.

It would be great if you could use our support forum for these kinds of problems, thanks!
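
For reference, the suggestion above might be applied roughly as in the sketch below. The helper name fetch_text_and_vectors is hypothetical and does not come from the thread. The sketch assumes the v4 Python client, a tenant-scoped collection like books_tenant from the question, and that a vector supplied through batch.add_object(vector=...) comes back under the "default" key of each object's vector dictionary.

```python
from typing import Any, List, Optional, Tuple


def fetch_text_and_vectors(
    tenant_collection: Any, limit: int = 2
) -> List[Tuple[Optional[str], Optional[List[float]]]]:
    """Hypothetical helper: fetch objects and return (text, vector) pairs."""
    # include_vector=True asks Weaviate to return the stored vectors;
    # without it, each object's `vector` is an empty dict.
    result = tenant_collection.query.fetch_objects(
        limit=limit,
        include_vector=True,
    )

    pairs = []
    for obj in result.objects:
        # Assumption: the object has a single unnamed vector, which the
        # client exposes under the "default" key.
        pairs.append((obj.properties.get("text"), obj.vector.get("default")))
    return pairs
```

With the setup from the question, calling fetch_text_and_vectors(books_tenant) should return the passage text alongside the embedding that was stored for it, assuming the objects were inserted successfully.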