qdrant / qdrant-haystack

An integration of Qdrant ANN vector database backend with Haystack
Apache License 2.0

How to pass metadata to QdrantClient #26

Closed MathiasSpanhove closed 1 year ago

MathiasSpanhove commented 1 year ago

I see that, as a result of #19 and #20, **kwargs has been removed from the QdrantDocumentStore __init__. Is there a workaround for passing metadata to the QdrantClient?

Currently the following code doesn't work:

document_store = (
    QdrantDocumentStore(
        url=f"{QDRANT_APP_URL}",
        port=443,
        index=f"{QDRANT_INDEX}",
        embedding_dim=768,
        recreate_index=True,
        timeout=120,
        similarity="cosine",
        metadata={"Authorization": f"Bearer {get_user_token()}"}
    )
)

kacperlukawski commented 1 year ago

@MathiasSpanhove This might not be the most elegant solution, but could you please check whether it works?

document_store._client._rest_headers = {"Authorization": f"Bearer {get_user_token()}"}

It seems to be an ugly workaround, but I can't see a different way.

MathiasSpanhove commented 1 year ago

Thank you for your answer @kacperlukawski

I also tried something like what you suggested. However, the QdrantDocumentStore already uses the client during initialization, and at that point the headers have not been set yet, so that first call fails with an unauthorized exception.

The cause is this call made during initialization:

# Make sure the collection is properly set up
self._set_up_collection(index, embedding_dim, recreate_index, similarity)

Are you open to a PR that adds additional_headers: Optional[dict] = None to the QdrantDocumentStore __init__?
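
To illustrate the ordering problem described above, here is a minimal sketch. `FakeClient` and `SketchStore` are hypothetical stand-ins, not the real QdrantClient or QdrantDocumentStore, and `additional_headers` is only the parameter name proposed in this comment. The sketch assumes a server that rejects any request without an Authorization header.

```python
from typing import Dict, Optional


class FakeClient:
    """Hypothetical stand-in for a REST client; not the real QdrantClient."""

    def __init__(self, headers: Optional[Dict[str, str]] = None) -> None:
        self.headers = dict(headers or {})

    def get_collection(self, name: str) -> str:
        # Simulate a server that rejects requests without an auth header.
        if "Authorization" not in self.headers:
            raise PermissionError("401 Unauthorized")
        return name


class SketchStore:
    """Hypothetical store whose constructor contacts the server right away."""

    def __init__(
        self,
        index: str,
        additional_headers: Optional[Dict[str, str]] = None,
    ) -> None:
        # The headers reach the client before the first request is made.
        self._client = FakeClient(headers=additional_headers)
        # This mirrors a collection set-up call issued from the constructor.
        self.index = self._client.get_collection(index)


def try_patch_after_init() -> str:
    """Mimic the workaround: construct first, set the headers afterwards."""
    try:
        store = SketchStore(index="docs")
        store._client.headers = {"Authorization": "Bearer token"}
        return "ok"
    except PermissionError as exc:
        return str(exc)
```

In this sketch, patching the headers after construction never gets a chance to run, because the constructor has already raised. Passing the headers as a constructor argument avoids that.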

MathiasSpanhove commented 1 year ago

I could also create a PR with something like the following:

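@classmethod
def from_qdrant_client(cls, client: qdrant_client.QdrantClient, **kwargs):
    # Sketch only: this sets the client on the class itself, so __init__
    # has to skip creating its own client when one is already present.
    cls.client = client
    return cls(**kwargs)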

I would also add a check in __init__ so that a new client is only created when one hasn't already been provided. This way all current implementations would keep working.

Let me know what you think and if you would like a PR!
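
For reference, one way the alternative-constructor idea could look is sketched below. This is an assumption-laden illustration, not the library's code: `InjectedClient` and `StoreWithInjection` are hypothetical names, and the client is passed through `__init__` as an optional argument so that it is stored per instance instead of on the class.

```python
from typing import Any, Dict, Optional


class InjectedClient:
    """Hypothetical pre-configured client; stands in for a real client object."""

    def __init__(self, headers: Optional[Dict[str, str]] = None) -> None:
        self.headers = dict(headers or {})


class StoreWithInjection:
    """Hypothetical store that accepts an optional, already-built client."""

    def __init__(self, index: str, client: Optional[InjectedClient] = None) -> None:
        # Use the supplied client if there is one; otherwise build a default,
        # so existing callers that pass no client keep working.
        self.client = client if client is not None else InjectedClient()
        self.index = index

    @classmethod
    def from_client(cls, client: InjectedClient, **kwargs: Any) -> "StoreWithInjection":
        # Hand the client to __init__ rather than assigning it on the class,
        # which would share one client across every store instance.
        return cls(client=client, **kwargs)
```

A caller would build the client with whatever headers it needs and then call `StoreWithInjection.from_client(client, index="docs")`. Stores created without a client are unaffected.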

kacperlukawski commented 1 year ago

I think we can expose the metadata parameter, as long as it doesn't break the YAML pipelines. Would you like to contribute?

MathiasSpanhove commented 1 year ago

> I think we can expose the metadata parameter, as long as it doesn't break the YAML pipelines. Would you like to contribute?

I've made a PR that exposes the metadata parameter and doesn't break the YAML pipeline.