hkulekci / qdrant-php

Qdrant is a vector similarity engine & vector database. It deploys as an API service providing search for the nearest high-dimensional vectors. With Qdrant, embeddings or neural network encoders can be turned into full-fledged applications for matching, searching, recommending, and much more!
MIT License

Support for multiple vectors in VectorStruct #13

Closed gregpriday closed 1 year ago

gregpriday commented 1 year ago

Hey @hkulekci,

First up, I've gotta say your library is seriously useful – it's saved me loads of time. So thanks for that!

I've been using it for my Laravel Scout engine and noticed something. Correct me if I'm wrong, but it doesn't look like it has support for multiple vectors, even though Qdrant supports that?

We could tweak the VectorStruct constructor to take an array or a string for $name. If it gets an array, it knows it's dealing with multiple vectors. If it's a string, then it's just a single vector. This maintains backward compat.

Here's a bit of code to show what I mean:

class VectorStruct
{
    protected array $vectors;

    public function __construct(array $vector, $name = null)
    {
        // If $name is an array, we have multiple vectors: $vector holds the
        // vector names and $name holds the matching vectors, in the same order.
        // Otherwise, $vector is a single vector and $name is its optional name.
        if (is_array($name)) {
            // Multiple vectors: pair each name with its vector.
            $this->vectors = array_combine($vector, $name);
        } else {
            // Single vector.
            $this->vectors = [$name => $vector];
        }
    }

    public function toSearch(): array
    {
        $search = [];
        foreach ($this->vectors as $name => $vector) {
            $search[] = [
                'name' => $name,
                'vector' => $vector,
            ];
        }
        return $search;
    }

    public function toArray(): array
    {
        return $this->vectors;
    }
}

If you think it's a good fit, I'd be happy to throw together a PR. Or if I've got it all wrong and missed how to do this the official way, a nudge in the right direction would be awesome.

Thanks for considering!

Greg

hkulekci commented 1 year ago

Hello @gregpriday,

I'm glad to know you like the repository. Feel free to submit a PR. Just to let you know, I'm also using this library with Laravel. However, I wasn't aware that Scout offered support for vectors. Maybe we can use this library to extend support to both Qdrant and Scout.

As for VectorStruct, the underlying concept is to create a vector and assign it a name, nothing more complex. Perhaps we can add a separate model, VectorsStruct, for multiple vectors. We don't need to burden a single class with too many responsibilities. However, my concern is that Qdrant doesn't natively support the upload of vectors in batches. It only allows batch point uploads. Hence, as a client, we provide support for batch points (https://github.com/hkulekci/qdrant-php/blob/main/src/Models/PointsStruct.php). There's even a convenient static method, PointsStruct::createFromArray(), to create them directly from an array.

$points = PointsStruct::createFromArray([
    [
        'id' => 1,
        'vector' => [
            'image' => [1, 2, 3]
        ],
    ],
    [
        'id' => 2,
        'vector' => [
            'image' => [3, 4, 5]
        ],
    ]
]);
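
As a rough sketch of the separate-model idea above: the class below is only an illustration of what a VectorsStruct might look like, not an existing part of this library. The method names (addVector, toArray, toSearch) are assumptions. It also assumes that a point's 'vector' field takes a name => vector map, as in the example above, and that a search query targets one named vector at a time.

```php
<?php

declare(strict_types=1);

// Hypothetical model: the class and its methods are illustrative assumptions,
// not the library's actual API.
final class VectorsStruct
{
    /** @var array<string, array> vector name => vector values */
    private array $vectors = [];

    /** @param array<string, array> $vectors map of vector name => vector values */
    public function __construct(array $vectors = [])
    {
        foreach ($vectors as $name => $vector) {
            $this->addVector((string) $name, $vector);
        }
    }

    public function addVector(string $name, array $vector): self
    {
        if ($name === '') {
            throw new InvalidArgumentException('Each vector needs a non-empty name.');
        }
        $this->vectors[$name] = $vector;

        return $this;
    }

    // Shape for a point's 'vector' field: a map of name => vector.
    public function toArray(): array
    {
        return $this->vectors;
    }

    // Assumption: a search uses a single named vector, so the caller picks the name.
    public function toSearch(string $name): array
    {
        if (!isset($this->vectors[$name])) {
            throw new InvalidArgumentException("Unknown vector name: $name");
        }

        return ['name' => $name, 'vector' => $this->vectors[$name]];
    }
}

$vectors = new VectorsStruct([
    'image' => [1, 2, 3],
    'text'  => [0.1, 0.2],
]);

// One point carrying two named vectors.
$point = ['id' => 1, 'vector' => $vectors->toArray()];
```

Keeping this in its own class leaves VectorStruct as the simple "one vector, one optional name" model, and the resulting $point array has the same shape as the entries passed to PointsStruct::createFromArray() above.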

I haven't been able to fully review your approach yet. But I'll be sure to do so as soon as I can. Meanwhile, let's explore what we can do collaboratively to improve the implementation.

gregpriday commented 1 year ago

My Scout implementation is still quite early, but I'm putting a lot of work into it (https://github.com/gregpriday/laravel-scout-qdrant). It feels magical having vector search directly in Scout, and it's very cost-effective to just spin up a Forge server, install Qdrant via Docker, and have vector search support for free.

My plan is to also create a simple vectorizer Docker container that uses Hugging Face's sentence transformers, which really makes the whole thing self-contained.

I'll work on multi-vector support given your feedback and submit a PR once everything is stable - https://github.com/gregpriday/qdrant-php/tree/feature/multiple-vector-support

I need this for a current project so I'll be working on this quite a lot in the coming weeks. Looking forward to collaborating!

hkulekci commented 1 year ago

Sounds good. Whenever you're prepared to create a PR, go ahead and submit it, and we can discuss it further. Thanks.