qdrant / qdrant

Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
https://qdrant.tech
Apache License 2.0

Non-blocking payload index construction #4934

Closed: Vasniktel closed this issue 2 weeks ago

Vasniktel commented 3 weeks ago

Is your feature request related to a problem? Please describe.
I have quite a large collection: 60M points and rising. Sometimes it is necessary to add new payload indices to the collection or update existing ones. Currently, each such request effectively freezes the database (both read and write queries are blocked), which causes significant downtime.

Describe the solution you'd like
Payload index modification happens entirely in the background: it should be possible to upsert and read data while the index is being constructed. Ideally, queries would also benefit from a partially built index.

Describe alternatives you've considered
I haven't tested it, but would it help to have several shards in the collection? Currently, there is only one shard. Other workarounds would also be appreciated.

coszio commented 3 weeks ago

Hi @Vasniktel, thanks for the report!

I could reproduce this easily with a 10M-point collection (a smaller one would probably do too) on both the latest and dev tags:

$ bfb -n 10000000 --keywords 100 --collection-name repro --indexing-threshold 1000000000000 --dim 2

// On dashboard
DELETE /collections/repro/index/a

$ bfb --skip-upload --skip-create --skip-wait-index --search --collection-name repro  -d 2

// On dashboard
PUT /collections/repro/index?wait=true
{
    "field_name": "a",
    "field_schema": "keyword"
}

While the index is being created, all bfb searches freeze and some even time out. It shouldn't behave like this.
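To quantify the freeze described above, search latency can be sampled in a background thread while the index-creation request runs. The following is a minimal sketch using only the Python standard library; the `http_request` and `probe_latency` helpers, the `QDRANT_URL` of `http://localhost:6333`, and the use of the `POST /collections/{name}/points/search` endpoint are assumptions for illustration, not part of bfb or of the original report.

```python
import json
import threading
import time
import urllib.request

# Assumption: a local Qdrant instance listening on its default REST port
QDRANT_URL = "http://localhost:6333"


def http_request(method, path, body=None, timeout=60):
    """Send a JSON request to the Qdrant REST API and return the decoded JSON reply."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        QDRANT_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())


def probe_latency(search, blocking_call, interval=0.05):
    """Call `search` repeatedly in a background thread while `blocking_call` runs
    in the current thread; return the wall-clock duration of each search."""
    latencies = []
    done = threading.Event()

    def worker():
        while not done.is_set():
            start = time.perf_counter()
            try:
                search()
            except Exception:
                pass  # a failed or timed-out search is still timed below
            latencies.append(time.perf_counter() - start)
            time.sleep(interval)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    try:
        blocking_call()
    finally:
        # Stop the sampler once the blocking call (e.g. index creation) returns
        done.set()
        thread.join()
    return latencies
```

For example, passing `lambda: http_request("POST", "/collections/repro/points/search", {"vector": [0.1, 0.2], "limit": 10})` as `search` and `lambda: http_request("PUT", "/collections/repro/index?wait=true", {"field_name": "a", "field_schema": "keyword"})` as `blocking_call` returns the observed search latencies; a maximum close to the total index build time would suggest that searches were blocked for the whole build.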

generall commented 3 weeks ago

Payload indexes are expected to be built in advance, so building an index under normal workload is not really an expected use case.
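Following that advice, the setup order would be: create the collection, create the payload indexes, and only then upload points. The sketch below only builds the corresponding REST calls as (method, path, body) tuples without sending them; the `setup_plan` helper, the `Cosine` distance, and the sample point are hypothetical choices for illustration.

```python
def setup_plan(collection, dim, indexed_fields, points):
    """Return Qdrant REST calls as (method, path, body) tuples, ordered so that
    payload indexes exist before any points are uploaded."""
    calls = [
        # 1. Create the collection (the Cosine distance is an arbitrary choice here)
        ("PUT", f"/collections/{collection}",
         {"vectors": {"size": dim, "distance": "Cosine"}}),
    ]
    # 2. Create each payload index while the collection is still empty
    for field, schema in indexed_fields.items():
        calls.append(("PUT", f"/collections/{collection}/index?wait=true",
                      {"field_name": field, "field_schema": schema}))
    # 3. Upload points last
    calls.append(("PUT", f"/collections/{collection}/points?wait=true",
                  {"points": points}))
    return calls


plan = setup_plan("repro", 2, {"a": "keyword"},
                  [{"id": 1, "vector": [0.1, 0.2], "payload": {"a": "x"}}])
for method, path, body in plan:
    print(method, path, body)
```

Sending these calls in this order with any HTTP client should keep index construction out of the period when the collection is large and serving queries.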

agourlay commented 2 weeks ago

@Vasniktel Fixed in 1.11.1