
Partitioning parent-child documents by parent id to avoid full replication of parent documents #32608


vslaykovsky commented 4 days ago

**Is your feature request related to a problem? Please describe.**
I have many documents (200M+) with a complex hierarchical structure. It cannot be expressed with nested structures like arrays due to limitations in indexing support, and it doesn't make sense to search across all fields of these structures either. So the natural way to decouple them is a parent-child relationship. Unfortunately, parent documents are always replicated globally, which means a huge overhead on content nodes.
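
For context, a minimal sketch of the setup in question, with made-up names (`brand` as parent, `product` as child); the parent type must be declared global in services.xml, which is what triggers the full replication described:

```
# services.xml (abbreviated): the parent type must be global,
# i.e. replicated to every content node
#   <document type="brand" mode="index" global="true" />
#   <document type="product" mode="index" />

# product.sd: the child schema references the parent and imports its fields
schema product {
    document product {
        # reference to the globally replicated parent document
        field brand_ref type reference<brand> {
            indexing: attribute
        }
    }
    # 'name' is assumed to be an attribute field in the brand schema
    import field brand_ref.name as brand_name {}
}
```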

**Describe the solution you'd like**
Some way of describing sharding logic where parent IDs are used as keys. This would let us avoid copying all parent documents to all content nodes.
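
For illustration, Vespa's document id scheme already has a location modifier; a hypothetical version of this request could look as follows (placing the parent itself by `g=` instead of globally is not supported today, it is the feature being asked for):

```
# Children can already be grouped by a key such as the parent id:
id:mystore:product:g=brand-42:product-17
id:mystore:product:g=brand-42:product-18

# Requested: place the parent by the same key instead of globally, so only
# the nodes owning group brand-42 hold a replica (hypothetical):
id:mystore:brand:g=brand-42:brand-42
```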

**Describe alternatives you've considered**
I have considered all alternatives available today; each of them has certain limitations:

kkraune commented 3 days ago

Hi, and thanks for the feature request!

Yes, the copy-parents-to-all-nodes feature can require a lot of memory, particularly in clusters with many nodes.

What you are describing is a kind of distributed join feature. https://github.com/vespa-engine/sample-apps/tree/master/examples/joins can be relevant as a way to solve problems where the parent/child feature cannot be used.
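
For others reading along, a minimal sketch of that style of application-side join, assuming hypothetical `product`/`brand` schemas with a `brand_id` key field and a Vespa query endpoint on localhost (the sample app above shows more complete variants):

```python
import requests

SEARCH = "http://localhost:8080/search/"  # assumed local Vespa endpoint

def query(yql: str, hits: int = 100) -> list[dict]:
    # Vespa's query API accepts the YQL as JSON over POST
    response = requests.post(SEARCH, json={"yql": yql, "hits": hits})
    response.raise_for_status()
    return response.json()["root"].get("children", [])

# 1) fetch the matching child documents
children = query('select * from product where category contains "shoes"')

# 2) collect the parent keys and fetch the parents in a second query
brand_ids = {hit["fields"]["brand_id"] for hit in children}
parent_filter = " or ".join(f'brand_id contains "{b}"' for b in brand_ids)
brands = {hit["fields"]["brand_id"]: hit["fields"]
          for hit in query(f"select * from brand where {parent_filter}")}

# 3) join child and parent fields in the application layer
for hit in children:
    hit["fields"]["brand"] = brands.get(hit["fields"]["brand_id"], {})
```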

For reference for others: https://docs.vespa.ai/en/parent-child.html

bratseth commented 3 days ago

This has been considered before and does make sense. We would need to use some variant of document IDs with groups and distribute the same groups to the same nodes across those document types. I think the main issue is that we'll easily end up with badly balanced clusters.

vslaykovsky commented 3 days ago

> This has been considered before and does make sense. We would need to use some variant of document IDs with groups and distribute the same groups to the same nodes across those document types. I think the main issue is that we'll easily end up with badly balanced clusters.

This can be done in theory with consistent hashing, and ZooKeeper is already used in Vespa anyway. Alternatively, this concern can be shifted to the user, e.g. "you've been warned; this grouping is supposed to be used with well-balanced data".
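
A toy sketch of the idea, using rendezvous (highest-random-weight) hashing as the consistent-hashing variant: any document type that uses the parent id as its group key deterministically lands on the same limited node set, with no per-group metadata to coordinate:

```python
import hashlib

def nodes_for_group(group: str, nodes: list[str], replicas: int) -> list[str]:
    # Rendezvous hashing: score every node against the group key and keep
    # the top `replicas`. The same group key always yields the same nodes,
    # and removing a node only reassigns the groups it ranked first for.
    def score(node: str) -> int:
        digest = hashlib.sha256(f"{group}|{node}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return sorted(nodes, key=score, reverse=True)[:replicas]

nodes = [f"node{i}" for i in range(8)]
# Parent and children share the parent id as group key,
# so both document types land on the same two nodes:
print(nodes_for_group("brand-42", nodes, replicas=2))
```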

bratseth commented 3 days ago

Yes, we do use something similar to consistent hashing in Vespa (the CRUSH algorithm), but here we would need to distribute each group to a limited set of nodes, to avoid placing all global documents on all nodes, while having no control over the size of each group. I'm not sure how well this can be solved, but it certainly adds new complexity to the balancing problem.
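
To make the balancing concern concrete, a toy simulation with made-up, skewed (Zipf-like) group sizes: pinning each group to a few nodes can leave the busiest node far above the mean load:

```python
import hashlib
from collections import Counter

NODES = [f"node{i}" for i in range(10)]

def pick_nodes(group: str, k: int = 2) -> list[str]:
    # same rendezvous idea as above: a group maps to a fixed set of k nodes
    score = lambda n: hashlib.sha256(f"{group}|{n}".encode()).digest()
    return sorted(NODES, key=score, reverse=True)[:k]

# skewed group sizes: a few huge parents, a long tail of small ones
sizes = {f"group{i}": 1_000_000 // (i + 1) for i in range(1000)}

load = Counter()
for group, size in sizes.items():
    for node in pick_nodes(group):
        load[node] += size

mean = sum(load.values()) / len(NODES)
print(f"mean={mean:,.0f}  max={max(load.values()):,}  "
      f"imbalance={max(load.values()) / mean:.2f}x")
```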