vespa-engine / vespa

AI + Data, online. https://vespa.ai
Apache License 2.0

Question regarding tradeoff between grouped and flat distribution in vespa. #22789

Closed 107dipan closed 2 years ago

107dipan commented 2 years ago

Hi Vespa Team,

We are evaluating the tradeoffs between grouped and flat distribution in Vespa and want to confirm our analysis with you. In the grouped configuration, our cluster has 3 groups with 6 content nodes in each group, and redundancy and searchable copies are both 3. In the flat configuration, redundancy and searchable copies are also both set to 3.

Question regarding concurrency: Vespa creates a fixed number of threads (per the search thread setting) for serving search queries. In a flat distribution, a query is dispatched to every content node, so fewer threads remain free to serve new requests than in a grouped distribution, where a query is dispatched to a single group and the threads in the other groups stay available. Is this analysis correct?

Question regarding resiliency, one node down: When a single node is down in a grouped cluster, queries are not dispatched to that node's group, because not all documents are available in it. We then serve fewer concurrent requests, since none of the nodes in that group serve queries. In a flat distribution, by contrast, the unhealthy node is the only one not serving requests. Is our understanding of how Vespa handles this scenario correct?

Question regarding resiliency, multiple nodes down: Consider a scenario where at least one node is down in each group, and no two of the down nodes share a bucket, i.e. there is at least one copy of every document on a healthy node. Can we say that a grouped distribution will always see lower coverage, because each query is sent to one particular group, whereas a flat distribution will have 100% coverage, because the query is sent to every healthy node and those nodes together hold a searchable copy of every document?

kkraune commented 2 years ago

Hi,

Thanks for a lot of great questions. Please review a document written for Vespa Cloud that addresses some of your questions at https://cloud.vespa.ai/en/topology - then see if there are any open questions after reviewing it.

Note that Vespa Cloud's redundancy config is a little different, but the basic concepts apply.

bratseth commented 2 years ago

Concurrency: Yes. Resiliency 1: Yes, correct when redundancy = number of groups, so there's no internal redundancy inside the group. Resiliency 2: Yes, as long as the number of nodes down at the same time < redundancy.
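To put numbers on the concurrency and one-node-down answers, here is a minimal Python sketch using the cluster from the question (18 content nodes, flat or split into 3 groups of 6). The function names are made up for illustration and are not Vespa APIs. The sketch assumes, as in the answer above, that redundancy equals the number of groups, so a group with one node down is taken out of rotation entirely.

```python
def nodes_per_query(total_nodes: int, groups: int) -> int:
    """Content nodes one query fans out to.

    groups == 1 models flat distribution: every node is hit.
    Otherwise the query is dispatched to a single group.
    """
    return total_nodes // groups


def serving_nodes_with_one_down(total_nodes: int, groups: int) -> int:
    """Nodes still serving queries when exactly one node is down.

    Assumption: with groups > 1 there is no spare copy inside the group,
    so the whole group that lost a node stops receiving queries.
    """
    if groups == 1:
        return total_nodes - 1
    return total_nodes - total_nodes // groups


print(nodes_per_query(18, 1))              # flat: 18 nodes busy per query
print(nodes_per_query(18, 3))              # grouped: 6 nodes busy per query
print(serving_nodes_with_one_down(18, 1))  # flat: 17 nodes still serving
print(serving_nodes_with_one_down(18, 3))  # grouped: 12 nodes still serving
```

Under these assumptions a single failure costs the flat cluster 1 of 18 serving nodes, while the grouped cluster loses 6 of 18 until the node is back.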

nehajatav commented 2 years ago

@bratseth But isn't the workload per node lower in flat distribution, since the data set is spread over more nodes and so less data is served per node (only one active copy of the data is used during search)? Even if DQW is significant in comparison to SQW, does grouped distribution result in higher throughput than flat, or similar throughput?

Quoting the original question: "In a flat distribution the search query will be dispatched to every content node hence the number of threads remaining to serve a new request will be lower than the number of threads available to serve new requests in group distribution"

bratseth commented 2 years ago

There's no simple answer.

The cost of a query on each content node = fixed cost per query + cost varying with the number of docs matched.

Using grouped distribution divides the fixed cost by the number of groups, so when that dominates the total cost, because the queries are large, or the document count is low, the gain is significant.
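As a rough illustration of the two statements above, the following Python sketch sums the per-node cost over the nodes a query touches. It assumes the variable cost is strictly linear in matched documents and that matches are spread evenly over the nodes in a group. The function name, parameters, and numbers are hypothetical, chosen only to make the arithmetic visible.

```python
def total_query_cost(total_nodes: int, groups: int, matched_docs: int,
                     fixed_cost: float, cost_per_doc: float) -> float:
    """Total work for one query, summed over the nodes it is dispatched to.

    Assumptions (illustrative only): linear per-document cost and an even
    spread of matched documents across the nodes of the queried group.
    """
    nodes_hit = total_nodes // groups          # flat: all nodes; grouped: one group
    docs_per_node = matched_docs / nodes_hit   # each node handles a share of the matches
    per_node_cost = fixed_cost + cost_per_doc * docs_per_node
    return nodes_hit * per_node_cost


# 18 nodes, 18,000 matched documents, fixed cost 1000 units, 1 unit per document.
flat = total_query_cost(18, 1, 18_000, fixed_cost=1000.0, cost_per_doc=1.0)
grouped = total_query_cost(18, 3, 18_000, fixed_cost=1000.0, cost_per_doc=1.0)
print(flat)     # 36000.0 = 18 * 1000 fixed + 18000 variable
print(grouped)  # 24000.0 =  6 * 1000 fixed + 18000 variable
```

In this linear model the variable part is the same in both layouts and only the fixed part shrinks, from 18,000 to 6,000 units with 3 groups. How much that matters depends on how large the fixed cost is relative to the per-document cost for the actual query mix.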

However, the cost per document is not necessarily linear - depending on the query operators used, the incremental cost per query from additional documents may be very small, which means more groups may increase total cost.

In addition, with groups > 1 you need one redundant group for availability, while with groups=1 you need one redundant node.