I ran the same experiment as described in issue 868, except that the dataset
uses the correlated prefix merge policy. (The experiment in issue 868 uses the
default merge policy, i.e., the prefix merge policy.)
The correlated prefix merge policy looks only at primary indexes to decide
whether a merge operation is needed. If it decides that a merge operation is
needed, it merges *all* the indexes that belong to the dataset. The criterion
for deciding whether a merge is needed is the same as the one used in the
prefix merge policy:
1. Look at the candidate components for merging in oldest-first order. If one
exists, identify the prefix of the sequence of all such components for which
the sum of their sizes exceeds MaxMrgCompSz. Schedule a merge of those
components into a new component.
2. If a merge from 1 doesn't happen, see if the number of candidate components
for merging exceeds MaxTolCompCnt. If so, schedule a merge of all of the
current candidates into a single new component.
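
The two rules above can be sketched as follows. This is only an illustration of the decision as described here, not the actual policy code: the function name `pick_merge` is hypothetical, the input is assumed to be the sizes of the candidate components in oldest-first order, and each candidate is assumed to be no larger than MaxMrgCompSz on its own.

```python
from typing import List


def pick_merge(candidate_sizes: List[int],
               max_mrg_comp_sz: int,
               max_tol_comp_cnt: int) -> List[int]:
    """Return the indexes (oldest first) of the candidates to merge.

    An empty list means no merge is scheduled. Hypothetical helper that
    mirrors rules 1 and 2 as described in this issue.
    """
    # Rule 1: walk the candidates oldest first and stop at the first
    # prefix whose total size exceeds MaxMrgCompSz.
    total = 0
    for i, size in enumerate(candidate_sizes):
        total += size
        if total > max_mrg_comp_sz:
            return list(range(i + 1))

    # Rule 2: no prefix was large enough, so merge everything if there
    # are more candidates than MaxTolCompCnt tolerates.
    if len(candidate_sizes) > max_tol_comp_cnt:
        return list(range(len(candidate_sizes)))

    # Neither rule fired.
    return []
```

For example, with sizes `[10, 20, 30, 40]`, MaxMrgCompSz = 50 and MaxTolCompCnt = 5, rule 1 selects the first three components, because 10 + 20 + 30 is the first prefix sum above 50.
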
Given this policy, behavior similar to what issue 868 describes for the prefix
merge policy may occur with the correlated merge policy as well. That is, over
time, the number of secondary index components will increase.
Also, one important difference between the prefix policy and the correlated
prefix policy is that the current implementation of the correlated merge policy
allows concurrent merge operations in secondary indexes (but not in the primary
index). In addition, no ordering is enforced across concurrent merge
operations. This may cause the problem described below.
Suppose that five disk components, sdc1 through sdc5, are being merged into
sdc5-1 and, concurrently, sdc6 through sdc10 are being merged into sdc10-6. If
the merge into sdc10-6 completes first while the merge into sdc5-1 is still in
progress, then when the next merge is scheduled because more disk components
have been flushed, say sdc11 through sdc14, sdc10-6 will be included in that
merge together with sdc11 through sdc14. This is a problem because our merge
operations currently must merge only consecutive disk components without
leaving any holes. The above situation will leave a hole for the merging
component sdc10-6. (Please correct me if this explanation is wrong.)
Also, the current implementation of the correlated merge policy decides the
number of components to merge by taking the minimum number of disk components
across all indexes in the dataset. Because of this, at the end of the
ingestion, many disk components in the secondary indexes end up unmerged. This
situation was observed for the RTree secondary index as well.
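
A rough sketch of why taking the minimum leaves secondary components behind. It assumes that every index merges that same number of its oldest components into one new component. The function name `components_after_merge`, the index names, and the counts are all made up for illustration and are not measurements from the experiment.

```python
from typing import Dict


def components_after_merge(component_counts: Dict[str, int]) -> Dict[str, int]:
    """Return the number of disk components each index has after one merge.

    Hypothetical model: the merge width is the minimum component count
    across all indexes, and each index replaces that many components
    with a single merged component.
    """
    merge_width = min(component_counts.values())
    return {
        index_name: count - merge_width + 1
        for index_name, count in component_counts.items()
    }


# Made-up counts: the primary index has the fewest components, so it
# dictates the merge width for every index in the dataset.
before = {"primary": 5, "btree_secondary": 12, "rtree_secondary": 9}
after = components_after_merge(before)
print(after)
```

Under these assumptions the primary index collapses to a single component, while the secondary indexes keep every component beyond the first five.
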
Original issue reported on code.google.com by kiss...@gmail.com on 15 Apr 2015 at 10:24