I describe how the current prefix merge policy works based on observations from
ingestion experiments.
A similar behavior was also observed by Sattam.
The observed behavior seems a bit unexpected, so I post the observation here to
help us consider a better merge policy and/or a better LSM index design regarding
merge operations.
The AQL statements used for the experiment are shown at the end of this writing.
The prefix merge policy decides to merge disk components based on the following
conditions:
1. Look at the candidate components for merging in oldest-first order. If one
exists, identify the prefix of the sequence of all such components for which
the sum of their sizes exceeds MaxMergableComponentSize. Schedule a merge of
those components into a new component.
2. If a merge from 1 doesn't happen, see if the set of candidate components
for merging exceeds MaxToleranceComponentCnt. If so, schedule a merge of all of
the current candidates into a new single component.
Also, the prefix merge policy doesn't allow concurrent merge operations for a
single index partition.
In other words, if there is already a scheduled or ongoing merge operation, a new
merge is not scheduled even if the above conditions are met.
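To make the decision logic concrete, here is a minimal Java sketch of the policy as
described above (conditions 1 and 2 plus the single-merge-per-partition restriction).
This is not the actual AsterixDB PrefixMergePolicy code; the class name, method
names, and the way component sizes are passed in are hypothetical, and condition 1
is implemented under one reading of the description (the shortest oldest-first
prefix whose total size exceeds the threshold).

import java.util.List;

// Hypothetical sketch of the prefix merge policy decision described above;
// not the actual AsterixDB implementation.
public class PrefixMergePolicySketch {
    private final long maxMergableComponentSize;
    private final int maxToleranceComponentCnt;
    private boolean mergeScheduledOrOngoing; // at most one merge per index partition

    public PrefixMergePolicySketch(long maxMergableComponentSize, int maxToleranceComponentCnt) {
        this.maxMergableComponentSize = maxMergableComponentSize;
        this.maxToleranceComponentCnt = maxToleranceComponentCnt;
    }

    // Returns how many of the oldest candidate components to merge,
    // or 0 if no merge is scheduled. Sizes are given oldest first.
    public int decideMerge(List<Long> candidateSizesOldestFirst) {
        if (mergeScheduledOrOngoing) {
            return 0; // concurrent merges are not allowed for this partition
        }
        // Condition 1: merge the shortest oldest-first prefix whose total size
        // exceeds MaxMergableComponentSize into a new component.
        long sum = 0;
        for (int i = 0; i < candidateSizesOldestFirst.size(); i++) {
            sum += candidateSizesOldestFirst.get(i);
            if (sum > maxMergableComponentSize) {
                mergeScheduledOrOngoing = true;
                return i + 1;
            }
        }
        // Condition 2: too many candidate components -> merge all of them.
        if (candidateSizesOldestFirst.size() >= maxToleranceComponentCnt) {
            mergeScheduledOrOngoing = true;
            return candidateSizesOldestFirst.size();
        }
        return 0;
    }

    // Called when the scheduled merge finishes, allowing the next one.
    public void mergeFinished() {
        mergeScheduledOrOngoing = false;
    }
}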
Based on this merge policy, the following situation can occur.
Suppose MaxToleranceComponentCnt = 5 and 5 disk components have been flushed to disk.
When the 5th disk component is flushed, the prefix merge policy schedules a merge
operation to merge the 5 components.
While the merge operation is scheduled and running, concurrently ingested records
generate more disk components.
As long as merge operations are not fast enough to keep up with the rate at which
incoming ingested records generate 5 new disk components,
the number of disk components increases over time.
So, the slower merge operations are, the more disk components there will be over
time.
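To see why the count grows, here is a rough, back-of-the-envelope Java simulation.
All numbers are illustrative assumptions (one flush every 10 seconds, merge time
proportional to the number of components merged); only condition 2 of the policy
and the single-merge-per-partition rule are modeled.

// Rough, hypothetical simulation of component accumulation when merging is
// slower than flushing. The flush interval and per-component merge cost
// below are illustrative assumptions, not measured values.
public class MergeLagSimulation {
    public static void main(String[] args) {
        final int flushIntervalSec = 10;      // assumed: one new disk component every 10s
        final int mergeSecPerComponent = 12;  // assumed: merge cost grows with input size
        final int maxToleranceComponentCnt = 5;

        int components = 0;   // disk components currently on disk
        int mergingCnt = 0;   // components consumed by the running merge
        int mergeEndsAt = -1; // time the running merge finishes (-1 = no merge running)

        for (int t = 0; t <= 300; t += flushIntervalSec) {
            components++; // a flush adds one disk component
            if (mergeEndsAt >= 0 && t >= mergeEndsAt) {
                components -= mergingCnt - 1; // merged inputs replaced by one component
                mergeEndsAt = -1;
            }
            // Single merge per partition: schedule a new merge only if none is
            // running and the tolerance count is reached (condition 2).
            if (mergeEndsAt < 0 && components >= maxToleranceComponentCnt) {
                mergingCnt = components;
                mergeEndsAt = t + mergingCnt * mergeSecPerComponent;
            }
            System.out.printf("t=%3ds components=%d%n", t, components);
        }
    }
}

With these assumed numbers, each merge takes longer than the time in which the next
batch of components is flushed, so the printed component count keeps climbing.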
I also attached the output of the command "ls -alR <directory of the asterixdb
instance for an ingestion experiment>", which was executed after the ingestion
was over.
The attached file shows that for the primary index (whose directory is
FsqCheckinTweet_idx_FsqCheckinTweet), ingestion generated 20 disk components,
where each disk component consists of a btree (the filename has the suffix _b) and
a bloom filter (the filename has the suffix _f); MaxMergableComponentSize was set to
1GB.
It also shows that for the secondary index (whose directory is
FsqCheckinTweet_idx_sifCheckinCoordinate), ingestion generated more than 1400
disk components, where each disk component consists of a dictionary btree (suffix:
_b), an inverted list (suffix: _i), a deleted-key btree (suffix: _d), and a
bloom filter for the deleted-key btree (suffix: _f).
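As a side note, the per-suffix file counts can be tallied from such an index
directory with a small sketch like the following (the directory path is passed as
an argument; the suffix conventions are the ones listed above, and the class itself
is hypothetical):

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper that counts disk-component files per suffix in an
// index directory, using the suffix conventions described above.
public class ComponentCounter {
    public static void main(String[] args) throws IOException {
        // e.g. <instance dir>/FsqCheckinTweet_idx_sifCheckinCoordinate (assumed path)
        Path indexDir = Paths.get(args[0]);
        int btrees = 0, invLists = 0, deletedKeyBTrees = 0, bloomFilters = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path f : files) {
                String name = f.getFileName().toString();
                if (name.endsWith("_b")) btrees++;
                else if (name.endsWith("_i")) invLists++;
                else if (name.endsWith("_d")) deletedKeyBTrees++;
                else if (name.endsWith("_f")) bloomFilters++;
            }
        }
        System.out.printf("btrees=%d inverted-lists=%d deleted-key-btrees=%d bloom-filters=%d%n",
                btrees, invLists, deletedKeyBTrees, bloomFilters);
    }
}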
Even after the ingestion is over, since our merge operations happen
asynchronously, merging continues and eventually merges all mergable disk
components according to the described merge policy.
------------------------------------------
AQLs for the ingestion experiment
------------------------------------------
drop dataverse STBench if exists;
create dataverse STBench;
use dataverse STBench;
create type FsqCheckinTweetType as closed {
id: int64,
user_id: int64,
user_followers_count: int64,
text: string,
datetime: datetime,
coordinates: point,
url: string?
};
create dataset FsqCheckinTweet (FsqCheckinTweetType) primary key id;
/* this index type is only available in the kisskys/hilbertbtree branch. however,
you can easily replace the sif index with an inverted keyword index on the text
field and you will see similar behavior */
create index sifCoordinate on FsqCheckinTweet(coordinates) type sif(-180.0,
-90.0, 180.0, 90.0);
/* create feed */
create feed TweetFeed
using file_feed
(("fs"="localfs"),
("path"="127.0.0.1:////Users/kisskys/Data/SynFsqCheckinTweet.adm"),("format"="ad
m"),("type-name"="FsqCheckinTweetType"),("tuple-interval"="0"));
/* connect feed */
use dataverse STBench;
set wait-for-completion-feed "true";
connect feed TweetFeed to dataset FsqCheckinTweet;
Original issue reported on code.google.com by kiss...@gmail.com on 15 Apr 2015 at 9:33