johnlpage / POCDriver

Workload Driver for MongoDB in Java

Add a way to specify a custom shard key #33

Closed niccottrell closed 5 years ago

niccottrell commented 5 years ago

The current _id is not a great shard key, but other options, such as the default fld0, aren't great either. Based on testing by @josefahmad, the values don't seem to be properly random.

josefahmad commented 5 years ago

Attaching the plot_split_distribution output for _id: 1 as the shard key (default) and for fld0: 1, respectively. Looks like fld0 is not truly random.

[Images: shard_on_id, shard_on_fld0]

I think we can pick a default shard key with better distribution.

johnlpage commented 5 years ago

@josefahmad - the shard key in POCDriver is explicitly designed and chosen to be optimal. Unlike a random shard key, which is inherently bad, it is supposed to be monotonically increasing from a low-cardinality set of seed points. POCDriver was actually designed initially to demonstrate this principle. The shard key is optimal for writing and also supports the internal mechanisms of POCDriver.
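
A minimal sketch of that principle, assuming each writer thread owns one seed value and pairs it with an increasing counter. The class name, the field names, and the choice of four seeds are hypothetical and are not taken from the POCDriver source:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not POCDriver's actual key generator.
class SeededKeySketch {
    private final int seed;     // one of a small, fixed set of seed points (e.g. a worker id)
    private long sequence = 0;  // increases monotonically within this seed

    SeededKeySketch(int seed) {
        this.seed = seed;
    }

    // Returns the next key as {seed, sequence}; within one seed the sequence only grows.
    long[] nextKey() {
        return new long[] { seed, sequence++ };
    }

    public static void main(String[] args) {
        int seedCount = 4; // low cardinality: assumed number of writer threads
        List<SeededKeySketch> workers = new ArrayList<>();
        for (int w = 0; w < seedCount; w++) {
            workers.add(new SeededKeySketch(w));
        }
        // Round-robin over the workers to mimic interleaved inserts.
        for (int i = 0; i < 8; i++) {
            long[] key = workers.get(i % seedCount).nextKey();
            System.out.println("seed: " + key[0] + ", sequence: " + key[1]);
        }
    }
}
```

Keys generated this way are not random: they cluster into one ascending run per seed, which would be consistent with a split distribution that does not look uniform.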