zkfan / tungsten-replicator

Automatically exported from code.google.com/p/tungsten-replicator

Enable round-robin assignment of shards to channels. #149

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
1. To which tool/application/daemon will this feature apply?

Tungsten Replicator. 

2. Describe the feature in general

Tungsten provides a Partitioner interface for assigning shards to channels by 
returning a channel number from 0 to N, where N is the maximum available channel 
number.  The default implementation can assign the number either by hashing 
the shard ID or explicitly through assignments in the shard.list file. 

Tungsten will be upgraded to support a round-robin assignment policy.  The 
round-robin assignment will work as follows. 

a. At start-up time, the number of available channels will be stored and a 
“next channel” pointer will be created.  

b. Starting with the first new shard ID to appear, new shards will be assigned 
in round-robin fashion starting with channel 0 and incrementing to channel N, 
then wrapping around to channel 0 again.  

c. Shard channel assignments will be stored persistently in the Tungsten 
catalog and it will be possible to list them using a status command. 

d. Shard assignments will be cleared whenever the Tungsten Replicator goes 
offline cleanly.  This will allow users to change the number of available 
channels. 

e. Restarting with a different number of channels without going offline cleanly 
will be detected and will cause an error when the pipeline tries to go online. 
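
Steps a through e can be sketched roughly as follows.  This is a minimal 
illustration in Python, not Tungsten's Partitioner code: the class name, 
method names, and constructor arguments are all hypothetical, and an in-memory 
dict stands in for the catalog.  How the "next channel" pointer should resume 
after an unclean restart is also an assumption here. 

```python
class RoundRobinAssigner:
    """Illustrative sketch only; names are hypothetical, not Tungsten's API."""

    def __init__(self, channels, stored=None, stored_channels=None):
        if channels < 1:
            raise ValueError("at least one channel is required")
        # (e) Leftover assignments made with a different channel count mean
        # the replicator did not go offline cleanly, so refuse to go online.
        if stored and stored_channels != channels:
            raise RuntimeError(
                "channel count changed from %s to %s without a clean offline"
                % (stored_channels, channels)
            )
        # (a) Remember the channel count and create the next-channel pointer.
        self.channels = channels
        # (c) A dict stands in for the persistent catalog table.
        self.assignments = dict(stored or {})
        # Assumption: resume the rotation after any surviving assignments.
        self.next_channel = len(self.assignments) % channels

    def channel_for(self, shard_id):
        # (b) Only shard IDs seen for the first time advance the pointer;
        # known shards keep the channel they were given.
        if shard_id not in self.assignments:
            self.assignments[shard_id] = self.next_channel
            self.next_channel = (self.next_channel + 1) % self.channels
        return self.assignments[shard_id]

    def offline_cleanly(self):
        # (d) A clean offline clears assignments so the channel count may change.
        self.assignments.clear()
        self.next_channel = 0


assigner = RoundRobinAssigner(channels=3)
for shard in ["db1", "db2", "db3", "db4", "db1"]:
    print(shard, "->", assigner.channel_for(shard))
```

Note that `channels` here is the channel count, so valid channel numbers run 
from 0 to channels - 1. 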

3. Describe the feature interface

This feature will be enabled by selecting an appropriate filter.  

4. Give an idea (if applicable) of a possible implementation

This looks like a straightforward extension of the shard management API 
described in issue 102.

5. Describe pros and cons of this feature.

5a. Why the world will be a better place with this feature.

This feature is necessary to enable efficient load balancing of shards.  
Production testing has shown that hash-based approaches do not work, and 
explicit assignment is brittle.
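
One way to see the difference is to compare how many shards land on each 
channel under the two policies.  The sketch below is only an illustration: it 
uses CRC32 as a stand-in hash (the replicator's actual hash function is not 
shown in this issue), the shard names are made up, and it counts shards per 
channel rather than measuring real load. 

```python
import zlib
from collections import Counter


def hash_channel(shard_id, channels):
    # Stand-in hash for illustration; not Tungsten's string-hash function.
    return zlib.crc32(shard_id.encode("utf-8")) % channels


def round_robin_channels(shard_ids, channels):
    # Assign distinct shard IDs to channels in order of first appearance.
    return {s: i % channels for i, s in enumerate(shard_ids)}


def spread(counts, channels):
    # Difference between the busiest and the idlest channel, in shards.
    loads = [counts.get(c, 0) for c in range(channels)]
    return max(loads) - min(loads)


shards = ["customer_%d" % i for i in range(10)]  # hypothetical shard IDs
channels = 4

hash_load = Counter(hash_channel(s, channels) for s in shards)
rr_load = Counter(round_robin_channels(shards, channels).values())

print("hash spread:       ", spread(hash_load, channels))
print("round-robin spread:", spread(rr_load, channels))
```

With distinct shard IDs, round-robin never leaves two channels more than one 
shard apart, whereas a hash gives no such bound for a small number of shards. 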

5b. What hardship will the human race have to endure if this feature is
implemented.

The current shard management API may need to be modified to avoid corrupting 
automatic channel assignments when updating shard rules. 

6. Notes

Original issue reported on code.google.com by berkeley...@gmail.com on 3 Jul 2011 at 11:15

GoogleCodeExporter commented 9 years ago

Original comment by berkeley...@gmail.com on 8 Sep 2011 at 5:17

GoogleCodeExporter commented 9 years ago

Original comment by berkeley...@gmail.com on 8 Sep 2011 at 5:18

GoogleCodeExporter commented 9 years ago
Round-robin assignment of shards to channels is implemented using a new catalog 
table called trep_shard_channel that has the structure shown below: 

+----------+--------------+------+-----+---------+-------+
| Field    | Type         | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+-------+
| shard_id | varchar(128) | NO   | PRI |         |       |
| channel  | int(11)      | YES  |     | NULL    |       |
+----------+--------------+------+-----+---------+-------+

This persistently stores shard-to-channel assignments.  You can enable this 
type of round-robin assignment in the shard.list file by setting the 
round-robin option as shown below: 

# Method for channel hash assignments.  Allowed values are round-robin and 
# string-hash. 
(hash-method)=round-robin

You must take the replicator offline and then bring it back online for 
round-robin sharding to take effect.  

During operation, the current contents of the shard assignment table are 
available using the following new option on trepctl status: 

trepctl status -name channel-assignments

This prints current shard channel assignments.  The assignments are cleared 
whenever the replicator goes offline cleanly.  Specifically, whenever the 
trep_commit_seqno table is reduced to a single row, the shard assignments table 
is also cleared.  
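
The interaction between the two tables can be sketched as follows, using an 
in-memory SQLite database from Python as a stand-in for the real catalog.  The 
trep_shard_channel columns follow the structure above; the trep_commit_seqno 
table is reduced to two columns for illustration, and the helper functions 
assign and go_offline_cleanly are hypothetical names, not replicator code. 

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trep_shard_channel ("
    " shard_id VARCHAR(128) NOT NULL PRIMARY KEY,"
    " channel INTEGER)"
)
# Simplified stand-in: the real trep_commit_seqno table has more columns.
conn.execute(
    "CREATE TABLE trep_commit_seqno (task_id INTEGER PRIMARY KEY, seqno INTEGER)"
)


def assign(conn, shard_id, channels):
    """Return the stored channel for a shard, assigning round-robin if new."""
    row = conn.execute(
        "SELECT channel FROM trep_shard_channel WHERE shard_id = ?", (shard_id,)
    ).fetchone()
    if row is not None:
        return row[0]
    # Assumption: the number of stored rows determines the next channel.
    (count,) = conn.execute("SELECT COUNT(*) FROM trep_shard_channel").fetchone()
    channel = count % channels
    conn.execute(
        "INSERT INTO trep_shard_channel (shard_id, channel) VALUES (?, ?)",
        (shard_id, channel),
    )
    return channel


def go_offline_cleanly(conn):
    """Reduce trep_commit_seqno to one row and clear shard assignments."""
    (seqno,) = conn.execute("SELECT MAX(seqno) FROM trep_commit_seqno").fetchone()
    conn.execute("DELETE FROM trep_commit_seqno")
    conn.execute(
        "INSERT INTO trep_commit_seqno (task_id, seqno) VALUES (0, ?)", (seqno,)
    )
    conn.execute("DELETE FROM trep_shard_channel")


# Three channels, each with a commit position, then four new shards.
conn.executemany(
    "INSERT INTO trep_commit_seqno (task_id, seqno) VALUES (?, ?)",
    [(0, 100), (1, 100), (2, 100)],
)
for shard in ["db1", "db2", "db3", "db4"]:
    assign(conn, shard, 3)
print(conn.execute(
    "SELECT shard_id, channel FROM trep_shard_channel ORDER BY shard_id"
).fetchall())

go_offline_cleanly(conn)
```

After go_offline_cleanly runs, both the per-channel commit rows and the shard 
assignments are gone, so the next online can use a different channel count. 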

You must enable round-robin channel assignment explicitly.  Otherwise, the 
replicator uses the traditional approach of hashing on the shard ID.  This is 
still the default behavior. 

Original comment by robert.h...@continuent.com on 15 Dec 2011 at 6:27