qubole / rubix

Cache File System optimized for columnar formats and object stores
Apache License 2.0
183 stars 74 forks source link

Implement consistent hashing logic for Hadoop cluster that takes care of the lost nodes or downscaling of a cluster #162

Closed abhishekdas99 closed 6 years ago

abhishekdas99 commented 6 years ago
Node Index table        File Index Table        File-Node Membership
                (Returned by hashing logic)     
A   0           F1  0           F1  A
B   1           F2  1           F2  B
C   2           F3  2           F3  C
D   3           F4  3           F4  D

If we lose Node A and cluster adds a new node E, there will be 4 valid nodes in the cluster. In the current logic in clusterManager, the getNodes call will return a list containing {B, C, D, E} as A is lost.

So the tables look like

Node Index table        File Index Table        File-Node Membership
                (Returned by hashing logic)     
B   0           F1  0           F1  B
C   1           F2  1           F2  C
D   2           F3  2           F3  D
E   3           F4  3           F4  E

In this case for all the files, file-node memberships are going to change which defeats the purpose of consistent hashing