metarhia / globalstorage

Distributed Data Warehouse 🌍
https://metarhia.com
MIT License
60 stars 5 forks source link

Implement gs.findServer(objectId) #8

Open tshemsedinov opened 7 years ago

tshemsedinov commented 7 years ago

Infrastructure:

{
  0: {
    0: { ip, port },
    1: { ip, port }
  }
  1: {
    0: {
      0: { ip, port },
      1: { ip, port }
    },
    1: { ip, port }
  }
}

FYI: ObjectId specification https://github.com/metarhia/GlobalStorage/issues/6

tshemsedinov commented 7 years ago

We can convert tree to array: [S0, S1, S2, S3, S0, S5, S2, S3] where Sn are server info objects, something like { host, port }. So we can implement gs.findServer(objectId) simply:

function findServer(objectId) {
  let bits = Math.round(Math.log(servers.length) / Math.log(2));
  let mask = Math.pow(2, bits) - 1;
  return objectId & mask;
}

Example:

let objectId = 0xA0527C93;
let serverId = findServer(objectId);
// serverId: 3
let server = servers[serverId];
// server: S3
DzyubSpirit commented 7 years ago

But if we have unfull tree structure then bits must be calculated in different way.

{
  0: {
    0: {
      0: { host, port }, 
      1: { host, port }
    },
    1: { host, port }
  },
  1: { host, port }
}

We have four servers but we have to use 3 bits.

Also maybe array representation is better because of shorter length?

[[[{host,port},{host,port}],{host,port}],{host,port}]

or

[ 
  [ 
    [ 
      { host, port },
      { host, port }
    ],
    { host, port }
  ],
  { host, port }
]
belochub commented 7 years ago

@DzyubSpirit No, we just need to convert tree to an array with size equal to 2 raised to a power of our tree height, and this algorithm will work just fine.

DzyubSpirit commented 7 years ago

@belochub And this array will have redundant elements? Maybe Is it better to send array of servers and the depth of tree?

belochub commented 7 years ago

@DzyubSpirit We can use this shorter tree interpretation, which you proposed, to send tree over network, but to make this algorithm work as intended, we need to create complete array on each server. Also it would not take a lot of memory or storage space, because some of the values in this array will be just references to real values identifying our servers, so there is no redundant elements.

belochub commented 7 years ago

Also we probably do not need to do Math.round on binary logarithm result, because servers.length will always be equal to a power of 2 and thus logarithm result will always be integer.

tshemsedinov commented 7 years ago

@belochub right, no round needed. I'll add gs.findServer(objectId) and gs.useServers(tree) to gs API.

aqrln commented 7 years ago

@belochub it will not, generally speaking. First of all, floating-point arithmetics is inexact by its nature, and, moreover, Math.log is also an approximation of the logarithm function based on Taylor series.