lni / dragonboat

A feature-complete and high-performance multi-group Raft library in Go.
Apache License 2.0
4.98k stars 533 forks

Curious why shardID and replicaID are uint64 and not something smaller like uint16 or uint32? #310

Closed: kolinfluence closed this 7 months ago

kolinfluence commented 1 year ago

shardID could be uint32 and replicaID could be uint16 or uint8. Isn't uint64 overkill?

kevburnsjr commented 1 year ago

In autoscaling systems, it is common for the number of shards to fluctuate with load.
For instance, 10 shards with low traffic and 100 shards with high traffic.
If traffic fluctuates like this every day, that means 90 new shards per day.
Shard IDs can't be reused, so the size of the shard ID space is determined not just by the maximum number of shards that may exist at any point in time, but by the maximum number of shards that will ever exist over the entire lifecycle of the cluster. The same goes for replicas.

I've run into problems micro-optimizing for fewer bytes in the past.
I think using uint64 for everything is generally a great design choice.

kolinfluence commented 1 year ago

@kevburnsjr Use uint16, then: it's 2 bytes vs 8 bytes. 2^32 is about 4 billion shard IDs, which doesn't make sense; 2^16 (65,536 IDs, max value 65535) seems to make more sense. Anyone whose shards need to grow beyond that should customize dragonboat themselves.

lni commented 1 year ago

@kolinfluence RAM is dirt cheap these days. We are seeing vendors release single boxes with 100+ terabytes of RAM, and that's extremely fast & expensive RAM for GPUs!

If you look at the dragonboat codebase, it is not just IDs: practically everything is uint64 by default, as long as that is reasonable. Here, "reasonable" usually means the number of instances is limited (e.g. up to tens of thousands).

For a typical node with, say, a few hundred replicas, you will probably waste a couple of kilobytes of RAM by using 8 bytes per ID rather than 4 or 2, but that gives you the peace of mind that you will never exhaust your ID space, so you don't need to spend time & money writing code to handle situations like ID collisions.

kolinfluence commented 1 year ago

2^64 for shard IDs?! Is it possible to make it 32-bit then? That's already about 4 billion shards. Who runs 1 million shards anyway?

kevburnsjr commented 7 months ago

To expand on this, ShardID and ReplicaID are control plane variables. They should not in any way affect the size of the data stored in the state machines. These types of optimizations are best performed in the data plane where data volumes are highest.

I think it's safe to consider this matter closed.

lni commented 7 months ago

Hi @kolinfluence

As @kevburnsjr correctly pointed out above, shardID and replicaID won't affect the overall size of your data storage. Making most variables uint64 whenever possible is intentional: it saves the headache of figuring out whether 8, 16 or 32 bits would be enough, and of dealing with the fallout when that guess turns out to be a regrettable mistake.

Using a 64-bit shardID provides some convenience as well: whenever you need a new shard with a brand new shardID, you can just pick one at random, and the chance of collision is practically zero unless you want a surprisingly large number of shards. The same can't be said for 32-bit values.
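The random-ID approach and the 32- vs 64-bit comparison can be sketched in Go. `newRandomShardID` and `collisionProbability` are hypothetical helpers; the probability uses the standard birthday-problem approximation p ≈ 1 - exp(-n(n-1) / 2^(bits+1)), and skipping zero is an assumption (treating 0 as an "unset" value), not a documented dragonboat rule.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"math"
)

// newRandomShardID draws a uniformly random 64-bit ID from crypto/rand.
// Zero is skipped on the assumption that it is reserved as "unset".
func newRandomShardID() (uint64, error) {
	var b [8]byte
	for {
		if _, err := rand.Read(b[:]); err != nil {
			return 0, err
		}
		if id := binary.LittleEndian.Uint64(b[:]); id != 0 {
			return id, nil
		}
	}
}

// collisionProbability approximates the chance that n IDs drawn uniformly
// at random from a space of 2^bits values contain at least one duplicate.
func collisionProbability(n float64, bits int) float64 {
	space := math.Ldexp(1, bits) // 2^bits
	return -math.Expm1(-n * (n - 1) / (2 * space))
}

func main() {
	id, err := newRandomShardID()
	if err != nil {
		panic(err)
	}
	fmt.Println("new shard ID:", id)
	for _, bits := range []int{32, 64} {
		fmt.Printf("1e6 random %d-bit IDs: P(collision) ~ %.3g\n",
			bits, collisionProbability(1e6, bits))
	}
}
```

For a million randomly chosen IDs the approximation gives a collision probability of roughly 2.7e-8 with 64 bits, while with 32 bits a collision is all but certain.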

1 million shards is not a lot. Assuming 64 or 128 MBytes per shard, just like some DBs use, 1 million of them hold roughly 64 to 128 TBytes of data, and that little data can actually fit into a single server.
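A quick Go check of the capacity arithmetic. It reads "64/128Mbytes" as 64 or 128 MiB per shard (an assumption; decimal megabytes give exactly 64 and 128 TB), and `totalBytes` is a hypothetical helper.

```go
package main

import "fmt"

// MiB is one mebibyte in bytes.
const MiB = 1 << 20

// totalBytes returns the data volume held by `shards` shards of perShardMiB each.
func totalBytes(shards, perShardMiB int64) int64 {
	return shards * perShardMiB * MiB
}

func main() {
	for _, size := range []int64{64, 128} {
		b := totalBytes(1_000_000, size)
		fmt.Printf("1M shards x %d MiB = %d bytes (~%.0f TB)\n", size, b, float64(b)/1e12)
	}
}
```

That works out to roughly 67 TB and 134 TB respectively.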