radumarias / rfs

Distributed filesystem written in Rust
Apache License 2.0
22 stars 2 forks

Distributed-Hash-Table DHT #4

Open radumarias opened 1 month ago

radumarias commented 1 month ago

https://github.com/radumarias/rfs/wiki/Distributed-Hash-Table-DHT

https://github.com/radumarias/rfs/issues/39 https://github.com/radumarias/rfs/issues/3

radumarias commented 1 month ago

I propose that files be immutable in the system; that is, once a file is replicated and in the system, it can't be modified.

That could be problematic, though: imagine a database that makes many small changes in a short time.

I think it would not be so hard to locate the replicas and update them on change. Also, we will apply changes with a WAL (write-ahead log), so multiple writes can run in parallel and we will apply them in order, similar to how databases handle transactions. I'm doing this in the rencfs project. This way we could also allow parallel writes to multiple replicas, as all of them will eventually converge to the same state.
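A minimal sketch of that idea, assuming an in-memory log ordered by sequence number (the names here are hypothetical, not taken from rfs or rencfs): writes are appended with a sequence number and replayed in order, so parallel writers converge to the same final state.

```rust
use std::collections::BTreeMap;

/// A single logged write: byte offset in the file plus the data to write.
struct WalEntry {
    offset: u64,
    data: Vec<u8>,
}

/// A toy write-ahead log: writes are appended with a sequence number,
/// then applied to the file contents in sequence order, so concurrent
/// writers on different replicas converge to the same final state.
struct Wal {
    next_seq: u64,
    entries: BTreeMap<u64, WalEntry>, // ordered by sequence number
}

impl Wal {
    fn new() -> Self {
        Self { next_seq: 0, entries: BTreeMap::new() }
    }

    /// Append a write; returns the sequence number assigned to it.
    fn append(&mut self, offset: u64, data: Vec<u8>) -> u64 {
        let seq = self.next_seq;
        self.next_seq += 1;
        self.entries.insert(seq, WalEntry { offset, data });
        seq
    }

    /// Replay all entries in sequence order onto the file contents.
    fn apply(&self, file: &mut Vec<u8>) {
        for entry in self.entries.values() {
            let end = entry.offset as usize + entry.data.len();
            if file.len() < end {
                file.resize(end, 0);
            }
            file[entry.offset as usize..end].copy_from_slice(&entry.data);
        }
    }
}

fn main() {
    let mut wal = Wal::new();
    // Two writes that may have arrived from different replicas in parallel.
    wal.append(0, b"hello ".to_vec());
    wal.append(6, b"world".to_vec());

    let mut file = Vec::new();
    wal.apply(&mut file);
    assert_eq!(file, b"hello world".to_vec());
}
```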

radumarias commented 1 month ago

If the user defines how many nodes there are and the topology of the system, i.e., we have a known number of nodes in the cluster, do we really need to implement a peer exchange protocol?

As I understand it, PEX sends the list of peers for a file and also, for each peer, which shards it has and which chunks from those shards it holds (like a seeder).

The network topology, I assume, could also be synced with Raft; then we could use PEX just for the shard and chunk metadata above.

But am I wrong about what PEX does?

radumarias commented 1 month ago

I would still try PEX as well, as it seems more reliable and more scalable. Storing the initial metadata in a DHT seems better than a torrent file, as it's more scalable and fault tolerant.

The file is divided into chunks, ideally not more than 256 KB per chunk.

For the actual shards (splits of the original file), I would imagine 512 MB or 64 MB; then, from the torrent point of view, each of these shards is seen as a file, where, yes, each chunk should be on the order of hundreds of KB.
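To make the two levels concrete, here is a small sketch of the arithmetic, assuming for illustration 64 MB shards and 256 KB chunks (both sizes are placeholders, not decided values):

```rust
/// Illustrative sizes only; the thread discusses 64 MB or 512 MB shards
/// and chunks on the order of hundreds of KB.
const SHARD_SIZE: u64 = 64 * 1024 * 1024; // 64 MB
const CHUNK_SIZE: u64 = 256 * 1024; // 256 KB

/// Number of shards a file of `file_size` bytes splits into
/// (the last shard may be partial).
fn shard_count(file_size: u64) -> u64 {
    file_size.div_ceil(SHARD_SIZE)
}

/// Number of chunks within one full shard.
fn chunks_per_shard() -> u64 {
    SHARD_SIZE.div_ceil(CHUNK_SIZE)
}

fn main() {
    let file_size: u64 = 1 << 30; // a 1 GB file
    println!("{} shards", shard_count(file_size)); // 16 shards
    println!("{} chunks per full shard", chunks_per_shard()); // 256 chunks
}
```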

radumarias commented 1 month ago

I agree we don't need PEX, just a DHT. I think it's more scalable to use a DHT instead of a tracker, as it eliminates the single point of failure.

Will we need to create some service that acts like a DHT and reads metadata from TiKV?
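If we did build such a service, its lookup surface could be small. A hypothetical sketch, with an in-memory map standing in for TiKV (the trait and names are invented for illustration, not an rfs or TiKV API):

```rust
use std::collections::HashMap;

/// Hypothetical lookup interface such a service could expose.
trait MetadataLookup {
    /// Peers currently holding any shard of the given file.
    fn peers_for_file(&self, file_id: &str) -> Vec<String>;
}

/// In-memory stand-in for the real store (TiKV in the discussion).
struct InMemoryStore {
    peers: HashMap<String, Vec<String>>, // file_id -> peer addresses
}

impl MetadataLookup for InMemoryStore {
    fn peers_for_file(&self, file_id: &str) -> Vec<String> {
        self.peers.get(file_id).cloned().unwrap_or_default()
    }
}

fn main() {
    let mut peers = HashMap::new();
    peers.insert("file-1".to_string(), vec!["10.0.0.1:4000".to_string()]);
    let store = InMemoryStore { peers };
    assert_eq!(store.peers_for_file("file-1").len(), 1);
}
```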

radumarias commented 1 month ago

This leaves us with a tough problem: replication. How are we going to replicate a piece across the nodes?

Well, we can implement it similarly to consistent hashing. Actually, the initial implementation I did already supports distributing file replicas: https://github.com/radumarias/rfs/blob/feat/Building-the-sharding-algorithm-to-know-where-each-chunk-go_1/shard-distribution/src/consistent_hashing.rs#L65. We could use something like that to distribute replicas and to redistribute the ones from dead nodes.
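For reference, a toy consistent-hashing ring in the same spirit (a sketch independent of the linked implementation, which remains the authoritative version): each node gets several virtual points on a u64 ring, and a shard's replicas go to the next n distinct nodes clockwise, so losing a node only moves that node's shards when the ring is rebuilt.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

fn hash<T: Hash>(t: &T) -> u64 {
    let mut h = DefaultHasher::new();
    t.hash(&mut h);
    h.finish()
}

/// A toy consistent-hashing ring: nodes are placed on a u64 ring via
/// virtual nodes; a shard's replicas go to the next `n` distinct nodes
/// clockwise from the shard's hash.
struct Ring {
    ring: BTreeMap<u64, String>, // point on the ring -> node id
}

impl Ring {
    fn new(nodes: &[&str], vnodes: usize) -> Self {
        let mut ring = BTreeMap::new();
        for node in nodes {
            for v in 0..vnodes {
                ring.insert(hash(&(node, v)), node.to_string());
            }
        }
        Self { ring }
    }

    /// First `n` distinct nodes clockwise from the key's position.
    fn replicas(&self, key: &str, n: usize) -> Vec<String> {
        let start = hash(&key);
        let mut out: Vec<String> = Vec::new();
        // Walk the ring from `start`, wrapping around once.
        for (_, node) in self.ring.range(start..).chain(self.ring.range(..start)) {
            if !out.contains(node) {
                out.push(node.clone());
                if out.len() == n {
                    break;
                }
            }
        }
        out
    }
}

fn main() {
    let ring = Ring::new(&["node-a", "node-b", "node-c"], 16);
    // Place 3 replicas of a shard; if a node dies, rebuilding the ring
    // without it and re-running `replicas` moves only that node's shards.
    println!("{:?}", ring.replicas("file-1/shard-0", 3));
}
```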