storj-archived / kfs

Deprecated
https://storj.github.io/kfs
GNU General Public License v3.0

Whitepaper/Specification #1

tacticalchihuahua closed this issue 8 years ago

tacticalchihuahua commented 8 years ago

Here are some notes from the whiteboard:

(two whiteboard photos attached: img_20160826_125242, img_20160826_125248)

1credit commented 8 years ago

Knee-jerk reaction is positive, although I'd probably suggest future-proofing it a bit by doubling one of those two constants. I've already been looking at 10TiB drives... Note I already have 20TiB available for Storj on one server, so I would need to run multiple nodes (an OK thing). On a related note, unless multi-CPU support is planned soon, most people are going to want to run AT LEAST one node per core, so commonly 4 nodes per box. On the other hand, if each spawned LevelDB were given its own process, a large chunk of the multiple-CPU usage problem would be solved.
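
To make that last idea concrete, here is a minimal sketch of forking one Node worker per LevelDB instance so bucket I/O can spread across cores; `db-worker.js`, the worker count, and the IPC message shape are all hypothetical, not anything kfs ships:

```ts
import { fork, ChildProcess } from 'child_process';

const NUM_WORKERS = 4; // illustrative: roughly one per core
const workers: ChildProcess[] = [];

for (let i = 0; i < NUM_WORKERS; i++) {
  // db-worker.js (hypothetical) would open its own LevelDB instance named
  // by the --db argument and answer get/put messages over IPC.
  workers.push(fork('db-worker.js', [`--db=bucket-${i}`]));
}

// Route each shard key to a fixed worker so every DB has a single writer.
function workerFor(key: string): ChildProcess {
  return workers[parseInt(key.slice(0, 2), 16) % NUM_WORKERS];
}

workerFor('a1b2c3').send({ op: 'put', key: 'a1b2c3', value: 'shard bytes' });
```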

Fragmentation is a concern. This would set a maximum shard size of 51.2GB. So long as the uploading programs were aware of that, no problem. It would probably annoy those trying to upload 1TB shards, though. Still, it seems like a reasonable restriction. Another related concern is shards fragmenting the space: e.g. if someone uploaded a lot of 40GB shards, the remaining 11.2GB in each bucket would have to be filled by other renters. The current defaults of 2MB shards with tails would, over time, resolve this.

A strong positive is that the board noted forced distribution of shards. Math time: 51.2GiB max per renter nodeid * 2000 farmers (all needing to be online) = ~100TiB max upload per renter per bucket, at least until the network grows. Feels reasonable, since that represents ~880 terabits of data, or roughly 0.9M seconds (~10 days) of upload time if the renter had gigabit internet upload speeds, which most don't. Might limit some future commercial applications though. Oh, that probably has to be divided by the number of mirrors (max of 12 now I believe), so call it under a day of upload for a renter's worth of unique data. That is becoming a reasonable effort.
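
To make the arithmetic concrete, a quick back-of-the-envelope check (assumes only the figures above: 51.2GiB per farmer per renter, 2000 farmers, a 1 Gbps uplink):

```ts
const GiB = 2 ** 30;
const capacityBytes = 51.2 * GiB * 2000;           // 51.2 GiB per farmer × 2000 farmers
console.log((capacityBytes / 2 ** 40).toFixed(0)); // "100" TiB total
const seconds = (capacityBytes * 8) / 1e9;         // bits pushed through a 1 Gbps link
console.log((seconds / 86400).toFixed(1));         // "10.2" days for the full 100 TiB
```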

Related question: Can the upload system talk with 2000 farmers all at once???

super3 commented 8 years ago

@1credit So the limit is actually 8 TB per nodeid, not 51.2 GB. It's 51.2 GB for every bit in the 160-bit node ID. But yes, this does make the max shard size 51.2 GB, because the DB won't take anything bigger.
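
Spelled out, the bucket arithmetic behind that limit (using only the numbers from this thread):

```ts
const bucketGB = 51.2; // one bucket per bit of the node ID; also the max shard size
const bits = 160;      // bits in a kademlia-style node ID
console.log(Math.round(bucketGB * bits)); // 8192 GB, i.e. 8 TB per node ID
```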

The max upload per renter per bucket is limited to the total size of the network divided by redundancy. You can always store more than one shard on a farmer (although it is recommended to spread it out).

1credit commented 8 years ago

Yeah, I saw that on the whiteboard; thus my comment that 8TB is no longer the biggest drive you can get. It's a good number. Given "tails" of shards, anything will eventually get filled.

1credit commented 8 years ago

Any thoughts on forgoing LevelDB completely and just using the hash ID as a file name, per Skunk's suggestion? That would be a LOT simpler. Both would require a total restart of all downloads, or some one-time conversion process (likely faster), but IMHO that is acceptable. We are still very much in a test group. Hmmm, maybe a test-group c2, totally unpaid, just to make sure it works first? I'd be happy to volunteer some space and bandwidth.
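
For reference, a minimal sketch of the hash-as-file-name idea; the storage root, the choice of SHA-256, and the two-character prefix fan-out are illustrative assumptions, not any existing adapter:

```ts
import { createHash } from 'crypto';
import { promises as fs } from 'fs';
import { join } from 'path';

const STORE_DIR = '/srv/storj/shards'; // hypothetical storage root

async function putShard(data: Buffer): Promise<string> {
  const key = createHash('sha256').update(data).digest('hex');
  // Fan out by key prefix so no single directory holds millions of files.
  const dir = join(STORE_DIR, key.slice(0, 2));
  await fs.mkdir(dir, { recursive: true });
  await fs.writeFile(join(dir, key), data);
  return key;
}

async function getShard(key: string): Promise<Buffer> {
  return fs.readFile(join(STORE_DIR, key.slice(0, 2), key));
}
```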

tacticalchihuahua commented 8 years ago

That's exactly how it was implemented before we introduced LevelDB. We ran into many performance and reliability issues there as well, owing to its simplicity: problems with max open file descriptors, unclean exits, a problematic shard reaper, etc. If you go through the issue history you will see a number of issues related to FSStorageAdapter.

The choice to move to LevelDB came out of resistance to spending time building a Storj-specific storage layer (at the time, the networking needed more attention). It became apparent that the storage layer was going to need a lot of attention anyway, and we just needed something that worked. LevelDB worked well enough, so I'm not keen on going back to the way it was if we can solve the current issues we're having with it.

1credit commented 8 years ago

Fair enough - thanks for the historical perspective.

tacticalchihuahua commented 8 years ago

Okay, I've translated the whiteboard into a little mini-paper. It's not a complex system, so it's pretty thin. See the README.md in this repository. I'll start working on the minimum amount of code required to test and validate some of the assumptions in the README.