stacks-network / stacks-core

The Stacks blockchain implementation
https://docs.stacks.co
GNU General Public License v3.0

Explore ideas for DHT incentives / storage policies #59

Closed muneeb-ali closed 7 years ago

muneeb-ali commented 9 years ago

Currently, there are two types of incentives for running DHT nodes:

a) Anyone running blockstored is automatically a full DHT node as well (though I can imagine lightweight implementations where people turn that functionality off)

b) Companies (such as Onename) providing public DHT servers

It would be great to brainstorm better incentives for running DHT nodes. Also, to combat spam, the "storage policy" at DHT nodes could require proof of "something", where that something could be a blockstore transaction.

muneeb-ali commented 9 years ago

@gavinandresen gave this interesting suggestion of using unspent outputs.

From Reddit:

"You should experiment with using unspent bitcoins to "buy" space in the DHT (e.g. you get to store 11 megabytes in the DHT as long as you keep an 11 BTC output unspent... where it really isn't eleven but some clever market-based mechanism where people bid unspent coins for the right to take up space in the DHT)."

(original comment on Reddit)
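To make the quoted idea concrete, here is a minimal sketch of a quota check backed by unspent outputs. It assumes a fixed rate of 1 MB per unspent BTC (the quote's illustrative figure; the quote itself suggests a market would set the real rate). `Output`, `quota_bytes`, and `may_store` are hypothetical names, not part of blockstore.

```python
from dataclasses import dataclass

SATOSHIS_PER_BTC = 100_000_000
BYTES_PER_BTC = 1_000_000  # hypothetical fixed rate; a market mechanism would replace this


@dataclass(frozen=True)
class Output:
    """A simplified transaction output (hypothetical, not a real Bitcoin API type)."""
    owner: str
    value_sat: int
    spent: bool = False


def quota_bytes(owner, outputs):
    """Bytes the owner may occupy, backed only by their unspent outputs."""
    unspent = sum(o.value_sat for o in outputs if o.owner == owner and not o.spent)
    return unspent * BYTES_PER_BTC // SATOSHIS_PER_BTC


def may_store(owner, outputs, used_bytes, new_bytes):
    """Accept a write only if it keeps the owner within their quota."""
    return used_bytes + new_bytes <= quota_bytes(owner, outputs)
```

Once the backing output is spent, `quota_bytes` drops to zero, which is the "freeing" property that distinguishes this from a one-off proof-of-work stamp.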

nrktkt commented 9 years ago

@muneeb-ali have you considered the issues around availability in a DHT? @gavinandresen seems right on the money about using proof-of-work to reserve space. A hashcash-type anti-spam scheme might be good, but unspent bitcoins allow that space to be "freed" again, which I really like. However, if the system is vulnerable to eclipse or Sybil attacks on the DHT, users will be unlikely to tie up capital in a system that can't guarantee access to the data they "paid" to store.

muneeb-ali commented 9 years ago

@kag0 Yep, absolutely. I'd divide it into two problems. Short-term and long-term.

Short-term:

a) In the short term, and to bootstrap the network, companies like Onename and others will have to put public servers out there (we already do, and ask others to do the same). Since the hash of the data is in the blockchain, the only type of attack possible on the blockstore DHT is an attack on availability, which is precisely what you're talking about.

b) Since the index of the entire key-value store is in the DHT, you can create a replica on something else like BitTorrent or even S3 (companies can have an incentive to do this for creating archives, e.g., the Internet Archive does a great job of creating snapshots of Internet state). If the blockstore DHT is missing data, it can be repopulated from other (slower/cheaper) archives.

c) Simple additions/modifications to the DHT code, like periodically saving DHT state to disk, setting a very large TTL on data expiry, and using a very high replication factor, can help (we're already looking into these).
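Points (a) and (b) can be sketched together: because the expected hash comes from the blockchain, a client can try the DHT first, reject anything that does not match, and fall back to a slower archive. This is only an illustration. SHA-256 is an assumption here (the thread does not say which hash function blockstore uses), and `fetch_verified` and the source callables are hypothetical.

```python
import hashlib


def fetch_verified(expected_hash, sources):
    """Return the first blob whose SHA-256 hex digest matches expected_hash.

    `sources` is an ordered list of callables (e.g. DHT first, archive second),
    each taking the hash and returning bytes or None. Returns None if no
    source yields matching data, i.e. the data is unavailable.
    """
    for get in sources:
        data = get(expected_hash)
        if data is not None and hashlib.sha256(data).hexdigest() == expected_hash:
            return data
    return None
```

A source that returns tampered data is simply skipped, so a misbehaving DHT node can withhold data but cannot substitute it.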

Long-term:

This is where things get really interesting and you start getting into incentives for running DHT code. This is where we're open to suggestions and contributions. @shea256 has proposed a scheme where you pay people for "proof of storage" and can chime in about it. Sybil attacks are extremely hard to solve in theory, but restricting DHT writes to some proof/activity on the blockchain is something people haven't explored much in the past.
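One way to picture "restricting DHT writes to some proof/activity on the blockchain" is a node that only accepts values whose hash has already been announced on-chain. The sketch below assumes the node has some way to obtain the set of announced hashes and that SHA-256 is the hash function; `PolicyNode` is a hypothetical class, not the blockstore DHT code.

```python
import hashlib


class PolicyNode:
    """A DHT node whose storage policy requires an on-chain announcement."""

    def __init__(self, announced_hashes):
        # Hashes assumed to have been extracted from blockchain transactions.
        self.announced = set(announced_hashes)
        self.store = {}

    def put(self, value: bytes) -> bool:
        """Store value under its hash only if that hash was announced on-chain."""
        key = hashlib.sha256(value).hexdigest()
        if key not in self.announced:
            return False  # no blockchain proof: treat as spam
        self.store[key] = value
        return True
```

Under this policy, every accepted write costs the writer at least one blockchain transaction, which is what makes spamming expensive.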

nrktkt commented 9 years ago

@muneeb-ali the short-term solutions sound good, and I think I may have something for you on some of the long-term ones. Restricting storage like Gavin suggested sounds awesome. Do you have a link to @shea256's idea? For Sybil attacks, I'm researching a new DHT design that might be perfect. It has huge shortest-path variance and high redundancy, which make it extremely hard for Sybils to eclipse a key. I hope to publish around May, but if you want to work together on something before then, I think that would be awesome.

muneeb-ali commented 9 years ago

Definitely! Shoot me an email re DHT work. I'll let @shea256 document it here (it was only a discussion in person).

sull commented 9 years ago

It might be interesting to think about how to integrate YaCy into the incentive mix for search and data replication. Also, generating "web" pages to represent the stored data and indexing them so they are searchable with YaCy could prove to be a nice peripheral p2p service. Proof of running YaCy could grant you more storage rights, etc.

http://www.yacy.net/

shea256 commented 9 years ago

Hey happy to chime in here. Here's the idea, broken down by points:

nrktkt commented 9 years ago

So the owner would presumably pay the file holder on each audit? For the actual implementation, I'm envisioning that the blockchain maps keys to overlay addresses in a DHT, and the overlay addresses in the DHT map to L3 addresses of the peers that actually store the data. Does that sound like it's in the right neighborhood? It presents a few issues, but it might work.
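The two-level mapping described in this comment could look roughly like the following. Plain dictionaries stand in for the blockchain index and the DHT, and `resolve_peers` is a hypothetical helper used only to illustrate the lookup order.

```python
def resolve_peers(key, chain_index, dht):
    """Resolve a key to the network addresses of peers holding its data.

    chain_index: key -> overlay address (stand-in for the blockchain mapping)
    dht:         overlay address -> list of L3 peer addresses
    Returns an empty list if either level has no entry.
    """
    overlay_addr = chain_index.get(key)
    if overlay_addr is None:
        return []
    return list(dht.get(overlay_addr, []))
```

The second lookup is the one exposed to eclipse attacks, since the overlay-address entry lives in the DHT rather than on-chain.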

arbedout commented 9 years ago

@muneeb-ali if you haven't run into https://github.com/jrydberg/distfs/tree/master/entangled, take a look - it's a Kademlia fork with a few additional RPC calls added to handle deletion of keys and other operations. It's fairly outdated (the docs mention Python 2.5) but could be useful for providing insight into what an 'enhanced' DHT implementation would look like.

ghost commented 9 years ago

My proposed change would be investment returns based on activity on the node: the more activity, the better. Another solution is for people to pay fees to nodes of their choice. Nodes should act as trusted nodes, and people need the ability to choose which node to trust.

As a side note, DROs are far more likely to run nodes than normal users.

jcnelson commented 9 years ago

From the conversations we had at the Blockstack Summit, I think it's clear that there's no realistic way to prove that a node is serving the data (which is really what we care about). If a node operator can figure out whether or not a given request comes from the mechanism challenging it to prove that it is serving, then (s)he can configure the node to only serve data to respond to the challenge while offering degraded service (or no service) to everyone else. The individual node operators are incentivized to be dishonest, since it's more profitable to do the minimum amount of effort to get the reward (i.e. only serve data to the challenger). The only way around this is for DHT peers to work together to detect when a node is being dishonest, but this is extremely difficult in and of itself (i.e. what if the DHT peers doing the detection are simply on a bad connection? what if the peers are themselves being dishonest?).
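The failure mode described above can be shown with a toy model: a node that recognizes the challenger and serves only it will pass every audit while giving ordinary users nothing. `SelectiveNode` and `audit` are hypothetical names for illustration, and identifying requesters by a plain ID is an assumed simplification.

```python
class SelectiveNode:
    """A dishonest node that serves data only to the party that audits it."""

    def __init__(self, data, challenger_id):
        self.data = dict(data)
        self.challenger_id = challenger_id

    def get(self, key, requester_id):
        if requester_id == self.challenger_id:
            return self.data.get(key)  # full service for the auditor
        return None                    # no service for everyone else


def audit(node, key, expected, requester_id):
    """A naive 'proof of serving' check: does the node return the expected value?"""
    return node.get(key, requester_id) == expected
```

The audit succeeds whenever it is run by the recognized challenger, so the check says nothing about the service other requesters receive.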

I think the question we need to be asking is how we should incentivize app operators to mirror their users' app-specific data. My proposal is for app providers to run a DHT node along with a mirror of the part of the DHT keyspace that corresponds to their users. That way, all data gets kept in the DHT, and each app operator gets to control the availability and durability of their data independently of the other operators.

jcnelson commented 7 years ago

We solved this issue in the Atlas system. A correct Atlas node stores 100% of the zonefile state. Node operators are incentivized to host the 100% replica since it accelerates their lookups and ensures that they have the requisite state in the event of a network partition or a DoS attack on an upstream node. The only case where a non-malicious node operator would want only a partial replica is when the node operator knows in advance which names will be queried, and can get away with not hosting the entire working set.
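As a rough illustration of what "stores 100% of the zonefile state" implies, a node can compare the hashes it expects against what it holds locally and pull the rest from peers, verifying each one. This is not Atlas's actual protocol or API: dictionaries stand in for peers, SHA-256 is an assumed hash function, and `sync_replica` is a hypothetical name.

```python
import hashlib


def sync_replica(expected_hashes, local_store, peers):
    """Fetch every expected zone file that is missing locally.

    expected_hashes: hashes the node should hold (assumed known from the chain)
    local_store:     dict hash -> bytes, updated in place
    peers:           list of dicts hash -> bytes, standing in for remote nodes
    Returns the hashes that are still missing after trying all peers.
    """
    for h in expected_hashes:
        if h in local_store:
            continue
        for peer in peers:
            data = peer.get(h)
            # Accept only data that matches the expected hash.
            if data is not None and hashlib.sha256(data).hexdigest() == h:
                local_store[h] = data
                break
    return [h for h in expected_hashes if h not in local_store]
```

An empty return value means the node holds a full replica and can answer lookups without depending on any upstream node.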

I'm going to go ahead and close this issue; if we want to revisit storage incentives in the Atlas system, please open a new issue.
