shendo / peerz

P2P python library using ZeroMQ sockets and gevent
GNU General Public License v3.0

Distributed file storage #2

Open jvsteiner opened 9 years ago

jvsteiner commented 9 years ago

Hi, I'm interested in building a demonstrator for distributed persistent data storage, something like refuge.io, where nodes can publish content and replicate other nodes' data. I plan to use trusted timestamping or the blockchain to verify publishing time and data integrity. The goal is for peers to be able to build a trusted, blockchain-backed distributed database for arbitrary data without bloating the blockchain itself or having to work around the limitations of bitcoin's transaction format. Do you think, in principle, peerz might be a good choice for the P2P layer?

shendo commented 9 years ago

Hi there,

Possibly for the peer discovery.
The blockchain could contain references/hashes back into a traditional DHT for the actual block storage?
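A minimal sketch of that idea, assuming SHA-256 content addressing. A plain dict stands in for the DHT and a list stands in for the blockchain; `BlockStore`, `put` and `get` are hypothetical names for illustration, not peerz APIs.

```python
import hashlib


class BlockStore:
    """In-memory stand-in for a DHT: blocks are keyed by their SHA-256 digest."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blocks[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._blocks[key]
        # Re-hash on read so a corrupted or substituted block is detected.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("block does not match its hash")
        return data


store = BlockStore()
chain = []  # stand-in for the blockchain: it records only the hashes

ref = store.put(b"some large payload")
chain.append(ref)

assert store.get(chain[0]) == b"some large payload"
```

Only the fixed-size hash goes on the chain, so the payload size never affects it.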

My limited understanding of how blockchains for cryptocurrencies work is that they get replicated out via a gossip protocol so that all peers end up with eventual consistency. So you would still need some mechanism to do this.
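As a rough illustration of what such a mechanism does, here is a toy simulation of push-style gossip. It is a sketch under simplifying assumptions (synchronous rounds, no message loss, every peer knows every other peer), and `gossip_rounds` and `fanout` are hypothetical names, not anything from peerz or bitcoin.

```python
import random


def gossip_rounds(num_peers: int, fanout: int = 2, seed: int = 0) -> int:
    """Simulate push gossip; return the rounds needed until every peer has the update."""
    rng = random.Random(seed)
    informed = {0}  # peer 0 publishes the update
    rounds = 0
    while len(informed) < num_peers:
        rounds += 1
        for peer in list(informed):
            # Each informed peer pushes the update to `fanout` randomly chosen peers.
            for target in rng.sample(range(num_peers), fanout):
                informed.add(target)
    return rounds


print(gossip_rounds(100))
```

The number of informed peers can at most triple per round with a fanout of 2, so the update spreads in a number of rounds that grows roughly with the logarithm of the network size.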

I should probably point out that the peerz project is currently incomplete. I have rewritten a significant portion to fix my misunderstanding of how to use ZeroMQ (the discovery protocol is now UDP based) before I got sidetracked on other projects. I'll endeavour to polish and push these changes up in the next few weeks.

If you are just looking for peer discovery in python you may want to check out https://github.com/zeromq/pyre though I'm not sure whether they have implemented the gossip portion yet (i.e. it may just be local broadcast discovery).

jvsteiner commented 9 years ago

Hi Steve, Thanks for the reply, I really appreciate it! I'm still in the exploratory stages for this project.

I looked at pyre - looks like they are doing local broadcast discovery, which won't, I think, get me where I need to go. Essentially, the stack I'm envisioning would look like this:

a) peer discovery, UDP, with something to solve NAT traversal.
b) DHT to store data + signature/hash, configurable by the user so that they can be interested in all or only part of the table: uses a gossip protocol to replicate/distribute updates to the DHT to those interested nodes.
c) backend data integrity, so the signature/hash can be verified against a blockchain: bitcoin could be used for this part, as you suggest. Even PKI signatures could be useful - I'd probably build it to be fairly agnostic on this point.

thinking peerz could be useful for a). c) is pretty straightforward to me. constructing b) is currently the "long pole in the tent."

-Jamie

In reply to Steve Henderson, August 13, 2015 at 1:47 AM

shendo commented 9 years ago

All good.

Yep, (if the implementation catches up to the vision) peerz would fit into a) and b) above.

One of the differences between peerz and Kademlia is that the node identifiers used aren't random but are public keys. This will allow signing and encryption of messages to prevent spoofing of peers. I'm not yet sure exactly how this will affect the DHT, but it could contribute to c).
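For context, a minimal sketch of Kademlia-style XOR distance applied to key-sized identifiers. The 32-byte values below are placeholders standing in for real public keys (an assumption), signing is not shown, and `xor_distance` and `closest_nodes` are hypothetical helper names rather than peerz functions.

```python
import hashlib


def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia-style distance: XOR of two equal-length IDs, read as an integer."""
    if len(a) != len(b):
        raise ValueError("IDs must have the same length")
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def closest_nodes(key: bytes, node_ids: list, count: int = 2) -> list:
    """Return the `count` node IDs closest to `key` under the XOR metric."""
    return sorted(node_ids, key=lambda node_id: xor_distance(node_id, key))[:count]


# Placeholder 32-byte values standing in for real public keys (an assumption).
node_ids = [hashlib.sha256(name).digest() for name in (b"alice", b"bob", b"carol")]
content_key = hashlib.sha256(b"some block").digest()

print([n.hex()[:8] for n in closest_nodes(content_key, node_ids)])
```

The metric itself does not care whether an ID is random or a public key, as long as all IDs and content keys have the same length.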

I like your idea of having the DHT be the block storage and some kind of shared blockchain being the metadata/journal of the filesystem. Who knows, if you integrated into a crypto currency's blockchain you may even have a way to do paid node resource usage (storage, bandwidth?) in p2p.
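A toy sketch of the metadata/journal half of that split, assuming an append-only list in which each entry records a file path, the hash of its block in the DHT, and the hash of the previous entry. This is not how any particular blockchain works, and `append_entry` and `verify` are hypothetical names.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(journal: list, path: str, block_hash: str) -> dict:
    """Append a metadata entry that links to the previous entry by hash."""
    prev = journal[-1]["entry_hash"] if journal else GENESIS
    body = {"path": path, "block_hash": block_hash, "prev": prev}
    entry = dict(body, entry_hash=_digest(body))
    journal.append(entry)
    return entry


def verify(journal: list) -> bool:
    """Check that every entry matches its own hash and links to its predecessor."""
    prev = GENESIS
    for entry in journal:
        body = {k: entry[k] for k in ("path", "block_hash", "prev")}
        if entry["prev"] != prev or entry["entry_hash"] != _digest(body):
            return False
        prev = entry["entry_hash"]
    return True


journal = []
append_entry(journal, "/docs/a.txt", hashlib.sha256(b"contents of a").hexdigest())
append_entry(journal, "/docs/b.txt", hashlib.sha256(b"contents of b").hexdigest())
assert verify(journal)
```

Editing any earlier entry changes its hash and breaks the link from the next one, which is the property that makes the journal useful for integrity checks.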

Guess I'll have to put some more effort into getting this project back up and running. :)