juliangruber / backer

wip distributed backup / file mirroring tool
MIT License

replication channel #3

Open juliangruber opened 11 years ago

juliangruber commented 11 years ago

bittorrent sync uses a central dht/server and one unique id per user to connect machines.

It would be cool if we could do without any central piece, which on the other hand means:

We could offer a broker service that you can use when you don't have a server maybe.

juliangruber commented 11 years ago

Or people could host their own brokers which they can offer to their friends. And you could use multiple brokers per user, for redundancy!

max-mapper commented 11 years ago

telehash was trying to do this, dunno what the status is

dominictarr commented 11 years ago

This is really a whole other module/ecosystem. Actually, there are a million things we could use connections between arbitrary node instances for...

I see it as having 3 phases:

You could detect the machines in the network by either nmapping (attempting to open connections to everything in the 192.168.0.* range) or using UDP multicast. This would work for local networks and datacenters, and make it configuration free.

For servers that have public IP addresses, you could have a start list of one or more servers that you expect to be turned on, and then use a gossip protocol to track which machines are connected to the network.

I'm not 100% sure how nat traversal works, but it would be awesome to have a node client that is compatible with webrtc, then you could have node<-> browser p2p connections! Maybe @feross can help answer this question.
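The second phase (a start list plus gossip) could be sketched roughly like this. It is only an illustration under assumptions, not an existing module: `createView`, `mergePeerView`, `alivePeers`, the `lastSeen` field and the seed hostnames are all made up for the example. Each node keeps a map of the peers it has heard about, and on every gossip exchange it keeps the freshest entry per peer.

```javascript
// Hypothetical sketch of gossip-style membership tracking.
// Each view maps a peer address to the last time that peer was heard from.

// Start list of servers expected to be online (placeholder hostnames).
const SEEDS = ['seed1.example.org:7000', 'seed2.example.org:7000'];

function createView(seeds) {
  const view = new Map();
  for (const address of seeds) view.set(address, { lastSeen: 0 });
  return view;
}

// Merge a view received from another node, keeping the newest entry per peer.
function mergePeerView(local, remote) {
  const merged = new Map(local);
  for (const [address, info] of remote) {
    const known = merged.get(address);
    if (!known || info.lastSeen > known.lastSeen) {
      merged.set(address, { lastSeen: info.lastSeen });
    }
  }
  return merged;
}

// List peers that someone has heard from within maxAge milliseconds.
function alivePeers(view, now, maxAge) {
  return [...view]
    .filter(([, info]) => now - info.lastSeen <= maxAge)
    .map(([address]) => address)
    .sort();
}

const mine = createView(SEEDS);
mine.set('seed1.example.org:7000', { lastSeen: 1000 });
const theirs = new Map([
  ['seed1.example.org:7000', { lastSeen: 900 }],
  ['10.0.0.5:7000', { lastSeen: 1200 }],
]);
const mergedView = mergePeerView(mine, theirs);
console.log(alivePeers(mergedView, 1500, 600));
```

Because the merge only ever keeps the newest timestamp per peer, two nodes can exchange views in any order and still converge on the same picture of the network.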

juliangruber commented 11 years ago

@dominictarr I see this repo more like jsgit, it's the end user thing but can consist of a core and many other modules. Will document that.

refset commented 11 years ago

@maxogden I think Telehash is still evolving nicely... http://github.com/quartzjer/thjs

Perhaps @quartzjer can enlighten us.

buschtoens commented 11 years ago

We can divide this into two steps:

1. Discovering other peers that belong to your own network

To be realistic, very few end users will have a client that is constantly available under the same IP address or domain, which would be required for a fully self-sustaining p2p network. @juliangruber's broker idea is probably the simplest to implement. But after having a quick read through TeleHash (@maxogden, thanks for the pointer!) I'm really +1 on it. It seems pretty solid and well thought-out, plus there's already a package for that. (It doesn't really differ from the broker anyway.) To have a private broker/tracker we can implement a simple form of authentication and everything runs smoothly.

2. Keeping a connection to those peers

We could use TeleHash for this as well, but UDP is not reliable enough. So once the client has discovered a peer that belongs to its network, it should open a permanent TCP connection to it (and to every other peer as well). I don't think you're gonna have more than 20 clients per network, so this shouldn't be too bad. We can then use this connection to send events back and forth, like notifications about updated/added/removed files. The client would then open another connection to a peer to download the changes. This way the control connection always stays responsive and multiple file downloads and uploads can run in parallel, just like BitTorrent. This would also allow for a nifty load balancer, so we always get the change sets as fast as possible.
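A rough sketch of what the control connection could carry, assuming newline-delimited JSON as the framing. The message shape and the helper names `encodeEvent` and `createDecoder` are hypothetical; only the event kinds (added, updated, removed) come from the description above.

```javascript
// Hypothetical newline-delimited JSON framing for the control connection.
// The event types follow the description above; the message shape is an assumption.

function encodeEvent(type, path) {
  return JSON.stringify({ type, path }) + '\n';
}

// TCP delivers a byte stream, so one event may arrive split across chunks.
// The decoder buffers input and emits only complete lines.
function createDecoder(onEvent) {
  let buffered = '';
  return function write(chunk) {
    buffered += chunk;
    let newline;
    while ((newline = buffered.indexOf('\n')) !== -1) {
      const line = buffered.slice(0, newline);
      buffered = buffered.slice(newline + 1);
      if (line.length > 0) onEvent(JSON.parse(line));
    }
  };
}

const received = [];
const decode = createDecoder((event) => received.push(event));
const wire = encodeEvent('added', 'notes.txt') + encodeEvent('removed', 'old.log');

// Feed the stream in two arbitrary pieces to simulate TCP chunking.
decode(wire.slice(0, 10));
decode(wire.slice(10));
console.log(received);
```

On a real `net.Socket` you would call `socket.setEncoding('utf8')` and hand each `data` chunk to `decode`. File contents would travel over the separate download connection, so large transfers never block these small messages.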

buschtoens commented 11 years ago

Relays? Those clients would act as servers, that cache the data packets and forward them, but can't decrypt them. This can speed up file transfers.
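One way the "can't decrypt" part might look, as a sketch only: the two peers share a key and the relay just stores and forwards opaque bytes. The choice of AES-256-GCM, the packet layout and the names `seal`, `unseal` and `createRelay` are assumptions for illustration; how the peers agree on the key is left out.

```javascript
const crypto = require('crypto');

// Hypothetical sketch: peers share a key, the relay only sees opaque bytes.
function seal(key, plaintext) {
  const iv = crypto.randomBytes(12); // 96-bit nonce, the usual size for GCM
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const body = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  // Packet layout (an assumption): iv (12 bytes) | auth tag (16 bytes) | ciphertext
  return Buffer.concat([iv, cipher.getAuthTag(), body]);
}

function unseal(key, packet) {
  const iv = packet.subarray(0, 12);
  const tag = packet.subarray(12, 28);
  const body = packet.subarray(28);
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  // final() throws if the key is wrong or the packet was tampered with.
  return Buffer.concat([decipher.update(body), decipher.final()]);
}

// The relay caches and forwards packets without ever holding the key.
function createRelay() {
  const cache = [];
  return {
    store(packet) { cache.push(packet); },
    forward() { return cache.shift(); },
  };
}

const sharedKey = crypto.randomBytes(32); // known only to the two peers
const relay = createRelay();
relay.store(seal(sharedKey, Buffer.from('file chunk 1')));
const delivered = unseal(sharedKey, relay.forward());
console.log(delivered.toString());
```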

quartzjer commented 11 years ago

Chiming in a bit, I believe that telehash is actually a great fit, but I'm not sure on the timing, it's going to be a moving target for about a month here as we get about a half-dozen telehash implementations interoperating and work out the last kinks in the protocol...

Also, once two hashnames are connected via telehash, the protocol fully supports reliable (encrypted) raw data channels between them. I need some more examples of this pattern (one WIP is https://github.com/quartzjer/worm) and it's probably going to suffer some breaking on the horizon too, but that's definitely a design goal, to support full mesh p2p (there are well known / stable seeds so transient nodes are welcome) with strong data pipes :)

dominictarr commented 11 years ago

how does telehash do binary data?

quartzjer commented 11 years ago

The raw packet format has two parts, json and binary, so (reliable) channels can be created between any two hashnames that contain either or both... the binary is just raw bi-directional streams, nothing fancy :)
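To make the "json plus binary" idea concrete, here is a small sketch of one possible framing: a 2-byte big-endian length, the JSON part, then the raw binary body. This layout and the names `encodePacket` and `decodePacket` are assumptions for the example and should not be read as the actual telehash wire format.

```javascript
// Illustrative framing only: a 2-byte big-endian JSON length, the JSON
// header, then the raw binary body. This is an assumption for the example,
// not a statement of telehash's actual wire format.

function encodePacket(json, body = Buffer.alloc(0)) {
  const header = Buffer.from(JSON.stringify(json), 'utf8');
  const length = Buffer.alloc(2);
  length.writeUInt16BE(header.length, 0);
  return Buffer.concat([length, header, body]);
}

function decodePacket(packet) {
  const headerLength = packet.readUInt16BE(0);
  const header = packet.subarray(2, 2 + headerLength);
  return {
    json: JSON.parse(header.toString('utf8')),
    body: packet.subarray(2 + headerLength), // whatever follows is raw binary
  };
}

const packet = encodePacket({ type: 'chunk', seq: 1 }, Buffer.from([0xde, 0xad]));
const decoded = decodePacket(packet);
console.log(decoded.json, decoded.body);
```

A packet can carry either part alone: an empty body gives a JSON-only message, and a minimal `{}` header gives an almost pure binary one.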

dominictarr commented 11 years ago

perfect!

feross commented 11 years ago

Chiming in a bit late here.

@dominictarr: "I'm not 100% sure how nat traversal works, but it would be awesome to have a node client that is compatible with webrtc, then you could have node<-> browser p2p connections!"

NAT traversal is straightforward if you use WebRTC because the implementations are required to handle it for you, based on the spec. You just specify a STUN server, and that's it. If you want, you can also specify a TURN server for fallback if two peers cannot establish a direct p2p connection because they're both behind symmetric NATs (rare).
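For reference, specifying the servers amounts to passing a small configuration object. The sketch below only builds that object, so it runs in plain node; `buildIceConfig`, the hostnames and the credentials are placeholders, not real services.

```javascript
// Hypothetical helper that builds the configuration object a WebRTC
// RTCPeerConnection expects. Server hostnames and credentials are placeholders.
function buildIceConfig(stunUrl, turn) {
  const iceServers = [{ urls: stunUrl }];
  if (turn) {
    // TURN relays traffic when no direct path can be found, so it needs credentials.
    iceServers.push({
      urls: turn.url,
      username: turn.username,
      credential: turn.credential,
    });
  }
  return { iceServers };
}

const config = buildIceConfig('stun:stun.example.org:3478', {
  url: 'turn:turn.example.org:3478',
  username: 'demo',
  credential: 'secret',
});
console.log(JSON.stringify(config, null, 2));

// In a browser: const pc = new RTCPeerConnection(config);
```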

I don't know of an npm module that gives you a WebRTC client, though I don't imagine it would be too hard to make one. The WebRTC C++ code used in Chrome is open source. I think you'd just need to bind it to a JS interface, but I have no experience with native modules in node.

dominictarr commented 11 years ago

aha! that is a great idea!

guybrush commented 11 years ago

relevant: http://www.youtube.com/watch?v=Al3SEbeK61s&feature=share&t=7m30s

DamonOehlman commented 11 years ago

Hey All,

There have been a few attempts at writing node bindings to the underlying WebRTC stack in the Chromium source (previously called libjingle). From what I've seen, node-peerconnection is a solid start for creating this and I've recently been updating it to work with the latest WebRTC source code (see: https://github.com/DamonOehlman/node-peerconnection/tree/updated-basecode).

There are a couple of other modules out there too. I'll try digging them up over the next couple of days and posting them here. Even if you decide to start from scratch, they'll be a good starting place (although I'd recommend using @rvagg's nan helpers as well).

Also as mentioned on twitter, I had a chat with @silviapfeiffer at work (NICTA) about this and we can see the value in getting some node --> c++ bindings written. At this stage though our approach will likely be to create the specific functionality we need at the c++ layer, wrap that into a c library and then create node bindings for that library. The primary reason for this being that the surface area of the underlying WebRTC c++ library is massive and also subject to quite extensive change as things get updated in the spec and thus chrome, etc.

Cheers, Damon.

dominictarr commented 11 years ago

we'd mainly need the reliable datachannel, maybe could just bind to that? @DamonOehlman would the C++ library be compatible with WebRTC in the browser? If it was, I can see significant interest from people working on webrtc stuff!

DamonOehlman commented 11 years ago

Targeting the data channel was what I was thinking too. Yeah, compatibility would be there (if not straight away, then eventually) - see https://code.google.com/p/webrtc/issues/detail?id=2279.

Just reading the issue thread in its entirety now though, I'm not sure that WebRTC will remove the need for a central broker completely. While the WebRTC stack has everything it needs to do NAT traversal and successfully negotiate through firewalls (given that the supporting network infrastructure is there - see ICE), you would still need to do the initial signalling between the peers through a broker of some kind.

Workload on the broker would be light though so not too costly. Might see if I can spend some time on this either later this week or early next :)
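To show how little the broker has to do, here is a minimal in-memory sketch. It is an assumption-laden illustration rather than a proposed design: `createBroker`, the peer ids and the message shape are invented, and the SDP strings are placeholders for what a WebRTC stack would generate.

```javascript
// Hypothetical in-memory signalling broker: it only passes small messages
// (offers, answers, ICE candidates) between peers and never touches file data.
function createBroker() {
  const peers = new Map(); // peer id -> callback that receives messages

  return {
    register(id, onMessage) {
      peers.set(id, onMessage);
    },
    unregister(id) {
      peers.delete(id);
    },
    // Returns false if the target is unknown, so the sender can try another broker.
    send(from, to, message) {
      const deliver = peers.get(to);
      if (!deliver) return false;
      deliver({ from, message });
      return true;
    },
  };
}

const broker = createBroker();
const inbox = { laptop: [], desktop: [] };
broker.register('laptop', (m) => inbox.laptop.push(m));
broker.register('desktop', (m) => inbox.desktop.push(m));

// The SDP strings are placeholders; a real offer comes from the WebRTC stack.
broker.send('laptop', 'desktop', { type: 'offer', sdp: '<offer sdp>' });
broker.send('desktop', 'laptop', { type: 'answer', sdp: '<answer sdp>' });
console.log(inbox.desktop[0].message.type, inbox.laptop[0].message.type);
```

Once the offer and answer have been exchanged, the peers talk directly and the broker is out of the picture, which is why its workload stays light.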

dominictarr commented 11 years ago

it's also worth noting that the broker will be a completely standard thing, so it would be pretty easy to have many brokers, and allow anyone to run their own broker

feross commented 10 years ago

@DamonOehlman Could you post those alternate implementations of WebRTC Data Channel (or PeerConnection) on the server? I'm trying to see if it makes sense to use/improve one of them or just write my own.

DamonOehlman commented 10 years ago

@feross This is the one that I think is looking most promising at the moment (cc @modeswitch):

https://github.com/modeswitch/node-webrtc

I'd probably recommend against starting writing your own (if you can resist) as there are so many node WebRTC binding libraries that have been started and then abandoned. It's a pretty big task because the surface area of the underlying webrtc libraries is pretty massive.

An alternative (and active) approach to have a look at is erizo in licode:

https://github.com/ging/licode/tree/master/erizo

These guys are putting together an implementation using the technologies that power the WebRTC stack (libsrtp, etc). I don't think they've looked at data channels yet, and that may not even be on their radar given their strong focus on video/audio.

FWIW, my money is on approach 1 (bindings to webrtc c++ code which was previously known as libjingle). So again, have a look at what @modeswitch is doing and see if you can lend a hand there :)

feross commented 10 years ago

@DamonOehlman excellent - thanks for the quick response. i like approach 1 and will try to lend a hand :)

modeswitch commented 10 years ago

To add to what @DamonOehlman said above: I'm actively working on node-webrtc. Most of the API for data channels is there; audio/video support will come sometime later. Most of the issues I'm working on now are in libjingle rather than the node bindings. Support for building on Windows and OSX could benefit from additional contribution, and the module in npm needs testing and a bit of work I think.