Move giant files/datasets at Level1 + Level2

flyingzumwalt commented 7 years ago

Note: there's a bunch of published research on this. Haven't had a chance to read up yet. Links/Citations welcome if you get to this before me.

Suggestion from a friend (His exact words were "People have been moving big files for a long time. Don't reinvent the wheel here."): When moving giant files/datasets, skip the Network Layer (Layer 3) completely -- perform node-to-node data transfers relying only on the Data Link Layer (Layer 2) and the Physical Layer (Layer 1). [if you're unfamiliar with this notion of Layers, see https://en.wikipedia.org/wiki/OSI_model]

With this approach, you don't have to worry about routing (it's a direct node-to-node transfer), and you don't need buffers because there's no routing. It also provides a much more efficient context for error correction.

Is this something IPFS could enable? That would be huge. Note: In these instructions we're already encouraging people who need fast throughput to temporarily turn off routing and form node-to-node connections. In essence that's the same thing, but without the efficiency of ditching the additional layers of protocols.

More links/references to come. At this point I just wanted to get the discussion rolling.

cc @lgierth @jbenet

ghost commented 7 years ago

This seems feasible and useful in local networks (L2 = Ethernet, usually), but I wonder how to apply this to transfers over the Internet. Anyhow, we should look into how we can skip IP and peer over Ethernet in local networks. Buffer bloat is a severe problem especially with crappy plastic routers. The downside of Ethernet is that you usually need root or NET_ADMIN privileges. I'm also curious to see whether uTP/SCTP/etc. work equally well on L2.
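
To make "peering over Ethernet" concrete, here is a minimal sketch of what a raw L2 payload looks like when no IP header is involved. It only builds and parses Ethernet II frames in memory. The function names are hypothetical, and the EtherType 0x88B5 is the IEEE 802 "local experimental" value, chosen here purely for illustration. Actually sending such a frame on Linux would need a packet socket (AF_PACKET), which is where the elevated-privilege requirement comes from.

```python
import struct

# Assumption for illustration: 0x88B5 is an IEEE 802 local experimental EtherType.
ETHERTYPE_EXPERIMENTAL = 0x88B5
MIN_PAYLOAD = 46    # Ethernet pads shorter payloads up to this size
MAX_PAYLOAD = 1500  # standard MTU; jumbo frames are ignored in this sketch


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    raw = bytes.fromhex(mac.replace(":", ""))
    if len(raw) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return raw


def build_frame(dst: str, src: str, payload: bytes,
                ethertype: int = ETHERTYPE_EXPERIMENTAL) -> bytes:
    """Build an Ethernet II frame without the FCS (the NIC appends it)."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload exceeds the standard 1500-byte MTU")
    header = mac_to_bytes(dst) + mac_to_bytes(src) + struct.pack("!H", ethertype)
    # Zero-pad short payloads so the frame reaches the 60-byte minimum (sans FCS).
    return header + payload.ljust(MIN_PAYLOAD, b"\x00")


def parse_frame(frame: bytes):
    """Split a frame into (dst, src, ethertype, payload including padding)."""
    dst, src = frame[0:6], frame[6:12]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    return dst, src, ethertype, frame[14:]


if __name__ == "__main__":
    frame = build_frame("ff:ff:ff:ff:ff:ff", "02:00:00:00:00:01", b"hello")
    print(len(frame), parse_frame(frame)[2] == ETHERTYPE_EXPERIMENTAL)
```

Note that the 14-byte header carries no sequencing, retransmission, or congestion control, so anything like uTP or SCTP would still have to supply those on top.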

I should note that we've been long aware that we won't be breaking performance records in raw transfer speeds :) The performance improvements will come from content being more widespread, and a higher chance that a node physically very close already has what you want.

flyingzumwalt commented 7 years ago

I'm mainly thinking about situations where it's worth provisioning a temporary node-to-node L2 connection -- like if you need to replicate 50PB from the Netherlands to Australia in order to seed the p2p network on the other side of the planet. Wondering if IPFS or libp2p could help with that. More realistic -- I'm wondering if we can make sure the libp2p stack allows for that kind of use when the option is available.
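
For a sense of scale, a back-of-the-envelope calculation helps. The sketch below assumes decimal petabytes (10^15 bytes), a fully saturated link, and no protocol overhead; the link speeds are illustrative, not a claim about any real Netherlands-to-Australia path.

```python
PETABYTE = 10**15  # decimal petabyte, an assumption for this estimate


def transfer_days(size_bytes: float, link_bits_per_s: float,
                  efficiency: float = 1.0) -> float:
    """Days needed to move size_bytes over a link at the given line rate.

    efficiency is the fraction of the line rate actually achieved (0 < e <= 1).
    """
    seconds = size_bytes * 8 / (link_bits_per_s * efficiency)
    return seconds / 86_400


if __name__ == "__main__":
    for gbps in (1, 10, 100):
        days = transfer_days(50 * PETABYTE, gbps * 10**9)
        print(f"{gbps:>3} Gbit/s: {days:,.0f} days")
```

At an ideal 10 Gbit/s, 50PB takes roughly 463 days, so at this scale the link rate matters far more than which layers are skipped.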

wmturner commented 7 years ago

@flyingzumwalt Can you explain to me how you propose to establish a node-to-node L2 connection over the internet? It sounds like an oxymoron to me (although technically you could tunnel an L2 network over L3, but I'm unsure what the point of that would be).

jbenet commented 7 years ago

@flyingzumwalt yes, definitely. I think the easiest way to get IPFS to behave as well as possible here is to make an ipfs push <ref> <peer> command, similar to git's: "send a graph to the other node". It has some security / auth implications.

Here's a sketch of what we'd like:

- peer discovery + peer routing remain the same (still have to find each other) or maybe enhanced with additional high-end protocols
- content routing is turned off or side-stepped here
- there is a manual ipfs push <ref> <remote> to move content from one node to another
  - this requires a capability, because writing-to a node should be a controlled thing
  - some nodes could accept anything
  - can be triggered by higher level porcelains, like apps (eg Orbit, ipfs-pack)
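
The capability requirement in the list above could be sketched roughly as follows. This is not an existing IPFS API: `issue_capability`, `PushReceiver`, and the HMAC-based token are all hypothetical, and the block key is a plain SHA-256 hex digest rather than a real CID. It only illustrates a receiver that rejects pushes unless the sender presents a token the receiver minted, with an opt-in "accept anything" mode.

```python
import hashlib
import hmac
from typing import Dict, Optional


def issue_capability(node_secret: bytes, peer_id: str) -> str:
    """Receiver-side: mint a token that lets peer_id push to this node."""
    return hmac.new(node_secret, peer_id.encode(), hashlib.sha256).hexdigest()


class PushReceiver:
    """Hypothetical receiving node that gates pushed blocks on a capability."""

    def __init__(self, node_secret: bytes, accept_anything: bool = False):
        self._secret = node_secret
        self._accept_anything = accept_anything
        self.blocks: Dict[str, bytes] = {}

    def is_authorized(self, peer_id: str, capability: Optional[str]) -> bool:
        if self._accept_anything:
            return True
        if capability is None:
            return False
        expected = issue_capability(self._secret, peer_id)
        # Constant-time comparison avoids leaking the token via timing.
        return hmac.compare_digest(expected, capability)

    def push(self, peer_id: str, capability: Optional[str], block: bytes) -> str:
        """Store a pushed block under its content hash, if the sender may write."""
        if not self.is_authorized(peer_id, capability):
            raise PermissionError(f"peer {peer_id!r} may not push to this node")
        key = hashlib.sha256(block).hexdigest()  # stand-in for a real CID
        self.blocks[key] = block
        return key


if __name__ == "__main__":
    receiver = PushReceiver(b"node-secret")
    cap = issue_capability(b"node-secret", "peerA")
    print(receiver.push("peerA", cap, b"some block"))
```

Because the token is bound to the peer ID, a capability issued to one peer cannot be replayed by another, assuming peer identities are authenticated elsewhere in the stack.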

jbenet commented 7 years ago

Oh, and of course we also need to optimize the hell out of bitswap and enable pushing to it (teach bitswap about caps), or have a different receiver protocol that checks the caps.
And optimize datastore (fs) i/o.
In Q2 or Q3, we should evaluate making a CA kernel fs that works basically like fb haystack.

flyingzumwalt commented 7 years ago

@wmturner I'm primarily thinking of situations where the volume of content warrants the extra effort of establishing that L2 connection. I'm wondering what we can do to:

- Make IPFS capable of taking advantage of such connections, and of working as efficiently as possible over them when they are available (along the lines of @jbenet's notes)
- Set up tooling that makes it easy, or ideally frictionless, to do this with IPFS. The optimal scenario would be for IPFS (or associated tooling) to establish the connection for you

As you point out it might be untenable or unwise for the tooling around IPFS, an application-layer protocol, to be deeply aware of any details in the transport layer, since that violates the isolation between the layers of abstraction. Nonetheless, it is definitely worth exploring if it allows IPFS to gracefully handle moving giant volumes of data from point to point.

Kubuxu commented 7 years ago

Currently it isn't the bottleneck, and I am not sure it ever will be. As @jbenet said, we need to first optimize bitswap, datastores, and so on.

ipfs / notes

Move giant files/datasets at Level1 + Level2 #218