ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Implement bandwidth limiting #3065

Open · whyrusleeping opened 7 years ago

whyrusleeping commented 7 years ago

We need to place limits on the bandwidth ipfs uses. We can do this a few different ways (or a combination thereof):

Related Issues:

slothbag commented 7 years ago

Here are two more related issues :) https://github.com/ipfs/go-ipfs/issues/920 https://github.com/ipfs/go-ipfs/issues/1482

k0d3g3ar commented 6 years ago

This is critical if you want mass adoption. No one is going to risk their own local Internet connection bandwidth unless they can control it. That means using a 3rd party bandwidth limiter in front of IPFS which is just more complexity that isn't necessary.

fiatjaf commented 6 years ago

Perhaps use alternative C by default with low limits, but switch to A (or to no limit at all) when IPFS is in an "active" state. The "active" state would be when the user is actively downloading, adding, or pinning something (and for some time after that), or when they are using IPFS from some management GUI or JS app.

EibrielInv commented 6 years ago

I was thinking of implementing (but never did) a script that alternates between "offline" and "online" mode every ~120 seconds. It could also read the number of connections and restart the client when it passes some threshold. Something like:
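
A minimal sketch of such a watchdog, assuming only the stock ipfs CLI (the original script was never posted; the threshold and timings here are illustrative):

    #!/bin/sh
    # Run the daemon for up to ~120s, then stop it and stay "offline"
    # for 120s; restart early if the peer count passes a threshold.
    THRESHOLD=500
    while true; do
        ipfs daemon &
        for _ in $(seq 1 12); do
            sleep 10
            peers=$(ipfs swarm peers 2>/dev/null | wc -l)
            [ "$peers" -gt "$THRESHOLD" ] && break
        done
        ipfs shutdown
        sleep 120
    done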

voidzero commented 6 years ago

global limiting using a single rate limiter over all connections (cons: ipfs will be quite slow when rate limited in this way)

Global limiting has my vote. And I'm not sure if this con is true in all cases: bandwidth of course already has a hard limit (the limit of the connection). So if I already have a max of 20mbit down / 2mbit upload, and I limit ipfs to half of this, that is still a decent amount of bandwidth, isn't it?

guybrush commented 6 years ago

I think it would be best to do global limitation and then also limit per protocol relative to the global limit. For example let globalLimitUp = 1mbit/sec, globalLimitDown = 2mbit/sec and then every protocol gets its share of the available bandwidth depending on how important it is for ipfs to function properly.

Maybe I misunderstand the problem though; I just came here because I noticed the high bandwidth use.

700 peers and 3.5 Mbps, both numbers climbing with no end? I am on win10 and ipfs@0.4.13 running the daemon with ipfs daemon --routing=dhtclient.

Stebalien commented 6 years ago

@guybrush FYI, you can limit the bandwidth usage by turning off the DHT server on your node by passing the --routing=dhtclient flag to your daemon.

hitchhiker commented 6 years ago

This is essential, checking back on this. Without limiting, it's hard for us to package this in projects -> we can't expect end users to accept such a heavy bandwidth requirement.

whyrusleeping commented 6 years ago

Please just add an emoji to the issue itself to add your support. Comments in this thread should be reserved for discussion around the implementation of the feature itself.

jefft0 commented 5 years ago

I've been running an IPFS daemon for years without problems. But with the latest builds in the past couple weeks, I have a lot of delays in trying to load web pages or even ssh into another server. It's now at the point where I have to shut down the IPFS daemon to do some tasks. My stats are below. The bandwidth doesn't look so bad, so why does my network suddenly seem clogged?

    $ for p in /ipfs/bitswap/1.1.0 /ipfs/dht /ipfs/bitswap /ipfs/bitswap/1.0.0 /ipfs/kad/1.0.0 ; do echo ipfs stats bw --proto $p && ipfs stats bw --proto $p && echo "---" ; done
    ipfs stats bw --proto /ipfs/bitswap/1.1.0
    Bandwidth
    TotalIn: 1.1 MB
    TotalOut: 6.1 kB
    RateIn: 1.9 kB/s
    RateOut: 0 B/s
    ---
    ipfs stats bw --proto /ipfs/dht
    Bandwidth
    TotalIn: 41 kB
    TotalOut: 3.2 kB
    RateIn: 483 B/s
    RateOut: 1 B/s
    ---
    ipfs stats bw --proto /ipfs/bitswap
    Bandwidth
    TotalIn: 0 B
    TotalOut: 0 B
    RateIn: 0 B/s
    RateOut: 0 B/s
    ---
    ipfs stats bw --proto /ipfs/bitswap/1.0.0
    Bandwidth
    TotalIn: 0 B
    TotalOut: 0 B
    RateIn: 0 B/s
    RateOut: 0 B/s
    ---
    ipfs stats bw --proto /ipfs/kad/1.0.0
    Bandwidth
    TotalIn: 21 MB
    TotalOut: 1.6 MB
    RateIn: 164 kB/s
    RateOut: 8.9 kB/s
    ---

whyrusleeping commented 5 years ago

@jefft0 that's odd... those stats seem relatively normal. Are you seeing any odd CPU activity? What sort of bandwidth utilization does your OS report from ipfs? Also, how many connections does your node normally have?

Another question is, since you mentioned noticing this on recent builds, does running an older version of ipfs fix the problem?

whyrusleeping commented 5 years ago

Also, cc @mgoelzer and @bigs, despite this being on the go-ipfs repo, this is definitely a libp2p issue. Worth getting on the roadmap for sure.

jefft0 commented 5 years ago

I solved the problem by restarting my Internet router, restarting the computer, and wiping the IPFS build directory and rebuilding the current version (but keeping my current ~/.ipfs folder). I know this wasn't very methodical, but I was desperate. Next time I have bandwidth problems I'll try to figure out which one of these causes the problem.

whyrusleeping commented 5 years ago

@jefft0 interesting. That's actually more helpful information than you might think, thanks

whyrusleeping commented 5 years ago

Also, just so everyone watching this thread is aware, we have implemented a connection manager that limits the total number of connected peers. This can be configured in your ipfs config under Swarm.ConnMgr; see the config docs for more details.
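
For reference, that section of the config has this shape (the values here are illustrative, not the defaults; check the config docs):

    "Swarm": {
      "ConnMgr": {
        "Type": "basic",
        "LowWater": 100,
        "HighWater": 400,
        "GracePeriod": "20s"
      }
    }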

bigs commented 5 years ago

Definitely a fan of the per-protocol limiting. Perhaps this could be handled with a weighting system? Assign weights to protocols and then set global settings (e.g. throttle after this amount of transfer per duration, halt all transfer after this limit within the duration).
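
A sketch of what that weighting could look like, using golang.org/x/time/rate (the protocol names, weights, and budget are illustrative; none of this is existing go-ipfs API):

    package main

    import (
        "fmt"

        "golang.org/x/time/rate"
    )

    // Global upload budget of ~1 Mbit/s, in bytes per second.
    const globalUpBytesPerSec = 125000

    // Relative importance of each protocol; a higher weight gets a bigger share.
    var weights = map[string]float64{
        "/ipfs/kad/1.0.0":     1,   // DHT maintenance
        "/ipfs/bitswap/1.1.0": 3,   // actual data transfer
        "/ipfs/ping/1.0.0":    0.1, // keepalives
    }

    // limiters splits the global budget into one token bucket per
    // protocol, proportional to its weight.
    func limiters() map[string]*rate.Limiter {
        var total float64
        for _, w := range weights {
            total += w
        }
        out := make(map[string]*rate.Limiter, len(weights))
        for proto, w := range weights {
            share := globalUpBytesPerSec * w / total
            // Burst of one second's share keeps short spikes cheap.
            out[proto] = rate.NewLimiter(rate.Limit(share), int(share))
        }
        return out
    }

    func main() {
        for proto, l := range limiters() {
            fmt.Printf("%-22s %8.0f B/s\n", proto, float64(l.Limit()))
        }
    }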

leshokunin commented 5 years ago

Very cool to see progress! How's the bandwidth cap (eg: 50kb/s) coming along? It'd be super useful for our desktop client :)

douglasmsi commented 5 years ago

Is there any news about this topic?

Stebalien commented 5 years ago

Not at the moment. The current recommended approach is to limit bandwidth in the OS.

lidel commented 5 years ago

PSA: if anyone is looking for a good third-party userspace bandwidth shaper, check out trickle (-s runs it standalone, and -u/-d cap upload/download in KB/s):

    trickle -s -u 50 -d 50 ipfs daemon --routing=dhtclient

I know people have been using it with go-ipfs on Linux in the past for "global limiting using a single rate limiter over all connections".

Trickle has been reported to work on a wide variety of Unix-like operating systems including OpenBSD, NetBSD, FreeBSD, Linux and Sun Solaris, and is by its very nature also architecture agnostic. – Trickle: A Userland Bandwidth Shaper for Unix-like Systems [PDF]

douglasmsi commented 5 years ago

Trickle looks like a good option for Linux. But for Windows, do you have a good option that could be managed by command line too?

marcusnewton commented 5 years ago

@douglasmsi Netlimiter seems to do the trick. Haven't found a command line interface yet

CocoonCrash commented 5 years ago

Instead of limiting the bandwidth used, is there a deep analysis of the bandwidth consumed, to know whether this is an implementation problem, a design problem, a bug, etc.?

bachrc commented 5 years ago

Bumping this; it's pretty constraining when running a node.

Mikaela commented 5 years ago

This bandwidth usage issue is why I am not running an ipfs node anywhere currently, while I used to run it on one laptop and two desktops, one of which was on almost 24/7.

I think IPFS would be useful for me, but I cannot run it due to how it currently blocks web browsing and affects even mosh badly.

Mikaela commented 5 years ago

As an update to my previous comment, I have managed to get ipfs running without affecting web browsing or other activities by tweaking the ConnMgr options.

    "ConnMgr": {
      "GracePeriod": "1s",
      "HighWater": 25,
      "LowWater": 5,
      "Type": "basic"
    },

If I understand this correctly, excess connections are given one second before they get cleaned up, the preferred maximum is 25 peers, and the node tries to stay connected to at least 5 peers. However, at the time of writing I have 4 peers, and IPFS Companion reports 60 peers when I visit https://ipfs.io/ or otherwise use the daemon.

I am running ipfs daemon with the flags --routing=dhtclient --enable-gc, and I have also removed public IPv4 addresses from the swarm, as I am always behind CGN or another NAT I don't control and find IPv4 connections less reliable. I have enabled QUIC out of curiosity. Interestingly, the default ConnMgr options were killing my IPv6 connectivity (https://github.com/ipfs/go-ipfs/issues/3320?), but my router also stops sending IPv6 RAs by itself when left alone for more than a day or when it goes into power-saving mode, which requires me to reboot it at least once per day.

    "Swarm": [
      "/ip4/127.0.0.1/tcp/4001",
      "/ip4/127.0.0.1/udp/4001/quic",
      "/ip6/::/tcp/4001",
      "/ip6/::/udp/4001/quic"
    ]

carleeto commented 5 years ago

I'm really interested in running a node over a cellular connection. Bandwidth is one aspect, and I may be oversimplifying here, but if ipfs is based on bittorrent, shouldn't it be possible to specify a maximum amount of traffic too? Wouldn't this accommodate the majority of use cases as far as internet plans are concerned?

skliarie commented 5 years ago

IPFS works differently from bittorrent. In the absence of a central tracker, it relies on as many connected nodes as possible (DHT) to find and retrieve content. This requires your node to maintain many connections and, yes, use expensive cellular traffic. Traffic limiting would only cause your ipfs node to grind to a halt. IMHO, there should be a way to establish leecher/proxy ipfs nodes, to support cellular or other bandwidth-limited ipfs users.

Stebalien commented 5 years ago

IPFS works differently from bittorrent. In the absence of a central tracker, it relies on as many connected nodes as possible (DHT) to find and retrieve content. This requires your node to maintain many connections and, yes, use expensive cellular traffic.

This is a bit of a simplification. We do need to keep some connections open. However, we should be able to:

  1. Significantly reduce background traffic.
  2. Suspend connections: https://github.com/libp2p/go-libp2p/issues/438.

Etc...

whyrusleeping commented 5 years ago

old but relevant discussion: https://github.com/ipfs/go-ipfs/issues/4029

DanielMazurkiewicz commented 4 years ago

IMHO, there should be a way to establish leecher/proxy ipfs nodes, to support cellular or other bandwidth-limited ipfs users.

My only internet access is via a cellular network, and I would share my precious limit of 10GB, but I would like to have control over it. For example, I would share 5GB, or up to two or three times whatever I've received. The lack of any limits is the only thing stopping me from using IPFS.

Bluebie commented 4 years ago

BitTorrent solved this problem originally with user configurable bandwidth limits, and it was non-optimal because users would limit the speeds fairly aggressively to leave enough room for the worst case. Later they implemented Delay Based Congestion Control with protocols like the open µTP, which provides a tcp-like reliable ordered stream connection, but runs over UDP (which can have some hole punching advantages for dealing with NATs too).

For anyone not up to speed, here's a general overview of how it compares to TCP:

  1. TCP Congestion Control
  2. Micro Transport Protocol - Delay Based Congestion Control

The end result is that while it seems like µTP makes your app lower priority than everything else on the network (except BitTorrent and a few obscure system services that use delay-based congestion control, like the macOS software update downloader), users don't quit your daemon out of frustration as much, housemates don't yell at you for screwing up the wifi, and your app actually gets to use almost 100% of the available bandwidth at any moment, instead of being speed-limited to 50-80% in the best case by frustrated users, and to barely a trickle in the worst case, turning peers into leeches unnecessarily. It removes the need for configurable global bandwidth limits to manage network congestion.

It's also worth noting that p2p apps like IPFS have an unfair advantage in a loss-based congestion control scheme. For every TCP-like connection we're swapping files through, we effectively get an extra vote, and we are unfairly advantaged compared to apps that use only one or a very small number of TCP connections to communicate. Loss-based congestion control is only fair if all the apps on the network use a roughly similar number of parallel TCP connections per transfer. This is part of why p2p apps are notorious for ruining home internet connections, and why, even if your router doesn't have frustratingly large transmit buffers, p2p apps that use loss-based congestion control can still outcompete other apps on the network for resources.

It might be worth investigating implementing µTP in IPFS. Notably, there are implementations in Go and Javascript already, thanks to people's work on porting BitTorrent to every Turing machine in the universe. It should be fairly straightforward to implement in go-ipfs and js-ipfs on NodeJS. The protocol is very decoupled from BitTorrent, quite lightweight, and nearly as efficient as TCP in terms of network overhead, and implementations typically expose an API very similar to system TCP APIs. Often existing network libraries can be made to use µTP without modification, because many already accommodate being handed a custom network connection object to support things like tunnelling through TLS libraries.

The main hard part would be figuring out how to implement equivalent flow control over our other protocols, especially in the browser context, where we have to make do with things like WebRTC. But I really think it's worth prototyping, and that delay-based congestion control would totally solve this problem for nodes running on consumer household internet connections, without adding any additional configuration complexity. It might add a minor risk of deep packet inspection systems misidentifying our traffic as BitTorrent traffic and trying to shape or block it, though.

The final downside, which I don't think is very applicable to IPFS currently, is that while delay-based protocols do calibrate themselves to system clock differences between computers, they can be fairly sensitive to clock drift. This is no problem on desktop computers and smartphones, but it might impact someone trying to implement IPFS on an IoT device like a lightbulb, which might not necessarily have a crystal clock source.

If we do implement a delay-based congestion control protocol, it's also worth thinking about whether we should have APIs to communicate urgency to the network layer. For instance, a user requesting a DAG node is probably using it interactively, and should maybe get a more aggressive connection that tolerates more buffer latency, while a daemon that's just passively pinning some content or doing DHT maintenance tasks should probably accept less buffer latency. This way, low-priority tasks will naturally back off to make way for higher-priority tasks, even between different computers.

So: I fully support that we need bandwidth limiting, but I really encourage people not to implement global configurable static limits, and to look at delay-based congestion control instead. It should help keep IPFS reliable and make sure as many participants in the network as possible are sharing data with all of their spare internet capacity, without impacting the usability of their networks for existing apps. Let's not repeat the mistakes BitTorrent made, and try to fix this problem before IPFS becomes popular enough to develop a similar bad reputation among novice users.

dcflachs commented 4 years ago

@Bluebie Based on this it looks like µTP transports are already in the works down at the libp2p layer.

dbaarda commented 4 years ago

Note that the widely used packet-loss-based congestion control for TCP is known as CUBIC. There is now a latency-based TCP congestion control algorithm, BBR (released/used by Google), that is widely available in Linux and starting to be enabled by default on some distros.

BBR is MUCH better than CUBIC. I believe it's also used inside QUIC, which uses UDP. The really nice thing about it is that, unlike CUBIC, which ends up slowly tending towards keeping all buffers along the path full, it quickly converges on keeping the buffers almost empty, which not only avoids packet-loss hiccups almost entirely but also minimizes latency.

The main argument for static limits is that they are easy for end users to understand, and some users really do want to bandwidth-limit a background service because they have bandwidth quotas they don't want to exhaust. So congestion is only one part of the reason; quotas are the other.

Perhaps a mixture of congestion/latency-based control settings for transient spikes and an overall "up to 10GB/month" quota setting would be best.

Bluebie commented 4 years ago

It also looks like browser vendors are aiming to implement a proposal called WebRTC RMCAT, which adds delay-based congestion control (among other things) to WebRTC traffic, so we should be able to have good congestion control without global rate limits across all platforms in the future.

I really like the "up to 10GB/month" quota proposal. I think that would be really useful. For example, I'd love to run an IPFS node on my Linode server and enable relay to help support users who are badly stuck behind a NAT, but it's important for me that it doesn't come at an extra cost, so being able to donate, say, 800GB a month of that sort of support to the ecosystem would take the risk out of doing something like that. Maybe an implementation of that idea could be smart about prioritising generous services: limiting participation in relay work first, then, when it gets even closer to the limit, getting more aggressive about cutting off other services, so that things like DHT work are prioritised and relaying doesn't end up eating the quota quickly and effectively disconnecting the node from the DHT.

lordcirth commented 4 years ago

With a simple hard limit, I would be concerned that we'd get spikes early in the month and then a dead node later. There would definitely need to be more smarts involved than that. Perhaps even a daily limiter could work better?

Bluebie commented 4 years ago

On the UI side I feel it would be good to be able to define it as one day, one week, or one month because that’s how people tend to think about data quotas.

On the implementation side, the first thing I'd want to try is this:

  1. On startup, choose a random number between 0 and 59 inclusive.
  2. Set up a counter of how many bytes have been used for relay work.
  3. Set a timer so that every hour, at the minute chosen in step 1, the counter's value is appended to the end of an array and the counter is reset to 0. If the array contains more than 24*7 entries, delete the oldest entry.
  4. Set up another timer, executing every, let's say, five minutes, which sums all the numbers in the array and checks whether the usage over the past week of hours is under (1 week quota * 0.99); if it is, enable relay. If relay is already enabled, check whether the current usage over the past week of hours is over the 1 week quota, and if it is, disable relay.

Then the nodes should gently alternate relay on and off as needed to keep usage close to the quota. Choosing the random minute offset at startup would help ensure the whole network doesn't have more relays at the start of an hour and fewer near the end. The nice thing about this is that it would average the availability over time, in little five-minute chunks whenever they're in budget, and hopefully not add too much work to the node in executing timer functions.

Could reduce how often timers run by only executing the quota check in response to actual relay requests, and maybe just turning the relay back on during the once-an-hour quota log turnover in step 3. So once a relay request comes in that exceeds the quota, relay shuts off, and it re-enables once more quota has become available, in a random amount of time that's no more than an hour from that moment.
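
A sketch of that week-of-hours window as a ring buffer, with the hysteresis described above (hypothetical names, not go-ipfs API):

    package quota

    import "sync"

    const hoursPerWeek = 24 * 7

    type QuotaTracker struct {
        mu      sync.Mutex
        current int64               // bytes relayed in the current hour
        history [hoursPerWeek]int64 // one entry per past hour
        next    int                 // ring-buffer write position
        quota   int64               // allowed bytes per week
    }

    // Count records bytes spent on relay work.
    func (q *QuotaTracker) Count(n int64) {
        q.mu.Lock()
        q.current += n
        q.mu.Unlock()
    }

    // Rollover runs once an hour, at a per-node random minute offset,
    // so the whole network doesn't toggle relays in sync.
    func (q *QuotaTracker) Rollover() {
        q.mu.Lock()
        q.history[q.next] = q.current
        q.next = (q.next + 1) % hoursPerWeek
        q.current = 0
        q.mu.Unlock()
    }

    // RelayAllowed applies the hysteresis from the steps above: disable
    // relay once the weekly quota is exceeded, and re-enable it only
    // after usage falls back below 99% of the quota.
    func (q *QuotaTracker) RelayAllowed(enabled bool) bool {
        q.mu.Lock()
        defer q.mu.Unlock()
        used := q.current
        for _, h := range q.history {
            used += h
        }
        if enabled {
            return used <= q.quota
        }
        return float64(used) < 0.99*float64(q.quota)
    }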

DanielMazurkiewicz commented 4 years ago

On the UI side I feel it would be good to be able to define it as one day, one week, or one month because that’s how people tend to think about data quotas.

Periodic quota limit auto-renewal should be optional, in my opinion. I would be happier if I could define profiles with connection speeds and data quotas, which would show up to choose from after I crossed the limits of the current profile.

Predefined profiles could be shipped as a solution for users who are not familiar with the technicalities.


As for me, it would be nice to have profiles consisting of:

Limiting options per kind of traffic:

  1. Ecosystem
  2. Data

Limiting options:

  1. Connection speed (e.g. kbps)
  2. Data quota (e.g. MiB)
  3. Factor of received data (e.g. 2×)

dbaarda commented 4 years ago

On the implementation side, the first thing I'd want to try is this:

Don't do it that way. You want to use a control-systems approach, which will actually be simpler and work better. Use a low-pass filter of the transmit rate or, effectively the same thing, an exponentially decaying traffic count, like this:

  c = (dc + c) * T / (T + dt)

Where:

  dt is the time since c was last updated.
  dc is the amount of traffic sent since the last time c was updated.
  T is the time over which you are averaging.
  c is effectively the amount of traffic sent in the past T time.

This is cheap enough (in compute and storage) that you can calculate it at every transmit/lookup, or you can do it periodically. It is important to keep dt small compared to T for it to be accurate. It's not a 100% accurate measure of the traffic in the last T time, but it's close enough, and it behaves better than moving windows when used as an input to control something.

Toggling enable-relay on/off is a pretty rough control mechanism, so you'll need a gap/on-off/bang-bang controller. You would need to tune the "gap" based on the rate of change of c (which depends on T and your available bandwidth) and on what rate of enable-relay toggling IPFS can handle.

However, proportional control is usually better. If you already have a proportional controller of transmit bandwidth for congestion control, a much better idea would be to integrate the quota signal into it, so that transmit bandwidth is controlled by a single proportional control system that takes into account both congestion and quota signals.
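
For illustration, the decaying counter above is only a few lines of code (hypothetical names, not existing API):

    package quota

    import "time"

    // DecayCounter implements c = (dc + c) * T / (T + dt).
    type DecayCounter struct {
        T    time.Duration // averaging window
        c    float64       // ~bytes sent over the past T
        last time.Time     // when c was last updated
    }

    // Add folds dc (bytes sent since the last update) into the counter.
    // Call it often enough that dt stays small compared to T.
    func (d *DecayCounter) Add(dc float64) {
        now := time.Now()
        if d.last.IsZero() {
            d.last = now
        }
        dt := now.Sub(d.last)
        d.c = (dc + d.c) * float64(d.T) / float64(d.T+dt)
        d.last = now
    }

    // Rate reports the approximate average transmit rate in bytes/sec,
    // usable as the quota signal for a proportional controller.
    func (d *DecayCounter) Rate() float64 {
        return d.c / d.T.Seconds()
    }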

calroc commented 4 years ago

I've been told that this sort of thing should be done at the OS level. Maybe a blog post or two on how to do that (or how to figure out how to do that) would be useful?

E.g. if you were to spin up a VM on digital ocean to pin some content, how to ensure you're not going to get a surprise on the bill from excess bandwidth?

Stebalien commented 4 years ago

How to do it really depends on your situation, and Google will likely give you a better solution for your specific situation than a general-purpose "here's how to restrict bandwidth" blog post would.

davidak commented 4 years ago

Bittorrent clients usually have a feature to limit bandwidth, so users might expect it from this software too.

When general solutions are available, please link to them in the documentation. A user might not be a professional in this field and may just want to spin up an "IPFS node" on some hosting provider, like @calroc said.

calroc commented 4 years ago

@Stebalien consider this scenario:

A user searches for information on "IPFS bandwidth limiting" and finds this ticket. It's closed so they hit end to see what the resolution was, and there's a link in the closing comment to a brief article describing the specific situations you might find yourself in, with links to solutions (on third-party sites or wherever such information is to be found.)

Or, they find this ticket, open, three-and-a-half years old, and you telling people to RTFM.

See, it's not a matter of addressing the ticket: the question is do you want this ticket to represent IPFS project policy on this issue?

FWIW, the answer I'm looking for is already in here. (To wit: use trickle. I did. IPFS+Trickle sat on a DO droplet and worked fine for so long that I forgot it was running. One day last week I clicked on an old IPFS Cloudflare gateway URL and was surprised my content was still accessible; that's how I was reminded! lol)

So just "bless" trickle with an official mention in the docs? And, while you're at it, mention that bandwidth usage might be something the hobbyist user might want to be aware of? It would suck for an unexpected bill to be a part of someone's maiden voyage with IPFS, eh?

Last but not least, how do the IPFS devs and community deal with this in general? Do y'all just know what to do without thinking about it? Or run in datacenters with phat pipes? If you have ways and techniques, I'd love to hear about them; on the other hand, if this isn't really a problem for you, I'd love to know what I'm doing wrong.

Stebalien commented 4 years ago

You're right, we should clearly document how to limit bandwidth. My concern with documenting anything like this is it can easily lead to more confusion than it solves as every case is different. My knee-jerk reaction is to avoid additional bug reports/confusion at all cost.

However, you're still right. We can and should document the simple "works in most cases" approach.

Last but not least, how do the IPFS devs and community deal with this in general? Do y'all just know what to do without thinking about it? Or run in datacenters with phat pipes?

On my laptop, I run my node as a DHT client (ipfs daemon --routing=dhtclient). Go-ipfs 0.5.0 also reduces background bandwidth usage significantly, but less so if you're running a DHT server.

lordcirth commented 4 years ago

Last time I tried using Trickle with IPFS, it only limited the main thread, and all the other threads that used all the network traffic were unlimited. Is there a flag to get around that?

calroc commented 4 years ago

@Stebalien cheers! I really want to use and promote IPFS and I sincerely believe this would help.

@lordcirth it was over a year ago that I last tried it. Something may have changed in the meantime, but back then, IIRC, Trickle did limit IPFS overall, not just the main thread.

constantins2001 commented 4 years ago

I would also like to use IPFS in a P2P CDN, but as I'm unable to provide users with bandwidth-limiting settings and this issue hasn't really progressed in years, I think IPFS isn't a fit (sadly).

Clay-Ferguson commented 3 years ago

I read all the above comments, but I'm still unsure what the final disposition of this issue was.

Here's my docker compose definition for IPFS, in case anyone familiar with docker has any input, or suggestions, and also in case it helps others:

    ipfs:
        container_name: ipfs
        environment:
            routing: "dhtclient"
            IPFS_PROFILE: "server"
            IPFS_PATH: "/data/ipfs"
        volumes:
            - '${ipfs_staging}:/export'
            - '${ipfs_data}:/data/ipfs'
        ports:
            - "4001:4001"
            - "8080:8080"
            - "5001:5001"
        networks:
            - net-prod
        image: ipfs/go-ipfs:release

I'm shooting for the minimal viable low-bandwidth use case configuration with no swarms, just a single instance of everything. The above config seems to work just fine, but I'm unsure if it's using the least possible bandwidth, or not.

Stebalien commented 3 years ago

Enabling the "lowpower" profile should help. That will disable background data reproviding, set really low connection manager limits, and put your node into dhtclient mode.
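
For anyone finding this later, a profile can be applied to an existing repo from the command line:

    ipfs config profile apply lowpower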

bAndie91 commented 3 years ago

Regarding bandwidth limitation, have you considered limiting it externally (OS-level)? It'd need ipfs to mark connections according to what kind of traffic they constitute (DHT, bitswap, meta/data transfer, etc.), so traffic could be controlled by e.g. tc under Linux. It'd limit bandwidth adaptively, so unlike trickle it uses spare bandwidth.

see this idea here: https://discuss.ipfs.io/t/limiting-bandwidth-at-os-level/9102
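
For anyone who wants to experiment with that today, here is a rough sketch of OS-level egress shaping of just the swarm port with tc (the interface name, the rates, and the assumption that all swarm traffic uses port 4001 are illustrative):

    # Cap egress traffic sourced from the IPFS swarm port at 2 Mbit/s.
    tc qdisc add dev eth0 root handle 1: htb default 30
    tc class add dev eth0 parent 1: classid 1:30 htb rate 1gbit
    tc class add dev eth0 parent 1: classid 1:20 htb rate 2mbit ceil 2mbit
    tc filter add dev eth0 parent 1: protocol ip u32 match ip sport 4001 0xffff flowid 1:20

Unlike trickle, this shapes in the kernel; but without per-protocol marks from ipfs itself, it can only distinguish traffic by port.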