ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Writeup of router kill issue #3320

Closed whyrusleeping closed 11 months ago

whyrusleeping commented 7 years ago

So we know that ipfs can kill people's routers. We should do a quick write-up of what the causes are and which routers are typically affected, and maybe propose a couple of ideas for solutions.

@Kubuxu do you think you could handle doing this at some point?

alexfisher commented 3 years ago

So, I revisit this issue every 6 months after trying to run an IPFS node, only to find my router/internet connection killed by it over and over again. I'm actually open to replacing my ISP-provided cable modem with something that IPFS doesn't break. Does anyone have a list of cable modems known to work well?

kakra commented 3 years ago

Chances are that you could ask your provider to switch the modem into bridge mode so you can attach your own router. Bridge mode (the one which directly attaches you to a public IP on the LAN interface) will probably turn off a lot of features in the cable modem and thus reduce CPU and memory pressure. Then use a good router. Upgrading to a DOCSIS 3.1 capable modem may also improve things, as those should be designed to support much higher cable speeds and more connections, but memory and NAT settings may still be an issue. Your best bet is probably a dedicated router with the modem in bridge mode. I'm using a DOCSIS 3.0/3.1 modem with 1 Gbit/s downstream in bridge mode, connected to a UniFi UDM as a router. I didn't have many problems after my provider swapped the modem for a DOCSIS 3.1 one, but switching to bridge mode still improved things (latency went down from 40-50 ms to 20-30 ms).

Also, you may want to make sure that traffic to private networks is blocked from being routed to the WAN interface of the modem. Most ISP routers will route such traffic out via the WAN interface, because they don't blackhole private networks and just use the default route, thereby hogging the router's NAT tables. You could blackhole such networks either on your dedicated router or on your local machine (example for Linux, run as root):

ip route add blackhole 192.168.0.0/16
ip route add blackhole 172.16.0.0/12
ip route add blackhole 10.0.0.0/8

Don't worry, your directly attached network will still work, as it usually has a longer network prefix (if it doesn't, don't blackhole it). Other networks such as VPN tunnels should also keep working, because they install routes with longer prefixes. IPv6 should have no such problems, as routes have scopes there that restrict the routing decision.

BTW: On my UDM I've put firewall rules to answer such packets with ICMP unreachable instead, so the connection gets properly torn down again immediately.

alexfisher commented 3 years ago

Thank you for the thorough and helpful suggestion, @kakra! I just put in an order for a UniFi UDM and some other goodies. I had already been looking at a Peplink router for load balancing and more processing power, but it looks like the UDM will do a lot more for me. I will report back on how IPFS works for me after getting things set up.

kakra commented 3 years ago

You should use the UDM with the modem in bridge mode; I hope your provider allows that or that you can switch it directly in the modem. Otherwise the modem will still do NAT, and your problem may not be solved.

yxuco commented 3 years ago

I hope that this problem will be fixed, or that the required modem types and ISPs will be documented. I ran an Ethereum node without any problem, but IPFS kills my cable modem/router within 20 minutes. I understand that Ethereum uses different p2p libs, but it indicates that such a problem is not only an issue of modem configuration.

alexfisher commented 3 years ago

@kakra Yep, I can switch the Arris modem into Bridge mode no problem. Thanks again.

@yxuco Ethereum currently uses devp2p, but Ethereum 2.0 will switch to using libp2p. I experienced this same issue running an early Prysm Eth 2.0 testnet client which also uses libp2p.

kakra commented 3 years ago

@yxuco wrote:

I understand that Ethereum uses different p2p libs, but it indicates that such problem is not only an issue of the modem configuration.

From what I understand, IPFS also announces LAN IPs so that hosts can find and connect to each other locally. It cannot know whether such IPs are behind a NAT router; they may just be attached locally in a different network segment via a local router. So it's only fair that it works this way. The problem comes from most NAT routers, which route the traffic to their default gateway no matter whether the IP belongs to a private range. This creates a NAT table entry, and because there will never be a reply, it only times out after 10 minutes or so. Until then, a useless NAT entry hogs the router's table, and probably not just one but hundreds of them. The result: the modem struggles to map any more valid connections, DNS dies because it gets no replies, new connections fail and time out, and the modem seems to be killed and eventually recovers, or it runs out of memory/CPU and has to be rebooted.

Since using a good router, I have no more problems. IPFS currently maintains 350-400 peer connections and I see absolutely no negative effect on other connections or throughput.

It's probably dumb in almost any situation to let a NAT router create NAT table entries on the egress WAN interface for connections whose destination is a private IP. This only tells us that ISPs don't put much effort into making a stable router: they just use the default settings, make some pretty (or not so pretty) UI changes, and lock down the advanced settings... :-(

I don't think that IPFS would kill any somewhat modern router with a somewhat modern broadband line if ISPs pre-configured their routers in a sane way. But evidence shows otherwise, and then it's most probably a bad pre-configuration that is locked down so users cannot adjust it. Not IPFS's fault...

alexfisher commented 3 years ago

@kakra UniFi UDM Pro is amazing. Loving my new network set up. Can confirm, IPFS and network running great now. Arris cable modem in bridge mode + UDM Pro.

kakra commented 3 years ago

@alexfisher I'm using the non-Pro UDM, which also works great. And my cable modem is probably a rebranded Arris modem, though I'm not sure. At least the OUI matches Arris.

lkdmid commented 3 years ago

10th Nov 2020 - I thought I was going crazy until I came across this 2016 thread, but I can confirm that IPFS (on my MacBook, latest everything) causes my Sky Q router to spontaneously reboot, which is rather unfortunate given this ISP's popularity in the UK at least. Unfortunately, I'm not in a position to replace the hardware right now either (and that's not really a suitable "solution" if mass adoption is the goal anyway, right?).

kakra commented 3 years ago

Try black-holing LAN networks on your machines running IPFS:

# http://www.uni-koeln.de/~pbogusze/posts/Blackhole_routing_with_linux.html
sudo ip route add blackhole 192.168.0.0/16
sudo ip route add blackhole 172.16.0.0/12
sudo ip route add blackhole 10.0.0.0/8

Any more specific routes (those with a longer prefix, e.g. 192.168.1.0/24) take precedence, so you should be able to set these routes without any unexpected issues. Now IPFS won't be able to send traffic for any of those destinations to your router, which is likely the problem here: the router forwards them via the WAN interface, even though they are not routable there anyway. That only clogs the router's NAT tables, in turn making it unstable.

A more advanced option would be to actually reject those destinations at the router itself so IPFS will be notified about unreachable destinations immediately, otherwise you're just shifting the problem over to your PC - but at least your PC is able to handle this.

IPv6 should be safe because it uses scoped routing: the router (at least a sane implementation) won't route site-local and link-local scope addresses to the WAN interface. And it usually doesn't need NAT either.

Mithrandir2k18 commented 3 years ago

Having the same issue right now. I tried setting HighWater to 10, LowWater to 5 and GracePeriod to 20s. It still crashes my router. Sadly, it doesn't produce any logs, and my Raspberry Pi seems to handle the load quite fine. I logged the active peers, and during the crash I had about 160 active connections.

Loosetooth commented 3 years ago

Possible solution for old hardware

Use the js-ipfs node; it seems to only connect to ~20 peers, which was barely OK for my old router/modem combo to handle.

TLDR

We were having the same issue in our local network. As soon as a go-ipfs node was running in the network, browsing became virtually impossible (due to slow DNS queries, I presume). Pages on IPFS would also load slowly, and sometimes not at all.

The failing hardware was the DOCSIS 3.0 router/modem combination provided by our ISP.

Things I tried (in vain) with the old hardware:

Only when using the js-ipfs node would the network keep running. (It still seemed close to the limit: when running multiple nodes at once in the network, it could still get slow.) This can be a solution for people who are unable to upgrade their modem/router hardware at the moment. It worked better for me, at least.

After we received a newer "modem-only" version from our ISP, the problems disappeared immediately: we can run any number of go-ipfs nodes without any effect on the network (and without any special settings). This is a DOCSIS 3.1 modem/router without WiFi capabilities.

chevdor commented 3 years ago

Hello,

I keep on seeing more and more people having the same issue. I am not bringing a real solution but rather a workaround that has been working quite well for me. I am aware that it is not an ideal solution but it may come as a nice addition and remove some of the current pain.

tldr;

let a cloud provider deal with those issues

details

Since the issue seems to be due to our poor little routers, I now run my ipfs node on a VPS and use an ssh tunnel whenever I need my ipfs node (apparently) locally. It works very well, and it has the benefit that my node runs 24/7. Network latency is not really an issue (especially not compared to the situation where your network just loses it...).

Baaleos commented 3 years ago

Has there been any solution to this yet? I'm using BT (in the UK) as my ISP. They provided me with a router (Smart Hub 2) which provides the internet access, and I have another one (an ASUS ROG router) that provides the WiFi / network access.

Specs: BT Smart Hub: https://en.wikipedia.org/wiki/BT_Smart_Hub, ASUS ROG Rapture GT-AC5300: https://www.amazon.co.uk/ASUS-Rapture-GT-AC5300-Tri-band-Gigabit/dp/B074Z5JTLT

I had go-ipfs running on a Raspberry Pi Linux setup, which was connecting via WiFi to the ROG router.

My Desktop PC has wired access via powerline networking directly to the BT router (direct internet). My phone is connected to the ROG router - which is then wire connected into the BT router - for internet access.

When the IPFS daemon is running, my desktop PC loses the ability to access the internet as well as local internal sites (such as the router's management page at 192.168.1.254, etc.).

However, my phone, which is connected to the ROG router, somehow still has internet access. I also noticed that the LED on the powerline networking devices turns red; this is likely just a side effect of losing internet access.

I'd like to be able to host an IPFS node in my home - but this currently makes that an impossibility, unless I want to go without internet access on my desktop.

I had a look at my BT router logs - here are the relevant segments that reference the ipfs host as well as the events that were recorded during one of the 'outages'

[Screenshot: BT router log excerpt]

I'm trying to get logs from the ROG router, but I just locked myself out of it for 5 minutes (the password is one of those hard-to-remember ones).

hsanjuan commented 3 years ago

go-ipfs now listens by default on tcp and udp (quic):

    "Swarm": [
      "/ip4/0.0.0.0/tcp/4001",
      "/ip6/::/tcp/4001",
      "/ip4/0.0.0.0/udp/4001/quic",
      "/ip6/::/udp/4001/quic"
    ],

Listening only on TCP may alleviate the situation for some boxes.

UDP is stateless. Still, routers/firewalls probably keep tabs on UDP sessions, which are only cleaned up on timeout (TCP sessions go away when closed). It may be that UDP connection-tracking entries accumulate in the routers and exhaust their memory, even after ipfs has "closed" those connections when the connection manager kicks in (the router does not know they are closed).

If you can test by disabling the quic listeners and report if there is any improvement, that would be great.

ttax00 commented 3 years ago

@hsanjuan wrote:

If you can test by disabling the quic listeners and report if there is any improvement, that would be great.

I am running a Pop!_OS machine that hosts my ipfs-cli daemon over a wired connection, and other devices are connected through a WiFi router. Sadly I don't have access to the routers' logs since they're owned by the landlord, but a quick nmap scan of the router IP reported "NAC platforms".

Things I've tried futilely:

My WiFi devices work properly without being cut off from the web, while my machine can't access the internet within a few minutes of the daemon starting.

I don't know if my situation can be considered the same as @Baaleos's, but disabling QUIC seems to keep the internet reachable. There are still occasional hiccups, though access is much smoother than with QUIC enabled.

Edit: After letting the daemon run over the past 12h.

I suspect that, if the issue is connection-tracking entries, the router might allocate a per-device limit and refuse new entries until existing ones are closed, rather than running until memory is exhausted.

Here's the output of ipfs diag sys, if that helps:

{
    "diskinfo": {
        "free_space": 256048828416,
        "fstype": "61267",
        "total_space": 240827699200
    },
    "environment": {
        "GOPATH": "",
        "IPFS_PATH": "/home/tech/snap/ipfs/common"
    },
    "ipfs_commit": "ce693d7e8",
    "ipfs_version": "0.8.0",
    "memory": {
        "swap": 0,
        "virt": 2591756000
    },
    "net": {
        "interface_addresses": [
            "/ip4/127.0.0.1",
            "/ip4/172.16.80.2",
            "/ip6/::1",
            "/ip6/fe80::7410:944d:b8d0:955d"
        ],
        "online": true
    },
    "runtime": {
        "arch": "amd64",
        "compiler": "gc",
        "gomaxprocs": 12,
        "numcpu": 12,
        "numgoroutines": 3801,
        "os": "linux",
        "version": "go1.14.15"
    }
}
hsanjuan commented 3 years ago

@ttax00 wrote:

I don't know if my situation can be considered the same as @Baaleos but disabling QUIC seems to stop the internet from being unable to reach. There are still occasional hiccups, though, access is much smoother than having QUIC enabled.

So you are saying that the problem is mostly solved disabling QUIC and having very conservative connection manager settings?

ttax00 commented 3 years ago

@hsanjuan wrote:

So you are saying that the problem is mostly solved disabling QUIC and having very conservative connection manager settings?

After doing some benchmarking, I'd say that disabling QUIC makes the latency spikes less website-breaking than with it enabled. Conservative connection manager settings add a bit of improvement, but it seems negligible compared to the effect of disabling QUIC.

QUIC enabled (peer count average ~300): 2021-03-01_23-16

QUIC disabled (peer count average ~800): 2021-03-01_23-39

QUIC disabled (peer count average ~300, conservative connection manager settings): 2021-03-01_23-57

Daemon off: 2021-03-01_23-47

waltercool commented 3 years ago

Just sharing my personal experience with daemon mode here.

Using defaults (QUIC enabled): after a few minutes my server begins to queue up DNS requests (systemd-resolved), and the whole system becomes unable to resolve DNS.

Removing QUIC settings: my server works normally for several minutes, then my router begins to hang constantly.

Removing QUIC settings + IPv6 only: same as before.

So I'm not sure whether my router is just cheap, my ISP is blocking requests, or something else is having an impact. If it is a router issue, I have no clue which technical specification of my router may cause it.

ttax00 commented 3 years ago

I wonder whether this issue also happens when the ipfs node is properly port-forwarded? It could be that routers try to be clever with all the connections when the host is behind a firewall, which takes up resources. I remember that last year at my family's house, I manually opened a port and ran both a ZeroNet and an ipfs node with no issues. Now, behind a NAT in an apartment building with ~70 people, only the host machine has connection problems when ipfs is online; no one else does.

Edit: And disabling QUIC just slows down how quickly the sessions fill up; since UDP is stateless, there will be a huge latency spike while the router waits for the sessions to expire.

RubenKelevra commented 3 years ago

I can confirm issues with a Huawei router (B528s-23a) as well. I've deactivated all firewall functionality, but it still seems to block new connections when there are too many open connections. Stopping IPFS immediately fixes any connection issues.

I've contacted Huawei support about this, and they started arguing that my internet connection is the issue. I'm now waiting for a Level 2 support response.

But in this case the router doesn't restart. It just blocks any new connections from being established; something like rate limiting is going on here. Already established connections still work flawlessly.

RubenKelevra commented 3 years ago

@hsanjuan wrote:

If you can test by disabling the quic listeners and report if there is any improvement, that would be great.

This only disables incoming connections. To disable outgoing connections one needs to run:

ipfs config --json Swarm.Transports.Network.QUIC false

RubenKelevra commented 3 years ago

In case of UDP/TCP blockage you can check your connection with mtr:

mtr odin.pacman.store -T

This shows how many packets get dropped at each hop when you send a TCP SYN.

mtr odin.pacman.store -u

This shows how many packets get dropped at each hop when you send a UDP packet.

Since it sends out one packet per second to each hop, it can also trigger the rate limit after about 15 seconds, for both TCP and UDP. But it starts with 0% loss when IPFS is not running.

When IPFS runs (with only QUIC deactivated), I get about 90% packet loss on new connection attempts. GREAT!

Screenshot_20210302_204739 mtr odin.pacman.store -T

Screenshot_20210302_204832 mtr odin.pacman.store -u

ttax00 commented 3 years ago

@RubenKelevra wrote:

In case of UDP/TCP blockage you can check your connection with mtr:

mtr odin.pacman.store -T

This shows how many packets get dropped at each hop when you send a TCP SYN.

mtr odin.pacman.store -u

So it still boils down to the connection limit, then? The packet loss still piles up to 70-90% for me after running mtr for about a minute, even when IPFS is off (screenshot: 2021-03-05_12-09).

But it's weird how some routers handle it with ease while others don't. My apartment building serves ~120 people with their devices, who study at home via Zoom/streaming every day, and yet my node can't reach more than 900 peers before losing connections.

RubenKelevra commented 3 years ago

Yeah, it looks like it boils down to a connection limit. But maybe something like a rate limit of, say, 3 new connections per second after 200 connections have been established.

Not a hard limit.

These plastic routers can usually handle a hard limit of at least 16k connections.
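The guessed behaviour (free admission up to about 200 established connections, then roughly 3 new ones per second) can be written down as a small token-bucket model in Go. The threshold and rate come from the guess above; IPFS dialing 20 new peers per second is a made-up assumption, and connections never close in this model.

```go
package main

import "fmt"

// limiter is a hypothetical model of the guessed router behaviour: new
// connections are admitted freely until `threshold` are established, after
// which only `perSecond` new ones get through each second (a token bucket
// whose capacity equals its refill rate).
type limiter struct {
	threshold   int
	perSecond   int
	established int
	tokens      int
}

// tick refills the bucket once per simulated second.
func (l *limiter) tick() { l.tokens = l.perSecond }

// admit reports whether one new connection attempt gets through.
func (l *limiter) admit() bool {
	if l.established < l.threshold {
		l.established++
		return true
	}
	if l.tokens > 0 {
		l.tokens--
		l.established++
		return true
	}
	return false
}

func main() {
	l := &limiter{threshold: 200, perSecond: 3}
	attempts, dropped := 0, 0
	for sec := 0; sec < 60; sec++ {
		l.tick()
		for i := 0; i < 20; i++ { // assumed 20 new dials per second
			attempts++
			if !l.admit() {
				dropped++
			}
		}
	}
	fmt.Printf("dropped %d of %d attempts (%.0f%%)\n", dropped, attempts, 100*float64(dropped)/float64(attempts))
}
```

Over a simulated minute it prints dropped 850 of 1200 attempts (71%). Such a soft limit would look like heavy loss on new connections while established ones keep working, which matches the symptoms described here.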

RubenKelevra commented 3 years ago

I'm now on round-trip 5 with Huawei support. Apart from telling me to reset my device, they recommended that I buy exactly the device I'm having issues with.

Pretty disappointing quality of support TBH.

I have now shared with 3rd-level support that mtr shows issues after a short while, which obviously doesn't happen on a smaller router, nor when tethering with my phone. So it's 100% the fault of this specific device.

RubenKelevra commented 3 years ago

Btw: should we create a list of known bad/known good devices somewhere? I think that could help track the progress of contacting the vendors to fix them, or show that their software might be EOL so no update can be expected, etc.

chevdor commented 3 years ago

I contacted AVM (https://avm.de/produkte/fritzbox/) a while back, and it sounds like the issue is in the hands of Marketing. In other words, they will care when a good share of their users complain... short version: they don't care.

kakra commented 3 years ago

As always with AVM... They are mostly provider-centric these days, because that's their main distribution channel and there's hardly any competitor in the market they serve. They removed bridge mode (with some pseudo-argument about guaranteeing proper functioning of provider features like SIP and IPTV), there's a bug in the DHCP server which may record multiple MAC addresses for the same IP (breaking NAT, DNAT and exposed host in the process), and there's a bug in the family protection functions (which shift destination ports for inbound and source ports for outbound connections without reflecting that in the GUI, which renders it completely useless). This is all known but won't be fixed. For a few firmware versions now, this has been purely a consumer router with no proper support for firewalls or SIP gateways behind it (at least it supports static routes, so you can avoid double NAT). It's not usable in professional environments, yet these are the only routers providers in Germany supply and support, even for business-grade access accounts. The latest models brick more often during firmware updates, and updates often break NAT functionality of existing, working configurations.

We moved over to using DrayTek DSL modems instead, directly attached to the firewalls. Works much better.

RubenKelevra commented 3 years ago

@chevdor wrote:

I contacted AMV (https://avm.de/produkte/fritzbox/) a while back and the issue is in the hand of the Marketing as it sounds. In other words, they will care when a good share of the users will complain... short version: they don't care.

How about contacting heise.de about this or the CCC to get more leverage?

g33kme commented 3 years ago

Running latest DappNode crashes my Fritz!Box 6490 (Cable). Any idea for a workaround?

Mithrandir2k18 commented 3 years ago

If I were to get a new router, how would I find out whether it will be affected by this issue? Are there some specs to look for? Or is there a list of "good" routers? I'm thinking about setting my ISP's Huawei to bridge mode and using a different one as the router. Would that help? Or will it still crash old routers set to bridge mode? I'm looking for an affordable 802.11ax device that can handle multiple IPFS nodes in the network (a daemon on a Raspberry Pi for pinning files, and nodes used with the Brave browser on my laptop/PC).

kakra commented 3 years ago

Bridge mode and a Ubiquiti UDM work fine for me.

RubenKelevra commented 2 years ago

I wrote earlier:

I'm now on roundtrip 5 with Huawei support. After they recommended me apart from resetting my device, to buy exactly the device I have issues with.

Pretty disappointing quality of support TBH.

I now shared with the 3rd level support, that mtr is showing issues after a short while, which is obviously not happening on a smaller router, nor when tethering with my phone. So it's 100% the fault of this specific device.

Huawei's support hasn't made any further attempts to fix my issue.

Quite disappointing.

Anyway, @Stebalien and @hsanjuan, what do you think about creating a wiki list of routers that are known to work/not work, to track the progress of changes in ipfs and fixes in their firmware? 🤔

RubenKelevra commented 2 years ago

A general workaround is, of course, to use a VPN.

Cloudflare's WARP, for example, works, even offers IPv6, and is extremely easy to set up.

Just in case someone needs a quick workaround for this situation.

RubenKelevra commented 2 years ago

@Mithrandir2k18 wrote:

If I were to get a new router, how would I find out if it will be encumbered by this issue? Are there some specs to look for? Or is there a list of "good" routers? Thinking about setting my ISPs Huawei to bridge mode and use a different one as router. Would that help? Or will it still crash old routers set to bridge mode? Looking for an affordable 802.11ax device that can handle multiple IPFS nodes in the network(daemon on a raspi for pinning files and nodes used with the brave browser on my latop/pc).

Well, the issue is basically the device that does NAT and your IPv6 routing.

So you can pick basically any WiFi AP, as long as it works in bridge mode.

Regarding a router I would recommend to use hardware which supports OpenWRT. It works reliably and you can change any setting you like.

I personally use a TP-Link C2600. The hardware is end-of-life, so you can pick it up cheaply, yet it has half a gigabyte of memory and a dual-core processor.

It does beam forming and multi-device MIMO.

yxuco commented 2 years ago

I retried it a year after I had this issue, and it all works now with the same internet service provider (Comcast) and the same old cable modem (Linksys CG7500). So it is not a problem of the old hardware after all. The only things I have done differently are: (1) I built it from the latest source code, (2) I am using the server profile, i.e., ipfs init --profile server, and (3) I removed the quic lines under Addresses.Swarm in ~/.ipfs/config.
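Step (3) amounts to filtering the QUIC entries out of the Addresses.Swarm list. A small Go sketch of that filter follows. Matching on the "/quic" substring is a shortcut rather than proper multiaddr parsing, and the program only works on a copy of the default list quoted earlier in this thread; it does not touch a real config file.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// withoutQUIC returns the listen addresses that do not use QUIC, i.e. the
// effect of deleting the ".../udp/4001/quic" lines from Addresses.Swarm.
// Substring matching is a simplification (it also catches "/quic-v1").
func withoutQUIC(addrs []string) []string {
	out := make([]string, 0, len(addrs))
	for _, a := range addrs {
		if !strings.Contains(a, "/quic") {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	// The default listeners as quoted earlier in this thread.
	raw := `{"Swarm": [
	  "/ip4/0.0.0.0/tcp/4001",
	  "/ip6/::/tcp/4001",
	  "/ip4/0.0.0.0/udp/4001/quic",
	  "/ip6/::/udp/4001/quic"
	]}`
	var addresses struct {
		Swarm []string `json:"Swarm"`
	}
	if err := json.Unmarshal([]byte(raw), &addresses); err != nil {
		panic(err)
	}
	fmt.Println(withoutQUIC(addresses.Swarm))
}
```

It prints [/ip4/0.0.0.0/tcp/4001 /ip6/::/tcp/4001]. On a real node, the filtered list can be written back with ipfs config --json Addresses.Swarm '["/ip4/0.0.0.0/tcp/4001", "/ip6/::/tcp/4001"]' followed by a daemon restart.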

markg85 commented 2 years ago

Apparently I was lucky in the past not to run into this issue. Now, with a different ISP, I am running into this issue very often. My router reboots probably once every half hour or so.

I just followed the advice here of disabling QUIC (ipfs config --json Swarm.Transports.Network.QUIC false) and removing the QUIC lines from the listener array.

I hope this ends up being a workable solution.

Quoting this part from @Stebalien (from earlier in this thread, 2018):

We should implement a max connections but high/low water are really designed to be target bounds.

The libp2p team is currently refactoring the "dialer" system in a way that'll make it easy for us to configure a maximum number of outbound connections. Unfortunately, there's really nothing we can do about inbound connections except kill them as soon as we can. On the other hand, having too many connections usually comes from dialing.

I'm curious whether there's any update on this front?

Also, besides a potential "max_connections" setting, the speed at which connections are made might need some rethinking too. It "looks" like IPFS currently gets a list of peers it can connect to and tries to connect to all of them at once (I assume in some huge thread pool). This kind of rapid connection making is likely why hosts quite often flag IPFS as a DDoS attack. I just got an email about that from my host, asking me to confirm that my node was behaving normally, as it looked like a DDoS attack to them. Perhaps it's an idea to make the connection attempts in a pooled way: instead of connecting to all peers at once, connect in batches of a set limit (say 50 or so)?

(Note: it actually crashed while I was writing this comment...)

kwinz commented 2 years ago

I have a fairly beefy 4-core x64 NAT router with OpenWRT and fq_codel smart queues that usually never gives me any problems. I have low ping even when several different clients are doing various transfers with hundreds of TCP connections at the same time, except with IPFS!

I already set ipfs config --json Swarm.ConnMgr '{"HighWater": 100, "LowWater": 32, "Type": "basic", "GracePeriod": "15s"}', removed any lines containing quic from Addresses.Swarm and set ipfs config --json Swarm.Transports.Network.QUIC false and can confirm I have no QUIC connections any more.

But a single IPFS client in the network absolutely obliterates my 500/50 Mbit/s internet connectivity. Websites take 10-20 seconds just to load the initial 100 kB HTML file on any computer connected to the network. OpenWRT shows ~5500 of 16000 open connections and not much CPU utilization. I am actually surprised how effective IPFS is at DDoSing; I didn't even think it was possible to kill my connectivity this efficiently, even with the QoS smart queues!

Any advice?

Winterhuman commented 2 years ago

@kwinz Could you give us numbers on the RAM usage as well? It's the only other detail your post seems to be missing that I think could be a cause.

kakra commented 2 years ago

@kwinz wrote:

I have a fairly beefy 4 core x64 NAT / router with OpenWRT

It won't help if your ISP modem still works in router mode. If you put your ISP modem into bridge mode and handle the routing exclusively in the OpenWRT router, it should work just fine. Smart queues put a lot of pressure on the router CPU; it may work better without them. I'm running a UDM router on a 1000D/50U connection (4-core ARM CPU), and enabling smart queues absolutely kills its performance (only 600-700 Mbit/s downstream, and IPFS can overwhelm it), but without smart queues it runs just fine. My connection has enough bandwidth that smart queues do not actually matter; you probably only need them if you constantly saturate your uplink bandwidth. Downstream cannot be properly controlled with smart queues anyway. If you still want to use smart queues, measure your up and down bandwidth without IPFS running, then set the smart queue bandwidth to 80% of what's available, so it leaves enough headroom for packet bursts (which IPFS actually creates a lot of). At all costs, you want to prevent packets from queuing up in the modem uplink, because that's what increases packet latency, and your network has no control over priorities in that uplink queue. A high uplink latency will significantly decrease your usable downlink bandwidth due to how TCP works (ACK packets and receive window sizes).

chevdor commented 2 years ago

I can back up the statements from @kakra, and thank him for pointing out the smart queue options.

I also switched my router (various Fritz!Boxes, most recently a 6660) to a simple bridge and upgraded to a UDM Pro. Smart Queues do reduce my bandwidth significantly, by some 45%. Since I removed the Fritz!Box router from the equation, I see the (real) router doing the work, but at least it no longer kills my connection (i.e. the router keeps doing its job and does not collapse like Fritz!Boxes do).

hsanjuan commented 2 years ago

@kakra wrote:

If you put your ISP modem into bridge mode

My ISP modem (Vodafone) is in bridge mode and it implodes (restarts itself and is unable to restore internet) whenever IPFS becomes a little active in terms of connections.

kakra commented 2 years ago

@hsanjuan wrote:

My ISP modem (Vodafone) is in bridge mode and it implodes

Vodafone here too, with 1000D/50U, so it's the more modern modem. My Swarm configuration:

  "Swarm": {
    "AddrFilters": [],
    "ConnMgr": {
      "GracePeriod": "60s",
      "HighWater": 200,
      "LowWater": 150,
      "Type": "basic"
    }
  }

kwinz commented 2 years ago

@LynHyper The RAM usage is a fraction of the 4GB that the router has. OpenWRT is pretty conservative on the RAM, since it's made for embedded. Maybe I need to configure it so it uses more of its RAM?

@kakra No, the modem is already in bridge mode, no NAT/routing is going on in the modem.

I guess I have two problems at hand here: 1. Why does the router even allow itself to be DDoSed? That's probably more of a question for an OpenWRT forum than for here.

And 2. Why is IPFS DDoSing my connection? If my pretty beefy router with an up-to-date OS lets itself be DDoSed, then most home users will probably be affected too and will have a bad experience with IPFS. IPFS has to work without users having to change their router config; most people wouldn't even know how to do that. So can we have IPFS use more sensible settings and/or defaults? That's the question for this thread. So far I still don't know how to set up IPFS so that this doesn't happen any more. Who can please help me with that?

kakra commented 2 years ago

Why is IPFS DDOSing my connection?

Well, one more idea: IPFS probably tries to reach other LAN addresses via your default gateway (i.e., OpenWRT), which by default routes everything not destined for a directly attached local network out to the WAN. And of course it will NAT those connections.

Now comes the problem: connections to private destinations routed out the WAN will always time out, because they never get replies. As a result, the NAT table fills up with useless entries that only expire after their timeout, 10 minutes or possibly more, since no reply ever arrives that would let the router discard them early.
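The size of the problem can be estimated with Little's law: if unanswered dials create new NAT entries at an average rate λ and each one lingers for the full timeout T, the table holds roughly λT stale entries in steady state. The rate below is a hypothetical figure, not a measurement; T uses the 10 minutes mentioned above:

```latex
N_{\text{stale}} \approx \lambda \, T
  = 10\,\text{s}^{-1} \times 600\,\text{s}
  = 6000 \ \text{entries}
```

This is why both kinds of countermeasure below help: a shorter T or a smaller λ shrinks the number of stale entries proportionally.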

Countermeasures:

  1. Blackhole all private network prefixes in the OpenWRT routing table: ip route add blackhole 192.168.0.0/16; ip route add blackhole 10.0.0.0/8; ip route add blackhole 172.16.0.0/12. This ensures a private destination never goes out your WAN interface, where it would be NAT'ed (directly attached LANs are still routed because their prefixes are longer). That's a good idea in general, not only for IPFS: other software may also try to reach random LAN IPs and clog your router's NAT tables. ISP consumer routers often have no idea what they are routing and just push everything out the WAN interface. I've tested this on multiple routers with tcpdump, and they all routed LAN destinations to the WAN interface, which makes no sense in the environments they target. It may be even better to actively reject those destinations with firewall rules instead, so your software gets an immediate "rejected" response and can move on to the next host.
  2. You may also want to blackhole 100.64.0.0/10, because that is ISP CGNAT space and usually does not allow port forwarding. But YMMV: it would also block your side of connections that peers in that range initiate to you. It may be better to reject only outgoing connection attempts via a firewall rule.
  3. For your Vodafone cable modem, you may want to exempt one route from blackholing so you can still reach its web UI on the secret "WAN routable" IP: ip route add 192.168.100.1/32 dev YOURWAN. This traffic needs to be NAT'ed to work, so don't exclude private networks from NAT; that's why I suggest blackhole routes instead.
  4. Reduce the NAT table entry timeout in your router (I'm not sure how to do that on OpenWRT), so that entries without activity are discarded sooner.

None of this addresses the modem in bridge mode, though. I'm not sure what the problem is there.

I'm using a Unifi UDM with firewall rules that reject LAN destinations on the WAN interface, with the exception of 192.168.100.1 for the Vodafone cable modem (I didn't find a way to deploy blackhole routes).

kwinz commented 2 years ago

@kakra Thanks, I will try that.

I would suggest changing IPFS so that by default it doesn't even try to reach non-globally-routed private LAN addresses, and I'd have to manually whitelist the subnets I use in my local LAN if they are not link-connected. Or, better yet, allow them but don't accept unroutable private IP space destinations from peers on the internet; that doesn't make any sense. https://en.wikipedia.org/wiki/Private_network#Private_IPv4_addresses Or have it recognize that those always fail and back off from trying them. I think that would greatly help adoption. We can't expect all users to know how to change the routing table of their router. I don't have this problem with any other file-sharing or QUIC software.

kakra commented 2 years ago

We can't expect all users to know how to change the routing table of their router

I think that is something ISPs should fix in their routers by shipping sensible defaults, maybe with an "I know what I'm doing" button if you want to route private networks via the WAN interface (there are setups where that makes sense, but in the common situation it does not).

kwinz commented 2 years ago

ISPs should actually fix in their routers and ship with sensible defaults

And IPv4 has a beautiful protocol field, with dozens of protocols defined that it is supposed to be able to carry. But in practice only ICMP, TCP, and UDP get reliably forwarded by routers. So what do newer protocols like SCTP and QUIC do? Do they wait for the ISPs of the world to replace all their routers? No, they encapsulate their protocols in UDP. You can't stay idealistic waiting for the world to change. Let's stay pragmatic and ship defaults that actually work for our users.

And I would argue that accepting unroutable private IP space destinations from peers on the internet doesn't even make sense, except maybe in some carrier-grade NAT deployments I haven't thought about.