ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

cannot reserve inbound connection: resource limit exceeded #9432

Closed anarkrypto closed 1 year ago

anarkrypto commented 1 year ago

Checklist

Installation method

built from source

Version

Kubo version: 0.17.0-4485d6b
Repo version: 12
System version: amd64/linux
Golang version: go1.19.1

Config

{
  "API": {
    "HTTPHeaders": {
      "Access-Control-Allow-Origin": [
        "*"
      ]
    }
  },
  "Addresses": {
    "API": "/ip4/0.0.0.0/tcp/5001",
    "Announce": [],
    "AppendAnnounce": [],
    "Gateway": "/ip4/0.0.0.0/tcp/8080",
    "NoAnnounce": [],
    "Swarm": [
      "/ip4/0.0.0.0/tcp/4001",
      "/ip6/::/tcp/4001",
      "/ip4/0.0.0.0/udp/4001/quic",
      "/ip6/::/udp/4001/quic"
    ]
  },
  "AutoNAT": {},
  "Bootstrap": [
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt",
    "/ip4/104.131.131.82/tcp/4001/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ",
    "/ip4/104.131.131.82/udp/4001/quic/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ"
  ],
  "DNS": {
    "Resolvers": {}
  },
  "Datastore": {
    "BloomFilterSize": 0,
    "GCPeriod": "1h",
    "HashOnRead": false,
    "Spec": {
      "mounts": [
        {
          "child": {
            "path": "blocks",
            "shardFunc": "/repo/flatfs/shard/v1/next-to-last/2",
            "sync": true,
            "type": "flatfs"
          },
          "mountpoint": "/blocks",
          "prefix": "flatfs.datastore",
          "type": "measure"
        },
        {
          "child": {
            "compression": "none",
            "path": "datastore",
            "type": "levelds"
          },
          "mountpoint": "/",
          "prefix": "leveldb.datastore",
          "type": "measure"
        }
      ],
      "type": "mount"
    },
    "StorageGCWatermark": 90,
    "StorageMax": "10GB"
  },
  "Discovery": {
    "MDNS": {
      "Enabled": true
    }
  },
  "Experimental": {
    "AcceleratedDHTClient": false,
    "FilestoreEnabled": false,
    "GraphsyncEnabled": false,
    "Libp2pStreamMounting": false,
    "P2pHttpProxy": false,
    "StrategicProviding": false,
    "UrlstoreEnabled": false
  },
  "Gateway": {
    "APICommands": [],
    "HTTPHeaders": {
      "Access-Control-Allow-Headers": [
        "X-Requested-With",
        "Range",
        "User-Agent"
      ],
      "Access-Control-Allow-Methods": [
        "GET"
      ],
      "Access-Control-Allow-Origin": [
        "*"
      ]
    },
    "NoDNSLink": false,
    "NoFetch": false,
    "PathPrefixes": [],
    "PublicGateways": null,
    "RootRedirect": "",
    "Writable": false
  },
  "Identity": {
    "PeerID": "12D3KooWMywfzmLWCWErc9L7CmfLLFbmoSHroHBUveUPaarDbAfF"
  },
  "Internal": {},
  "Ipns": {
    "RecordLifetime": "",
    "RepublishPeriod": "",
    "ResolveCacheSize": 128
  },
  "Migration": {
    "DownloadSources": [],
    "Keep": ""
  },
  "Mounts": {
    "FuseAllowOther": false,
    "IPFS": "/ipfs",
    "IPNS": "/ipns"
  },
  "Peering": {
    "Peers": null
  },
  "Pinning": {
    "RemoteServices": {}
  },
  "Plugins": {
    "Plugins": null
  },
  "Provider": {
    "Strategy": ""
  },
  "Pubsub": {
    "DisableSigning": false,
    "Router": ""
  },
  "Reprovider": {
    "Interval": "12h",
    "Strategy": "all"
  },
  "Routing": {
    "Methods": null,
    "Routers": null,
    "Type": "dht"
  },
  "Swarm": {
    "AddrFilters": null,
    "ConnMgr": {},
    "DisableBandwidthMetrics": false,
    "DisableNatPortMap": false,
    "RelayClient": {},
    "RelayService": {},
    "ResourceMgr": {},
    "Transports": {
      "Multiplexers": {},
      "Network": {},
      "Security": {}
    }
  }
}

Description

Trying to run IPFS on a VPS.

Error description: the node starts, successfully announces the swarm addresses, and port 4001 is reachable externally (I checked). After a few seconds it closes and I get this error message:

ERROR resourcemanager libp2p/rcmgr_logging.go:53 Resource limits were exceeded 496 times with error "system: cannot reserve inbound connection: resource limit exceeded".

Then the port cannot be reached anymore

The error occurred on both Docker (20.10.21) and an installation from binaries.


anarkrypto commented 1 year ago

This does not happen when running ipfs daemon from snap

Kubo version: 0.16.0-38117db6f Repo version: 12 System version: amd64/linux Golang version: go1.19

mitchds commented 1 year ago

This does not happen when running ipfs daemon from snap

Kubo version: 0.16.0-38117db6f Repo version: 12 System version: amd64/linux Golang version: go1.19

As mentioned, this happens with kubo 0.17, and I am facing the same issue. It is not a snap-vs-not issue but, I guess, a problem with the new libp2p code that was turned on in 0.17. I have gone back to 0.16 for the moment.

I have tried increasing the inbound connection limits, to no avail.

mitchds commented 1 year ago

I had accidentally changed the resource limits on the wrong server, so changing the inbound connection value does in fact work.

Changing the inbound connection limit to 1024 cured this for me. Add the following to the "Swarm" block of your .ipfs/config, then tweak as you need.

 "ResourceMgr": {
      "Limits": {
        "System": {
          "Memory": 1073741824,
          "FD": 512,
          "Conns": 1024,
          "ConnsInbound": 1024,
          "ConnsOutbound": 1024,
          "Streams": 16384,
          "StreamsInbound": 4096,
          "StreamsOutbound": 16384
        }
      }
    },
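
If you prefer the CLI, the same System block can likely be applied with ipfs config --json followed by a daemon restart. A sketch using the exact values from the snippet above (a starting point to tweak, not tuned recommendations):

# sketch: apply the same System limits from the command line, then restart the daemon
ipfs config --json Swarm.ResourceMgr.Limits.System '{
  "Memory": 1073741824,
  "FD": 512,
  "Conns": 1024,
  "ConnsInbound": 1024,
  "ConnsOutbound": 1024,
  "Streams": 16384,
  "StreamsInbound": 4096,
  "StreamsOutbound": 16384
}'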
dennis-tra commented 1 year ago

Also ran into this problem. My error message is:

Application error 0x0: conn-12133298: system: cannot reserve inbound connection: resource limit exceeded

We are running a customized 0.17.0 build.

ipfs config show ``` { "API": { "HTTPHeaders": {} }, "Addresses": { "API": "/ip4/127.0.0.1/tcp/5001", "Announce": [], "AppendAnnounce": [], "Gateway": "/ip4/127.0.0.1/tcp/8080", "NoAnnounce": [], "Swarm": [ "/ip4/0.0.0.0/tcp/4001", "/ip6/::/tcp/4001", "/ip4/0.0.0.0/udp/4001/quic", "/ip6/::/udp/4001/quic" ] }, "AutoNAT": {}, "Bootstrap": [ "/dnsaddr/bootstrap.libp2p.io/p2p/QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt", "/ip4/104.131.131.82/tcp/4001/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ", "/ip4/104.131.131.82/udp/4001/quic/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ", "/dnsaddr/bootstrap.libp2p.io/p2p/QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN", "/dnsaddr/bootstrap.libp2p.io/p2p/QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa", "/dnsaddr/bootstrap.libp2p.io/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb" ], "DNS": { "Resolvers": {} }, "Datastore": { "BloomFilterSize": 0, "GCPeriod": "1h", "HashOnRead": false, "Spec": { "mounts": [ { "child": { "path": "blocks", "shardFunc": "/repo/flatfs/shard/v1/next-to-last/2", "sync": true, "type": "flatfs" }, "mountpoint": "/blocks", "prefix": "flatfs.datastore", "type": "measure" }, { "child": { "compression": "none", "path": "datastore", "type": "levelds" }, "mountpoint": "/", "prefix": "leveldb.datastore", "type": "measure" } ], "type": "mount" }, "StorageGCWatermark": 90, "StorageMax": "10GB" }, "Discovery": { "MDNS": { "Enabled": true } }, "Experimental": { "AcceleratedDHTClient": false, "FilestoreEnabled": false, "GraphsyncEnabled": false, "Libp2pStreamMounting": false, "P2pHttpProxy": false, "StrategicProviding": false, "UrlstoreEnabled": false }, "Gateway": { "APICommands": [], "HTTPHeaders": { "Access-Control-Allow-Headers": [ "X-Requested-With", "Range", "User-Agent" ], "Access-Control-Allow-Methods": [ "GET" ], "Access-Control-Allow-Origin": [ "*" ] }, "NoDNSLink": false, "NoFetch": false, "PathPrefixes": [], "PublicGateways": null, "RootRedirect": "", "Writable": false }, "Identity": { "PeerID": "12D3KooWMUTo8FJp9Rm9rwYuCdcR6Xi6wRjBjm2eDaNPEwgKgFdW" }, "Internal": {}, "Ipns": { "RecordLifetime": "", "RepublishPeriod": "", "ResolveCacheSize": 128 }, "Migration": { "DownloadSources": [], "Keep": "" }, "Mounts": { "FuseAllowOther": false, "IPFS": "/ipfs", "IPNS": "/ipns" }, "Peering": { "Peers": null }, "Pinning": { "RemoteServices": {} }, "Plugins": { "Plugins": null }, "Provider": { "Strategy": "" }, "Pubsub": { "DisableSigning": false, "Router": "" }, "Reprovider": { "Interval": "12h", "Strategy": "all" }, "Routing": { "Methods": null, "Routers": null, "Type": "dht" }, "Swarm": { "AddrFilters": null, "ConnMgr": {}, "DisableBandwidthMetrics": false, "DisableNatPortMap": false, "RelayClient": {}, "RelayService": {}, "ResourceMgr": { "Enabled": false }, "Transports": { "Multiplexers": {}, "Network": {}, "Security": {} } } } ```

We are running an experiment to measure lookup latencies in the IPFS DHT network. For that we have deployed several customized kubo nodes. The customization consists of just additional log messages. One of these log messages is right after the GetProviders RPC here:

https://github.com/libp2p/go-libp2p-kad-dht/blob/9896ce5b196a4c262d489b35460056a4b4e5618f/routing.go#L536

It logs the return values. I noticed that I receive a lot of the following errors:

Application error 0x0: conn-12133298: system: cannot reserve inbound connection: resource limit exceeded

Therefore I went ahead and disabled the resource manager (see config above), but the error messages still stick around. We then deployed beefier machines; the errors seem to be fewer, but they still happen frequently.
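
For reference, "disabled" here just means the standard config flag, roughly the following, followed by a daemon restart:

# sketch: this is how the "Enabled": false in the config above was set
ipfs config --json Swarm.ResourceMgr.Enabled false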

It's also weird that the error message talks about an inbound connection although I'm calling out to the remote peer 🤔 .

koxon commented 1 year ago

I see the same consistent issue.

2022-11-29T04:08:37.090Z ERROR resourcemanager libp2p/rcmgr_logging.go:53 Resource limits were exceeded 42 times with error "system: cannot reserve inbound connection: resource limit exceeded".


# ipfs --version
ipfs version 0.17.0
anarkrypto commented 1 year ago

I was able to fix this by downgrading to v0.16.0

@BigLep @lidel @galargh @ajnavarro

ajnavarro commented 1 year ago

This error is expected when you have too many inbound connections at the System level; the limit exists to avoid DoS attacks. If your hardware or use case needs to support more inbound connections than the default, you can change that by doing:

# Remove custom params
ipfs config --json Swarm.ResourceMgr '{}'

# Set inbound connection limits to a custom value
ipfs config --json Swarm.ResourceMgr.Limits.System.ConnsInbound 1000

# You might want to change also the number of inbound streams
ipfs config --json Swarm.ResourceMgr.Limits.System.StreamsInbound 1000

# If your hardware configuration is able to handle more connections
# and you are hitting Transient limits, you can also change them:
ipfs config --json Swarm.ResourceMgr.Limits.Transient.ConnsInbound 1000
ipfs config --json Swarm.ResourceMgr.Limits.Transient.StreamsInbound 1000

# Remember to restart the node to apply the changes

# You can see the applied changes executing:
$ ipfs swarm limit system
$ ipfs swarm limit transient

# You can check actual resources in use:
$ ipfs swarm stats system
$ ipfs swarm stats transient

The error is followed by a link: "Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr". There you can learn about all the different knobs to tune the ResourceManager; the most important ones are ConnsInbound and StreamsInbound.

rotarur commented 1 year ago

I'm facing the same issue. Looking at the stats and comparing them with the limits I have, the node doesn't even touch the limits, but I still see this error in my logs.

/ # ipfs swarm stats system
{
  "System": {
    "Conns": 563,
    "ConnsInbound": 0,
    "ConnsOutbound": 563,
    "FD": 125,
    "Memory": 44040288,
    "Streams": 868,
    "StreamsInbound": 55,
    "StreamsOutbound": 813
  }
}
/ # ipfs swarm limit system
{
  "Conns": 1024,
  "ConnsInbound": 1024,
  "ConnsOutbound": 1024,
  "FD": 4512,
  "Memory": 1073741824,
  "Streams": 16384,
  "StreamsInbound": 4096,
  "StreamsOutbound": 16384
}

I'm running IPFS v0.17.0.

ajnavarro commented 1 year ago

@rotarur can you paste the error that you are getting? Your node might be hitting a limit at another RM level, like transient.

kallisti5 commented 1 year ago

I'm running into this issue after upgrading to 0.17.0. Almost continuous in logs...

Nov 29 13:18:36 ipfspri.discord.local ipfs[239746]: 2022-11-29T13:18:36.217-0600        ERROR        resourcemanager        libp2p/rcmgr_logging.go:53        Resource limits were exceeded 261 times with error "system: cannot reserve inbound connection: resource limit exceeded".
Nov 29 13:18:36 ipfspri.discord.local ipfs[239746]: 2022-11-29T13:18:36.218-0600        ERROR        resourcemanager        libp2p/rcmgr_logging.go:57        Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
Nov 29 13:18:46 ipfspri.discord.local ipfs[239746]: 2022-11-29T13:18:46.216-0600        ERROR        resourcemanager        libp2p/rcmgr_logging.go:53        Resource limits were exceeded 342 times with error "system: cannot reserve inbound connection: resource limit exceeded".
Nov 29 13:18:46 ipfspri.discord.local ipfs[239746]: 2022-11-29T13:18:46.216-0600        ERROR        resourcemanager        libp2p/rcmgr_logging.go:57        Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
Nov 29 13:18:56 ipfspri.discord.local ipfs[239746]: 2022-11-29T13:18:56.216-0600        ERROR        resourcemanager        libp2p/rcmgr_logging.go:53        Resource limits were exceeded 322 times with error "system: cannot reserve inbound connection: resource limit exceeded".
Nov 29 13:18:56 ipfspri.discord.local ipfs[239746]: 2022-11-29T13:18:56.217-0600        ERROR        resourcemanager        libp2p/rcmgr_logging.go:57        Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
Nov 29 13:19:06 ipfspri.discord.local ipfs[239746]: 2022-11-29T13:19:06.215-0600        ERROR        resourcemanager        libp2p/rcmgr_logging.go:53        Resource limits were exceeded 396 times with error "system: cannot reserve inbound connection: resource limit exceeded".
Nov 29 13:19:06 ipfspri.discord.local ipfs[239746]: 2022-11-29T13:19:06.216-0600        ERROR        resourcemanager        libp2p/rcmgr_logging.go:57        Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
Nov 29 13:19:16 ipfspri.discord.local ipfs[239746]: 2022-11-29T13:19:16.216-0600        ERROR        resourcemanager        libp2p/rcmgr_logging.go:53        Resource limits were exceeded 426 times with error "system: cannot reserve inbound connection: resource limit exceeded".
Nov 29 13:19:16 ipfspri.discord.local ipfs[239746]: 2022-11-29T13:19:16.216-0600        ERROR        resourcemanager        libp2p/rcmgr_logging.go:57        Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
Nov 29 13:19:26 ipfspri.discord.local ipfs[239746]: 2022-11-29T13:19:26.216-0600        ERROR        resourcemanager        libp2p/rcmgr_logging.go:53        Resource limits were exceeded 437 times with error "system: cannot reserve inbound connection: resource limit exceeded".
Nov 29 13:19:26 ipfspri.discord.local ipfs[239746]: 2022-11-29T13:19:26.216-0600        ERROR        resourcemanager        libp2p/rcmgr_logging.go:57        Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
Nov 29 13:19:36 ipfspri.discord.local ipfs[239746]: 2022-11-29T13:19:36.216-0600        ERROR        resourcemanager        libp2p/rcmgr_logging.go:53        Resource limits were exceeded 387 times with error "system: cannot reserve inbound connection: resource limit exceeded".
Nov 29 13:19:36 ipfspri.discord.local ipfs[239746]: 2022-11-29T13:19:36.219-0600        ERROR        resourcemanager        libp2p/rcmgr_logging.go:57        Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
kallisti5 commented 1 year ago
$ ipfs swarm limit system
{
  "Conns": 4611686018427388000,
  "ConnsInbound": 123,
  "ConnsOutbound": 4611686018427388000,
  "FD": 4096,
  "Memory": 1999292928,
  "Streams": 4611686018427388000,
  "StreamsInbound": 1977,
  "StreamsOutbound": 4611686018427388000
}
$ ipfs swarm limit transient
{
  "Conns": 4611686018427388000,
  "ConnsInbound": 46,
  "ConnsOutbound": 4611686018427388000,
  "FD": 1024,
  "Memory": 158466048,
  "Streams": 4611686018427388000,
  "StreamsInbound": 247,
  "StreamsOutbound": 4611686018427388000
}
$ ipfs swarm stats system
{
  "System": {
    "Conns": 213,
    "ConnsInbound": 123,
    "ConnsOutbound": 90,
    "FD": 38,
    "Memory": 5914624,
    "Streams": 197,
    "StreamsInbound": 80,
    "StreamsOutbound": 117
  }
}
$ ipfs swarm stats transient
{
  "Transient": {
    "Conns": 0,
    "ConnsInbound": 0,
    "ConnsOutbound": 0,
    "FD": 0,
    "Memory": 0,
    "Streams": 1,
    "StreamsInbound": 0,
    "StreamsOutbound": 1
  }
}
kallisti5 commented 1 year ago

So it looks like when the ResourceMgr limits are undefined (ipfs config --json Swarm.ResourceMgr '{}'), random limits get set?

kallisti5 commented 1 year ago

Hm. I set explicit limits for all the "random values" and am still seeing random values after restarting IPFS. It looks like maybe some memory overflow...

config:

    "ResourceMgr": {
      "Limits": {
        "System": {
          "Conns": 2048,
          "ConnsInbound": 1024,
          "ConnsOutbound": 1024,
          "FD:": 8192,
          "Streams:": 16384,
          "StreamsInbound:": 4096,
          "StreamsOutbound:": 16384
        }
      }
    },
$ ipfs swarm limit system
{
  "Conns": 2048,
  "ConnsInbound": 1024,
  "ConnsOutbound": 1024,
  "FD": 4096,
  "Memory": 1999292928,
  "Streams": 4611686018427388000,
  "StreamsInbound": 1977,
  "StreamsOutbound": 4611686018427388000
}
2color commented 1 year ago

I also seem to be experiencing this even though I have the resource manager disabled

ajnavarro commented 1 year ago

@kallisti5 please check your configuration. It is wrong: remove the : from the variable names. Also, it is not a memory overflow; it is the maximum value (i.e., effectively no limit).
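
For reference, the corrected block from the comment above would be (same values, just without the stray colons in the key names):

    "ResourceMgr": {
      "Limits": {
        "System": {
          "Conns": 2048,
          "ConnsInbound": 1024,
          "ConnsOutbound": 1024,
          "FD": 8192,
          "Streams": 16384,
          "StreamsInbound": 4096,
          "StreamsOutbound": 16384
        }
      }
    },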

ajnavarro commented 1 year ago

@2color how did you disable RM? ipfs config --json Swarm.ResourceMgr.Enabled false and restarting the daemon?

rotarur commented 1 year ago

@ajnavarro

My logs are always the same, and I don't get the documentation link, which is weird:

ipfs 2022-11-30T11:37:52.039Z    INFO    net/identify    identify/id.go:369    failed negotiate identify protocol with peer    {"peer": "12D3KooWMTa2XzV7thiUSKVKUfUYtBGiV7T3fGjayy7voHVKbjAF", "error": "Application error 0x0: conn-3607345: system: cannot reserve inbound connection: resource limit exceeded"}
ipfs 2022-11-30T11:37:52.039Z    WARN    net/identify    identify/id.go:334    failed to identify 12D3KooWMTa2XzV7thiUSKVKUfUYtBGiV7T3fGjayy7voHVKbjAF: Application error 0x0: conn-3607345: system: cannot reserve inbound connection: resource limit exceeded

The transient connections are not being used:

/ # ipfs swarm limit transient
{
  "Conns": 4611686018427388000,
  "ConnsInbound": 1024,
  "ConnsOutbound": 1024,
  "FD": 131072,
  "Memory": 521011200,
  "Streams": 4611686018427388000,
  "StreamsInbound": 592,
  "StreamsOutbound": 4611686018427388000
}
/ # ipfs swarm stats transient
{
  "Transient": {
    "Conns": 0,
    "ConnsInbound": 0,
    "ConnsOutbound": 0,
    "FD": 0,
    "Memory": 0,
    "Streams": 1,
    "StreamsInbound": 0,
    "StreamsOutbound": 1
  }
}

My server is big enough for IPFS and it has plenty of resources to spare.

ajnavarro commented 1 year ago

@rotarur are you getting errors like Resource limits were exceeded 261 times with error...? Can you paste them here so we can see which RM level is being hit?

If there are no errors like these, it is a different problem.

kallisti5 commented 1 year ago

@ajnavarro LOL. I think you just found the issue.

ipfs config --json Swarm.ResourceMgr.Limits.System.FD: 8192

That's the command I used to set FD. Isn't FD a reserved var in golang?

EDIT: Never mind. I just realized the syntax is indeed ipfs config ... without the :. So it looks like a little validation needs to happen here – ipfs can't handle empty or invalid ResourceMgr limits?

rotarur commented 1 year ago

@ajnavarro I don't have any error like Resource limits were exceeded 261 times with error...

fanhai commented 1 year ago

Can you configure the number of connections according to protocol priority? For example: /p2p/id/delta/1.0.0 /ipfs/id/1.0.0 /ipfs/id/push/1.0.0 /ipfs/ping/1.0.0 /libp2p/circuit/relay/0.1.0 /libp2p/circuit/relay/0.2.0/stop /ipfs/lan/kad/1.0.0 /libp2p/autonat/1.0.0 /ipfs/bitswap/1.2.0 /ipfs/bitswap/1.1.0 /ipfs/bitswap/1.0.0 /ipfs/bitswap /meshsub/1.1.0 /meshsub/1.0.0 /floodsub/1.0.0 /x/ /asmb/maons/1.0.0

lidel commented 1 year ago

Potentially a controversial take, but this feels like a UX problem, not a problem with the default limits.

Printing an error every 10 seconds when any limit is hit is a bit hardcore. We have limits for a reason; they are a feature. This constant ERROR messaging makes them feel like an "error that needs to be solved by raising/removing limits", which is imo a UX antipattern.

:point_right: ResourceMgr protecting user and working as expected should not look like ERROR.

Quick ideas that would remove the need for relaxing default limits:


ajnavarro commented 1 year ago

@lidel if we bump errors to be printed every hour, you won't notice when you really are hitting limits. Say you hit limits at 1:01 PM; then until 2:00 PM you won't have any information, and even if the limits are fine at that later time you'd still see a warning or error because you hit limits an hour earlier.

We can make it possible to silence the error output, but the problem won't disappear. Nodes with better hardware will still struggle to handle incoming connections even if they have enough real resources, and small nodes will behave as if they don't have any resource manager active because the default limits will be too high for them.

My proposal is to set default resource manager limits per peer only. It is the only limit we can set by default knowing it will be right for any user and any hardware: https://github.com/ipfs/kubo/pull/9443 The ideal solution would be to limit to only one connection per IP, but that is not straightforward.

lidel commented 1 year ago

Ack on reporting this early – fair enough. For now, I proposed a cosmetic message adjustment in https://github.com/ipfs/kubo/pull/9444, but we still suggest the user should raise the limit – and that is the only way for the log spam to go away.

I am not sold on per-IP/peer-ID limits being enough. An adversary could:

Having a default global limit for incoming connections is really useful.

Unsure if there is a silver bullet, but a UX solution feels way less risky. I would add a Swarm.ResourceMgr.VerboseEnforcement flag as a way to move these messages out of the ERROR log level and keep the global limit, just to be safe – but maybe there is a better way?

kallisti5 commented 1 year ago

@lidel any idea, though, why I'm seeing "random" limits when ResourceMgr is {} after upgrades? That feels like a bug. When ResourceMgr is {}, the values seemingly get set randomly. I'm thinking it's an upgrade issue.

This bug can be reproduced by:

BigLep commented 1 year ago

Ok, there's lots to unpack here...

General note to the community

  1. Thanks for reporting the issues, and apologies here for the snags this is causing.
  2. While we get this figured out and improved, please know that you can certainly use Kubo 0.16 (which doesn't enable the libp2p resource manager by default), or you should be able to disable the resource manager explicitly with https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgrenabled (although, per user reports, it looks like that flag may not be working)
  3. This is a top priority for Kubo maintainers to address before the Kubo 0.18 release (RC targeting 2022-12-08).

Problems

Below are the problems I'm seeing...

Reports of a disabled go-libp2p resource manager still managing resources

https://github.com/ipfs/kubo/issues/9432#issuecomment-1327647257 and other comments say they disabled the resource manager but are still seeing messages in the logs. In that comment, we can see it's disabled in the config:

  "Swarm": {
    "ResourceMgr": {
      "Enabled": false
    },

Confusion around "magic values"

4611686018427388000 is actually not a magic value. It is effectively "infinity" and is defined here: https://github.com/ipfs/kubo/blob/master/core/node/libp2p/rcmgr_defaults.go#L15

Confusion on when Swarm.ResourceMgr is set to {}

I believe in this case we are setting default values as described in https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr.

Clarity around the "error message" meaning

There is confusion about what messages like "system: cannot reserve inbound connection: resource limit exceeded" mean. For this example, it means Swarm.ResourceMgr.Limits.System.ConnsInbound is exceeded. It would be nice if the value from ipfs swarm limit system was included.
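
As a rough way to confirm that reading today, you can compare the live counters with the configured limits for the scope named in the message (using the commands already shown in this thread):

# sketch: for a "system: ..." message, compare current usage vs. the configured limits
ipfs swarm stats system   # look at the ConnsInbound usage
ipfs swarm limit system   # compare with the ConnsInbound limit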

Actionable advice when resource limits are hit

When a resource limit is hit, we point users to https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr. It's clear from the feedback here that the docs there aren't actionable enough.

Idea of limiting to one connection per IP

I don't think we should discuss this more or pursue it. As discussed in https://github.com/ipfs/kubo/issues/9432#issuecomment-1334482153, it is ineffective and impacts NATs (especially large organizations/enterprises which have all their traffic coming from behind a NAT).

When is ResourceMgr a feature (protecting users as expected) vs. a bug

There is good commentary on this in https://github.com/ipfs/kubo/issues/9432#issuecomment-1334160936.

I agree with this sentiment in general. go-libp2p bounding the resources it uses is generally a feature, and the presence of a message doesn't necessarily mean there's a bug.

That said, if by default our limits are crazy low, then I would call it a bug. For example, if Swarm.ResourceMgr.Limits.System.ConnsInbound was set to "1" by default, I would consider it a bug because this would mean we'd only allow 1 inbound connection.

Using https://github.com/ipfs/kubo/issues/9432#issuecomment-1331177613 as an example, Swarm.ResourceMgr.Limits.System.ConnsInbound is set to 123. This is derived from Swarm.ResourceMgr.MaxMemory. I assume @kallisti5 didn't set a MaxMemory value and the default of [TOTAL_SYSTEM_MEMORY]/8 was used per https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgrmaxmemory. (In this case TOTAL_SYSTEM_MEMORY looks to be around ~16GB, as 1999292928*8/(1024*1024) = ~15,253 MiB.)
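
A quick back-of-the-envelope check of that assumption, as a sketch in plain shell arithmetic on the reported Memory limit:

# the reported Memory limit is TOTAL_SYSTEM_MEMORY/8, so multiply back up
echo $(( 1999292928 * 8 / 1024 / 1024 ))   # prints 15253 (MiB), i.e. roughly 16GB of system memory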

General notes for maintainers

  1. All of the problems/issues above need to have one or more actions. We can't rely on typed responses in this issue. We ultimately need to make fixes and/or respond with URLs to documentation.
  2. I have thoughts/ideas on potential actions but didn't want to slow down getting this message out with my limited window this evening, and would love others to take the reins here so I don't become the blocker. I'm happy to engage/help if it's useful.
  3. Let's make sure we have a place where we're tracking the problems to solve and the actions we're going to take. I created https://github.com/ipfs/kubo/issues/9442 where we can do this, but I'm fine if it happens somewhere else.
ajnavarro commented 1 year ago

For all people who are having resource manager errors, could you try executing the following command? ipfs config Swarm.ResourceMgr.MaxMemory "HALF_TOTAL_MEMORY" where HALF_TOTAL_MEMORY is a string value like 16GB.

Complete command:

ipfs config Swarm.ResourceMgr.MaxMemory "16GB"

The default value right now is 1/8 of the total memory. Setting this value to 1/2 of the entire node memory will increase the number of inbound connections allowed and hopefully reduce resource manager log errors.

Note that after executing the command, you need to restart the node for the change to take effect.
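
After the restart, it's worth sanity-checking that the scaled limits actually changed, for example:

# sketch: verify the new MaxMemory-derived limits after restarting the daemon
ipfs config Swarm.ResourceMgr.MaxMemory   # should print the value you set, e.g. 16GB
ipfs swarm limit system                   # Memory, ConnsInbound and StreamsInbound should now be higher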

BigLep commented 1 year ago

Friendly ping for anyone affected here to please try increasing Swarm.ResourceMgr.MaxMemory as directed in https://github.com/ipfs/kubo/issues/9432#issuecomment-1337133633. We'd like to get signal on how much this alleviates the problems, given we're intending to include fixes in the 0.18 RC for 2022-12-08.

Derrick- commented 1 year ago

@BigLep: Use case on this for consideration. I run multiple very large (some 2TB+) ipfs servers on VM's and sometimes change the memory allocated to them.

Fixing the memory to half of the current memory is something I'm reluctant to do, because I may have a server using 16GB but, due to host hardware constraints, from time to time I may need to reduce it to 8GB, or expand it.

I would prefer that the service running on the system self-adjust to such a change and utilize the available memory.

Setting the connection limits based on system memory is a great idea. However, in my case these are dedicated ipfs servers, so the 1/8-of-system-memory default is too small. On the other hand, 1/2 of system memory is too big for the ipfs server I'm running on my development workstation.

A literal macro-type setting such as HALF_TOTAL_MEMORY or EIGHTH_TOTAL_MEMORY seems like it would be optimal for my use cases, rather than fixing these limits in the config file.

The init profile for Server could apply HALF_TOTAL_MEMORY, and the default could remain at an eighth.
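
A hypothetical sketch of what that could look like (the macro value is not supported today; only the command syntax is real):

# hypothetical, not currently supported: a literal macro instead of a fixed size
ipfs config Swarm.ResourceMgr.MaxMemory "HALF_TOTAL_MEMORY"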

Derrick- commented 1 year ago
ipfs config --json Swarm.ResourceMgr '{}'
ipfs config Swarm.ResourceMgr.MaxMemory "8GB"

Results in:

$ ipfs swarm limit system
{
  "Conns": 4611686018427388000,
  "ConnsInbound": 540,
  "ConnsOutbound": 4611686018427388000,
  "FD": 32768,
  "Memory": 15999586304,
  "Streams": 4611686018427388000,
  "StreamsInbound": 8653,
  "StreamsOutbound": 4611686018427388000
}

I'm still seeing resourcemanager errors.

$ ipfs swarm stats system
{
  "System": {
    "Conns": 768,
    "ConnsInbound": 537,
    "ConnsOutbound": 231,
    "FD": 589,
    "Memory": 64024608,
    "Streams": 567,
    "StreamsInbound": 250,
    "StreamsOutbound": 317
  }
}

I also got this message during startup, which I think is new: ipfs[2738]: 2022/12/06 10:16:44 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/lucas-clemente/quic-go/wiki/UDP-Receive-Buffer-Size for details.

System memory is 16GB

BigLep commented 1 year ago

Thanks for sharing @Derrick- .

"Memory": 15999586304,

This seems odd given ipfs config Swarm.ResourceMgr.MaxMemory "8GB". It looks like 16GB was actually being passed.

I still seeing resourcemanager errors.

It would be helpful to know the error message so we can confirm, but it looks like it would be for System.ConnsInbound (which is a value scaled based on Swarm.ResourceMgr.MaxMemory).

I also got this message during startup, which I think is new

I personally don't know what this is, but it's likely separate and won't be triaged here.

We're tracking the various followups we're active on here: https://github.com/ipfs/kubo/issues/9442 . One thing we expect will help is to lower your ConnMgr limits below System.ConnsInbound. There is some discussion about this in https://github.com/ipfs/kubo/pull/9468 (search for "How does the resource manager (ResourceMgr) relate to the connection manager (ConnMgr)?")
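
As a sketch of that suggestion, with the 540 System.ConnsInbound reported above, the connection manager watermarks could be lowered to something like the following (the numbers are illustrative, not recommendations), followed by a daemon restart:

# sketch: keep the connection manager's steady state below System.ConnsInbound (540 here)
ipfs config --json Swarm.ConnMgr.LowWater 300
ipfs config --json Swarm.ConnMgr.HighWater 400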

Derrick- commented 1 year ago

@BigLep Here are updated stats after a few hours of runtime. I don't understand the Memory limit result; it does seem to claim 16GB.

$ ipfs swarm stats system
{
  "System": {
    "Conns": 776,
    "ConnsInbound": 447,
    "ConnsOutbound": 329,
    "FD": 507,
    "Memory": 167321696,
    "Streams": 1194,
    "StreamsInbound": 594,
    "StreamsOutbound": 600
  }
}
$ ipfs swarm limit system
{
  "Conns": 4611686018427388000,
  "ConnsInbound": 540,
  "ConnsOutbound": 4611686018427388000,
  "FD": 32768,
  "Memory": 15999586304,
  "Streams": 4611686018427388000,
  "StreamsInbound": 8653,
  "StreamsOutbound": 4611686018427388000
}

And to confirm, here's the excerpt from my config file under "Swarm":

    "ResourceMgr": {
      "MaxMemory": "8GB"
    }

The service was restarted after the config file change this morning, and the ConnsInbound limit did increase from the previously hardcoded 500 (and from the even earlier default level of < 200) after ResourceMgr was cleared and only MaxMemory was set.

And here are the console errors from journal:

Dec 06 21:19:39 video2 ipfs[2738]: 2022-12-06T21:19:39.854-0500        ERROR        resourcemanager        libp2p/rcmgr_logging.go:53        Resource limits were exceeded 15 times with error "system: cannot reserve inbound connection: resource limit exceeded".
Dec 06 21:19:39 video2 ipfs[2738]: 2022-12-06T21:19:39.854-0500        ERROR        resourcemanager        libp2p/rcmgr_logging.go:57        Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
Dec 06 21:19:49 video2 ipfs[2738]: 2022-12-06T21:19:49.868-0500        ERROR        resourcemanager        libp2p/rcmgr_logging.go:53        Resource limits were exceeded 12 times with error "system: cannot reserve inbound connection: resource limit exceeded".
Dec 06 21:19:49 video2 ipfs[2738]: 2022-12-06T21:19:49.868-0500        ERROR        resourcemanager        libp2p/rcmgr_logging.go:57        Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr

I'm not surprised to get more connections to this server than it can handle; there's a lot of good stuff here. I agree with the previous sentiments that these errors are too much noise, though, especially since they are logged at the ERROR level.

rotarur commented 1 year ago

Yeah, this error is noisy, and my system limits are never reached either:

# ipfs swarm stats system; ipfs swarm limit system
{
  "System": {
    "Conns": 632,
    "ConnsInbound": 2,
    "ConnsOutbound": 630,
    "FD": 183,
    "Memory": 71340064,
    "Streams": 1082,
    "StreamsInbound": 77,
    "StreamsOutbound": 1005
  }
}
{
  "Conns": 1024,
  "ConnsInbound": 2048,
  "ConnsOutbound": 1024,
  "FD": 4512,
  "Memory": 1073741824,
  "Streams": 16384,
  "StreamsInbound": 4096,
  "StreamsOutbound": 16384
}
2color commented 1 year ago

Sharing some more information. Should I enable the resource manager again as suggested in https://github.com/ipfs/kubo/issues/9432#issuecomment-1337133633?

Recent logs:

4:43:35.743Z","logger":"net/identify","caller":"identify/id.go:369","msg":"failed negotiate identify protocol with peer","peer":"12D3KooWBbgvmGKKwUtr7dSvaZTo4kZc3tSkgYGGjKC96Mbft4Jn","error":"Application error 0x0: conn-4801317: system: cannot reserve inbound connection: resource limit exceeded"}
2022-12-07T14:43:35Z app[7b585d20] fra [info]{"level":"warn","ts":"2022-12-07T14:43:35.744Z","logger":"net/identify","caller":"identify/id.go:334","msg":"failed to identify 12D3KooWBbgvmGKKwUtr7dSvaZTo4kZc3tSkgYGGjKC96Mbft4Jn: Application error 0x0: conn-4801317: system: cannot reserve inbound connection: resource limit exceeded"}
2022-12-07T14:43:35Z app[7b585d20] fra [info]{"level":"info","ts":"2022-12-07T14:43:35.783Z","logger":"net/identify","caller":"identify/id.go:369","msg":"failed negotiate identify protocol with peer","peer":"12D3KooWBbgvmGKKwUtr7dSvaZTo4kZc3tSkgYGGjKC96Mbft4Jn","error":"Application error 0x0: conn-4801318: system: cannot reserve inbound connection: resource limit exceeded"}
2022-12-07T14:43:35Z app[7b585d20] fra [info]{"level":"warn","ts":"2022-12-07T14:43:35.783Z","logger":"net/identify","caller":"identify/id.go:334","msg":"failed to identify 12D3KooWBbgvmGKKwUtr7dSvaZTo4kZc3tSkgYGGjKC96Mbft4Jn: Application error 0x0: conn-4801318: system: cannot reserve inbound connection: resource limit exceeded"}
2022-12-07T14:43:35Z app[7b585d20] fra [info]{"level":"info","ts":"2022-12-07T14:43:35.818Z","logger":"net/identify","caller":"identify/id.go:369","msg":"failed negotiate identify protocol with peer","peer":"12D3KooWBbgvmGKKwUtr7dSvaZTo4kZc3tSkgYGGjKC96Mbft4Jn","error":"Application error 0x0: conn-4801319: system: cannot reserve inbound connection: resource limit exceeded"}
2022-12-07T14:43:35Z app[7b585d20] fra [info]{"level":"warn","ts":"2022-12-07T14:43:35.818Z","logger":"net/identify","caller":"identify/id.go:334","msg":"failed to identify 12D3KooWBbgvmGKKwUtr7dSvaZTo4kZc3tSkgYGGjKC96Mbft4Jn: Application error 0x0: conn-4801319: system: cannot reserve inbound connection: resource limit exceeded"}
2022-12-07T14:43:35Z app[7b585d20] fra [info]{"level":"info","ts":"2022-12-07T14:43:35.857Z","logger":"net/identify","caller":"identify/id.go:369","msg":"failed negotiate identify protocol with peer","peer":"12D3KooWGikAcdxMVbZetD5E1GJgWEtTM9oVwjfQKXkuQzRk9Xuo","error":"Application error 0x0: conn-393542: system: cannot reserve inbound connection: resource limit exceeded"}
2022-12-07T14:43:35Z app[7b585d20] fra [info]{"level":"warn","ts":"2022-12-07T14:43:35.857Z","logger":"net/identify","caller":"identify/id.go:334","msg":"failed to identify 12D3KooWGikAcdxMVbZetD5E1GJgWEtTM9oVwjfQKXkuQzRk9Xuo: Application error 0x0: conn-393542: system: cannot reserve inbound connection: resource limit exceeded"}
2022-12-07T14:43:35Z app[7b585d20] fra [info]{"level":"info","ts":"2022-12-07T14:43:35.936Z","logger":"net/identify","caller":"identify/id.go:369","msg":"failed negotiate identify protocol with peer","peer":"12D3KooWPvhLFyxpgMGMKQ2TUQm4JKdnb7QLb16C3z7xPyauZ1tm","error":"Application error 0x0: conn-8013505: system: cannot reserve inbound connection: resource limit exceeded"}
2022-12-07T14:43:35Z app[7b585d20] fra [info]{"level":"warn","ts":"2022-12-07T14:43:35.937Z","logger":"net/identify","caller":"identify/id.go:334","msg":"failed to identify 12D3KooWPvhLFyxpgMGMKQ2TUQm4JKdnb7QLb16C3z7xPyauZ1tm: Application error 0x0: conn-8013505: system: cannot reserve inbound connection: resource limit exceeded"}
2022-12-07T14:43:35Z app[7b585d20] fra [info]{"level":"info","ts":"2022-12-07T14:43:35.981Z","logger":"net/identify","caller":"identify/id.go:369","msg":"failed negotiate identify protocol with peer","peer":"12D3KooWLTgLVtTXANfnsibLb4tRQ3dVGn8WQp1rDgxBqyG8JczL","error":"Application error 0x0: conn-2481306: system: cannot reserve inbound connection: resource limit exceeded"}
2022-12-07T14:43:35Z app[7b585d20] fra [info]{"level":"warn","ts":"2022-12-07T14:43:35.983Z","logger":"net/identify","caller":"identify/id.go:334","msg":"failed to identify 12D3KooWLTgLVtTXANfnsibLb4tRQ3dVGn8WQp1rDgxBqyG8JczL: Application error 0x0: conn-2481306: system: cannot reserve inbound connection: resource limit exceeded"}
2022-12-07T14:43:36Z app[7b585d20] fra [info]{"level":"info","ts":"2022-12-07T14:43:36.019Z","logger":"net/identify","caller":"identify/id.go:369","msg":"failed negotiate identify protocol with peer","peer":"12D3KooWBAf6guqxSuGRdJoCSBfXhXhz1LfgqBgJnyzJkRZa3MAs","error":"Application error 0x0: conn-737991: system: cannot reserve connection: resource limit exceeded"}
2022-12-07T14:43:36Z app[7b585d20] fra [info]{"level":"warn","ts":"2022-12-07T14:43:36.019Z","logger":"net/identify","caller":"identify/id.go:334","msg":"failed to identify 12D3KooWBAf6guqxSuGRdJoCSBfXhXhz1LfgqBgJnyzJkRZa3MAs: Application error 0x0: conn-737991: system: cannot reserve connection: resource limit exceeded"}
2022-12-07T14:43:36Z app[7b585d20] fra [info]{"level":"info","ts":"2022-12-07T14:43:36.219Z","logger":"net/identify","caller":"identify/id.go:369","msg":"failed negotiate identify protocol with peer","peer":"12D3KooWBAf6guqxSuGRdJoCSBfXhXhz1LfgqBgJnyzJkRZa3MAs","error":"Application error 0x0: conn-738007: system: cannot reserve connection: resource limit exceeded"}
2022-12-07T14:43:36Z app[7b585d20] fra [info]{"level":"warn","ts":"2022-12-07T14:43:36.220Z","logger":"net/identify","caller":"identify/id.go:334","msg":"failed to identify 12D3KooWBAf6guqxSuGRdJoCSBfXhXhz1LfgqBgJnyzJkRZa3MAs: Application error 0x0: conn-738007: system: cannot reserve connection: resource limit exceeded"}
2022-12-07T14:43:36Z app[7b585d20] fra [info]{"level":"info","ts":"2022-12-07T14:43:36.248Z","logger":"canonical-log","caller":"swarm/swarm_dial.go:487","msg":"CANONICAL_PEER_STATUS: peer=12D3KooWLTgLVtTXANfnsibLb4tRQ3dVGn8WQp1rDgxBqyG8JczL addr=/ip4/188.166.184.94/udp/4001/quic sample_rate=100 connection_status=\"established\" dir=\"outbound\""}
2022-12-07T14:43:36Z app[7b585d20] fra [info]{"level":"info","ts":"2022-12-07T14:43:36.419Z","logger":"net/identify","caller":"identify/id.go:369","msg":"failed negotiate identify protocol with peer","peer":"12D3KooWBAf6guqxSuGRdJoCSBfXhXhz1LfgqBgJnyzJkRZa3MAs","error":"Application error 0x0: conn-738018: system: cannot reserve connection: resource limit exceeded"}
2022-12-07T14:43:36Z app[7b585d20] fra [info]{"level":"warn","ts":"2022-12-07T14:43:36.420Z","logger":"net/identify","caller":"identify/id.go:334","msg":"failed to identify 12D3KooWBAf6guqxSuGRdJoCSBfXhXhz1LfgqBgJnyzJkRZa3MAs: Application error 0x0: conn-738018: system: cannot reserve connection: resource limit exceeded"}
2022-12-07T14:43:36Z app[7b585d20] fra [info]{"level":"info","ts":"2022-12-07T14:43:36.505Z","logger":"net/identify","caller":"identify/id.go:369","msg":"failed negotiate identify protocol with peer","peer":"12D3KooWLTgLVtTXANfnsibLb4tRQ3dVGn8WQp1rDgxBqyG8JczL","error":"Application error 0x0: conn-2481310: system: cannot reserve inbound connection: resource limit exceeded"}
2022-12-07T14:43:36Z app[7b585d20] fra [info]{"level":"warn","ts":"2022-12-07T14:43:36.505Z","logger":"net/identify","caller":"identify/id.go:334","msg":"failed to identify 12D3KooWLTgLVtTXANfnsibLb4tRQ3dVGn8WQp1rDgxBqyG8JczL: Application error 0x0: conn-2481310: system: cannot reserve inbound connection: resource limit exceeded"}
/ # ipfs version
ipfs version 0.17.0

/ # ipfs  config show
{
  "API": {
    "HTTPHeaders": {}
  },
  "Addresses": {
    "API": [
      "/ip4/0.0.0.0/tcp/5001",
      "/ip6/::/tcp/5001"
    ],
    "Announce": [],
    "AppendAnnounce": [
      "/ip4/168.220.93.39/tcp/4001",
      "/ip4/168.220.93.39/tcp/4002/ws",
      "/dns4/my-ipfs-node.fly.dev/tcp/443/wss"
    ],
    "Gateway": "/ip4/0.0.0.0/tcp/8080",
    "NoAnnounce": [
      "/ip4/10.0.0.0/ipcidr/8",
      "/ip4/100.64.0.0/ipcidr/10",
      "/ip4/169.254.0.0/ipcidr/16",
      "/ip4/172.16.0.0/ipcidr/12",
      "/ip4/192.0.0.0/ipcidr/24",
      "/ip4/192.0.2.0/ipcidr/24",
      "/ip4/192.168.0.0/ipcidr/16",
      "/ip4/198.18.0.0/ipcidr/15",
      "/ip4/198.51.100.0/ipcidr/24",
      "/ip4/203.0.113.0/ipcidr/24",
      "/ip4/240.0.0.0/ipcidr/4",
      "/ip6/100::/ipcidr/64",
      "/ip6/2001:2::/ipcidr/48",
      "/ip6/2001:db8::/ipcidr/32",
      "/ip6/fc00::/ipcidr/7",
      "/ip6/fe80::/ipcidr/10"
    ],
    "Swarm": [
      "/ip4/0.0.0.0/tcp/4001",
      "/ip4/0.0.0.0/tcp/4002/ws",
      "/ip4/0.0.0.0/udp/4003/quic/webtransport",
      "/ip6/::/tcp/4001",
      "/ip6/::/tcp/4002/ws",
      "/ip6/::/udp/4003/quic/webtransport",
      "/ip4/0.0.0.0/udp/4001/quic",
      "/ip6/::/udp/4001/quic"
    ]
  },
  "AutoNAT": {},
  "Bootstrap": [
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt",
    "/ip4/104.131.131.82/tcp/4001/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ",
    "/ip4/104.131.131.82/udp/4001/quic/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb"
  ],
  "DNS": {
    "Resolvers": {}
  },
  "Datastore": {
    "BloomFilterSize": 0,
    "GCPeriod": "1h",
    "HashOnRead": false,
    "Spec": {
      "mounts": [
        {
          "child": {
            "path": "blocks",
            "shardFunc": "/repo/flatfs/shard/v1/next-to-last/2",
            "sync": true,
            "type": "flatfs"
          },
          "mountpoint": "/blocks",
          "prefix": "flatfs.datastore",
          "type": "measure"
        },
        {
          "child": {
            "compression": "none",
            "path": "datastore",
            "type": "levelds"
          },
          "mountpoint": "/",
          "prefix": "leveldb.datastore",
          "type": "measure"
        }
      ],
      "type": "mount"
    },
    "StorageGCWatermark": 90,
    "StorageMax": "10GB"
  },
  "Discovery": {
    "MDNS": {
      "Enabled": false
    }
  },
  "Experimental": {
    "AcceleratedDHTClient": false,
    "FilestoreEnabled": false,
    "GraphsyncEnabled": false,
    "Libp2pStreamMounting": false,
    "P2pHttpProxy": false,
    "StrategicProviding": false,
    "UrlstoreEnabled": false
  },
  "Gateway": {
    "APICommands": [],
    "HTTPHeaders": {
      "Access-Control-Allow-Headers": [
        "X-Requested-With",
        "Range",
        "User-Agent"
      ],
      "Access-Control-Allow-Methods": [
        "GET"
      ],
      "Access-Control-Allow-Origin": [
        "*"
      ]
    },
    "NoDNSLink": false,
    "NoFetch": false,
    "PathPrefixes": [],
    "PublicGateways": null,
    "RootRedirect": "",
    "Writable": false
  },
  "Identity": {
    "PeerID": "12D3KooWAp58z5DeiQSVUXdeqgyLjvkcxgph9Pn2xZ9D1yWzHPCV"
  },
  "Internal": {},
  "Ipns": {
    "RecordLifetime": "",
    "RepublishPeriod": "",
    "ResolveCacheSize": 128
  },
  "Migration": {
    "DownloadSources": [],
    "Keep": ""
  },
  "Mounts": {
    "FuseAllowOther": false,
    "IPFS": "/ipfs",
    "IPNS": "/ipns"
  },
  "Peering": {
    "Peers": null
  },
  "Pinning": {
    "RemoteServices": {}
  },
  "Plugins": {
    "Plugins": null
  },
  "Provider": {
    "Strategy": ""
  },
  "Pubsub": {
    "DisableSigning": false,
    "Router": ""
  },
  "Reprovider": {
    "Interval": "12h",
    "Strategy": "all"
  },
  "Routing": {
    "Methods": null,
    "Routers": null,
    "Type": "dht"
  },
  "Swarm": {
    "AddrFilters": [
      "/ip4/10.0.0.0/ipcidr/8",
      "/ip4/100.64.0.0/ipcidr/10",
      "/ip4/169.254.0.0/ipcidr/16",
      "/ip4/172.16.0.0/ipcidr/12",
      "/ip4/192.0.0.0/ipcidr/24",
      "/ip4/192.0.2.0/ipcidr/24",
      "/ip4/192.168.0.0/ipcidr/16",
      "/ip4/198.18.0.0/ipcidr/15",
      "/ip4/198.51.100.0/ipcidr/24",
      "/ip4/203.0.113.0/ipcidr/24",
      "/ip4/240.0.0.0/ipcidr/4",
      "/ip6/100::/ipcidr/64",
      "/ip6/2001:2::/ipcidr/48",
      "/ip6/2001:db8::/ipcidr/32",
      "/ip6/fc00::/ipcidr/7",
      "/ip6/fe80::/ipcidr/10"
    ],
    "ConnMgr": {},
    "DisableBandwidthMetrics": false,
    "DisableNatPortMap": true,
    "EnableHolePunching": true,
    "RelayClient": {
      "Enabled": true
    },
    "RelayService": {},
    "ResourceMgr": {
      "Enabled": false
    },
    "Transports": {
      "Multiplexers": {},
      "Network": {
        "WebTransport": true,
        "Websocket": true
      },
      "Security": {}
    }
  }
}

/ # ipfs swarm stats system
{
  "System": {
    "Conns": 0,
    "ConnsInbound": 0,
    "ConnsOutbound": 0,
    "FD": 0,
    "Memory": 0,
    "Streams": 0,
    "StreamsInbound": 0,
    "StreamsOutbound": 0
  }
}

/ # ipfs swarm limit system
Error: missing ResourceMgr: make sure the daemon is running with Swarm.ResourceMgr.Enabled

/ # ipfs  swarm peers | wc -l
460
ajnavarro commented 1 year ago

@2color Yes. These errors come from other peers you are connected to. You should enable the ResourceManager to protect against DoS attacks. Here are all the pending issues on RM:

This one is the specific one for your case:

BigLep commented 1 year ago

@Derrick-

I don't understand the Memory limit result, it does seem to claim 16GB.

Doh - that's a bug. It's being fixed here: https://github.com/ipfs/kubo/pull/9470

@rotarur

yeah this error is noisy and my system limits are never reached as well:

ACK on this "error" being noisy. This is being tracked in https://github.com/ipfs/kubo/issues/9442 and handled in https://github.com/ipfs/kubo/pull/9472

That said, in your output you only shared the "system" scope. There are other scopes that could be exceeded. Ideally you would be able to run ipfs swarm stats --min-used-limit-perc=90 all to pinpoint this, but there is a bug (doh!): https://github.com/ipfs/kubo/issues/9473

In the interim, you can inspect the log message more to understand what scope and what limit within it is being breached. We have some docs drafted in https://github.com/ipfs/kubo/pull/9468 to help with this. Search for "What do these "Protected from exceeding resource limits" log messages mean?"
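
For example, if the daemon logs end up in journald under a unit named ipfs (as in some of the excerpts above; the unit name is an assumption), a rough way to see which scope dominates:

# sketch: count which scope shows up in the rcmgr "Resource limits were exceeded" errors
journalctl -u ipfs --since "1 hour ago" \
  | grep -o '"[a-z]*: cannot reserve[^"]*"' | sort | uniq -c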

@2color Yeah, this is understandably confusing. To help alleviate until things are clarified on the go-libp2p side, we have docs in https://github.com/ipfs/kubo/pull/9468. Search for "What are the "Application error ... cannot reserve ..." messages?". Given the challenges we've been having with resource manager enablement in Kubo, we'd certainly welcome your feedback on that PR.

rotarur commented 1 year ago

What do these "Protected from exceeding resource limits" log messages mean?

Thanks @BigLep ❤️ Your documentation helped me understand where this error comes from, in this line:

This can be confusing, but these `Application error ... cannot reserve ...` messages can occur even if your local node has the resource manager disabled.

So my errors come from remote nodes, not from mine. How can this error be addressed?

n00b21337 commented 1 year ago

I accidentally changed the resource in the wrong server, so changing the inbound connection value does work.

So changing the inbound connections to 1024 cured this for me. So add to your .ipfs/config in the "Swarm" block the following then you can tweak as you need.

 "ResourceMgr": {
      "Limits": {
        "System": {
          "Memory": 1073741824,
          "FD": 512,
          "Conns": 1024,
          "ConnsInbound": 1024,
          "ConnsOutbound": 1024,
          "Streams": 16384,
          "StreamsInbound": 4096,
          "StreamsOutbound": 16384
        }
      }
    },

Add it without the trailing comma, as otherwise it will cause errors in the config file:

 "ResourceMgr": {
      "Limits": {
        "System": {
          "Memory": 1073741824,
          "FD": 512,
          "Conns": 1024,
          "ConnsInbound": 1024,
          "ConnsOutbound": 1024,
          "Streams": 16384,
          "StreamsInbound": 4096,
          "StreamsOutbound": 16384
        }
      }
    }
BigLep commented 1 year ago

I'm going to resolve this since we aren't getting new reports here and we have the fixes mostly handled (and at least being tracked in https://github.com/ipfs/kubo/issues/9442 ). If after 0.18 we get new issues, we'll coalesce and create additional resulting issues.

ShadowJonathan commented 1 year ago

I'm still seeing this after updating to 0.18:

Details ``` Jan 23 20:40:45 infinistore ipfs[9783]: 2023-01-23T20:40:45.337+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWPbxXhf4yM5JQ9oZTj9Zf7wMTVUZ4kkYbPMxkoL7ms2Wf: Application error 0x0 (remote): conn-1175419: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:45 infinistore ipfs[9783]: 2023-01-23T20:40:45.374+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWD9sbAXVkDvnXSmcUc7PLuT7Q9ivGWMFets6MrM6ZKU5X: Application error 0x0 (remote): conn-6584243: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:45 infinistore ipfs[9783]: 2023-01-23T20:40:45.417+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWLDpfChYqxzAmH4BmiQPMgSSvWimiWtVveRehcWSuUhgL: Application error 0x0 (remote): conn-4939303: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:45 infinistore ipfs[9783]: 2023-01-23T20:40:45.479+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWLDpfChYqxzAmH4BmiQPMgSSvWimiWtVveRehcWSuUhgL: Application error 0x0 (remote): conn-4939305: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:45 infinistore ipfs[9783]: 2023-01-23T20:40:45.541+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWLDpfChYqxzAmH4BmiQPMgSSvWimiWtVveRehcWSuUhgL: Application error 0x0 (remote): conn-4939307: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:45 infinistore ipfs[9783]: 2023-01-23T20:40:45.568+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWD9sbAXVkDvnXSmcUc7PLuT7Q9ivGWMFets6MrM6ZKU5X: Application error 0x0 (remote): conn-6584244: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:45 infinistore ipfs[9783]: 2023-01-23T20:40:45.569+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWNymRgoF9s6ZAbAsgYzq2t6yBHL7YjRSwMuAKe6944Pnd: Application error 0x0 (remote): conn-842736: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:45 infinistore ipfs[9783]: 2023-01-23T20:40:45.646+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWPbxXhf4yM5JQ9oZTj9Zf7wMTVUZ4kkYbPMxkoL7ms2Wf: Application error 0x0 (remote): conn-1175427: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:45 infinistore ipfs[9783]: 2023-01-23T20:40:45.670+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWPMD7CyVVp8L4cSFAdLMW8WVxH7wPVP3MGVjGFpKECi3Y: EOF Jan 23 20:40:45 infinistore ipfs[9783]: 2023-01-23T20:40:45.745+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWD9sbAXVkDvnXSmcUc7PLuT7Q9ivGWMFets6MrM6ZKU5X: Application error 0x0 (remote): conn-6584249: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:45 infinistore ipfs[9783]: 2023-01-23T20:40:45.840+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWAH7Tr6jpLt611z55ThVeJcVLkSc3aUQpymsaUt9zFP2K: Application error 0x0 (remote): conn-2596802: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:45 infinistore ipfs[9783]: 2023-01-23T20:40:45.999+0100 WARN net/identify identify/id.go:334 failed to identify QmcMb4SQVF6jJ85NJJqWwC2zZTHExnGLoB1uKC4ckLQqW6: Application error 0x0 (remote) Jan 23 20:40:46 infinistore ipfs[9783]: 2023-01-23T20:40:46.004+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWRYuXAL8f8xznRhgSazKqJnV7gmAdtHs21diHinFBndK1: Application error 0x0 (remote): conn-4290944: system: 
cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:46 infinistore ipfs[9783]: 2023-01-23T20:40:46.315+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWRYuXAL8f8xznRhgSazKqJnV7gmAdtHs21diHinFBndK1: Application error 0x0 (remote): conn-4290968: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:46 infinistore ipfs[9783]: 2023-01-23T20:40:46.317+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWAH7Tr6jpLt611z55ThVeJcVLkSc3aUQpymsaUt9zFP2K: Application error 0x0 (remote): conn-2596806: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:46 infinistore ipfs[9783]: 2023-01-23T20:40:46.406+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWAwexecyDwKP9RCxJPVNSw2rKqyprbnq7AwbU34U3pZhb: Application error 0x0 (remote): conn-3799167: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:46 infinistore ipfs[9783]: 2023-01-23T20:40:46.457+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooW9rth8L5K7beeci2dvA3u6d8cNgFJPSZoT6Wnzx7zYevh: Application error 0x0 (remote): conn-4191568: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:46 infinistore ipfs[9783]: 2023-01-23T20:40:46.484+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWJfwFEszcUQL1qDjL3Ddnwbj1KLhn6W8GkbS8JNtLFnhg: Application error 0x0 (remote): conn-3659526: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:46 infinistore ipfs[9783]: 2023-01-23T20:40:46.547+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWAwexecyDwKP9RCxJPVNSw2rKqyprbnq7AwbU34U3pZhb: Application error 0x0 (remote): conn-3799171: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:46 infinistore ipfs[9783]: 2023-01-23T20:40:46.551+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWJfwFEszcUQL1qDjL3Ddnwbj1KLhn6W8GkbS8JNtLFnhg: Application error 0x0 (remote): conn-3659529: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:46 infinistore ipfs[9783]: 2023-01-23T20:40:46.557+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooW9rth8L5K7beeci2dvA3u6d8cNgFJPSZoT6Wnzx7zYevh: Application error 0x0 (remote): conn-4191569: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:46 infinistore ipfs[9783]: 2023-01-23T20:40:46.618+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWJfwFEszcUQL1qDjL3Ddnwbj1KLhn6W8GkbS8JNtLFnhg: Application error 0x0 (remote): conn-3659531: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:46 infinistore ipfs[9783]: 2023-01-23T20:40:46.657+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooW9rth8L5K7beeci2dvA3u6d8cNgFJPSZoT6Wnzx7zYevh: Application error 0x0 (remote): conn-4191572: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:46 infinistore ipfs[9783]: 2023-01-23T20:40:46.668+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWRYuXAL8f8xznRhgSazKqJnV7gmAdtHs21diHinFBndK1: Application error 0x0 (remote): conn-4290998: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:46 infinistore ipfs[9783]: 2023-01-23T20:40:46.686+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWAwexecyDwKP9RCxJPVNSw2rKqyprbnq7AwbU34U3pZhb: Application error 0x0 (remote): conn-3799172: system: cannot reserve inbound connection: resource limit 
exceeded Jan 23 20:40:46 infinistore ipfs[9783]: 2023-01-23T20:40:46.763+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWKMRP2WRHPucDrjKg2Ny3unafuHZVAsDocrSnMJ9zFCeY: Application error 0x0 (remote): conn-3848343: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:46 infinistore ipfs[9783]: 2023-01-23T20:40:46.770+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWAH7Tr6jpLt611z55ThVeJcVLkSc3aUQpymsaUt9zFP2K: Application error 0x0 (remote): conn-2596817: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:46 infinistore ipfs[9783]: 2023-01-23T20:40:46.784+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWNnT6E8WwYRD7Wm6LoqB4J2fGAebi7EKut5wLGBqJTiNx: Application error 0x0 (remote): conn-4172357: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:46 infinistore ipfs[9783]: 2023-01-23T20:40:46.788+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWQhcu9YASxJBUTF3YsFWvDScPkMnEL1DUmWbraNFbTmtx: Application error 0x0 (remote): conn-472789: system: cannot reserve connection: resource limit exceeded Jan 23 20:40:46 infinistore ipfs[9783]: 2023-01-23T20:40:46.831+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWGyZn2UodB5nq7AekLUt3GmVXtte5sXtpUVkVpjmyfpxF: Application error 0x0 (remote): conn-886816: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:46 infinistore ipfs[9783]: 2023-01-23T20:40:46.928+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWC7sr8PAYmeEo1PnE2EL2W7fnEaNPZZYvFxV251QPSauG: Application error 0x0 (remote): conn-12230883: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:46 infinistore ipfs[9783]: 2023-01-23T20:40:46.949+0100 WARN net/identify identify/id.go:334 failed to identify QmSfyoXrD75Qsmb5y8e731BSXoDQRhCp2xrYk4YaeSoCsn: stream reset Jan 23 20:40:46 infinistore ipfs[9783]: 2023-01-23T20:40:46.953+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWSDTNXZQzRjBfW5X3GVpgxunnDbuTV8ejGMYywMrog3W7: Application error 0x0 (remote): conn-2397312: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:46 infinistore ipfs[9783]: 2023-01-23T20:40:46.954+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWL2bxobj2yrS9akq3LA3oTqdG6LKTJrJUTbXA3WVWMmek: Application error 0x0 (remote): conn-3552494: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:46 infinistore ipfs[9783]: 2023-01-23T20:40:46.958+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWQhcu9YASxJBUTF3YsFWvDScPkMnEL1DUmWbraNFbTmtx: Application error 0x0 (remote): conn-472794: system: cannot reserve connection: resource limit exceeded Jan 23 20:40:46 infinistore ipfs[9783]: 2023-01-23T20:40:46.960+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWRnsmwaXVZCFq63A2rP8BsPoW72Brk5djjLx1rrUHHjAj: Application error 0x0 (remote): conn-3986472: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:46 infinistore ipfs[9783]: 2023-01-23T20:40:46.966+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWM45dXHAgN5cnyUA7FEnEP7NqfiEuryccLNRBroYMhPMb: Application error 0x0 (remote): conn-4633129: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:46 infinistore ipfs[9783]: 2023-01-23T20:40:46.975+0100 WARN net/identify identify/id.go:334 failed to identify 
12D3KooWGyZn2UodB5nq7AekLUt3GmVXtte5sXtpUVkVpjmyfpxF: Application error 0x0 (remote): conn-886817: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.073+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWSDTNXZQzRjBfW5X3GVpgxunnDbuTV8ejGMYywMrog3W7: Application error 0x0 (remote): conn-2397314: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.093+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWNnT6E8WwYRD7Wm6LoqB4J2fGAebi7EKut5wLGBqJTiNx: Application error 0x0 (remote): conn-4172375: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.102+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWC7sr8PAYmeEo1PnE2EL2W7fnEaNPZZYvFxV251QPSauG: Application error 0x0 (remote): conn-12230884: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.104+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWKMRP2WRHPucDrjKg2Ny3unafuHZVAsDocrSnMJ9zFCeY: Application error 0x0 (remote): conn-3848352: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.120+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWGyZn2UodB5nq7AekLUt3GmVXtte5sXtpUVkVpjmyfpxF: Application error 0x0 (remote): conn-886819: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.127+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWKDa3As3Pdyc37Jdtm7Nd1Yc3nK7Mf4j6yyU3zaPyWMVK: Application error 0x0 (remote): conn-863090: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.139+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWQhcu9YASxJBUTF3YsFWvDScPkMnEL1DUmWbraNFbTmtx: Application error 0x0 (remote): conn-472804: system: cannot reserve connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.151+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWRnsmwaXVZCFq63A2rP8BsPoW72Brk5djjLx1rrUHHjAj: Application error 0x0 (remote): conn-3986489: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.193+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWSDTNXZQzRjBfW5X3GVpgxunnDbuTV8ejGMYywMrog3W7: Application error 0x0 (remote): conn-2397317: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.216+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWCpf6oidCA2Rm4sHmGdW19yMMGJhBgtHDtvfPuuXcNPC4: Application error 0x0 (remote): conn-1580670: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.242+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWL2bxobj2yrS9akq3LA3oTqdG6LKTJrJUTbXA3WVWMmek: Application error 0x0 (remote): conn-3552501: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.272+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWC7sr8PAYmeEo1PnE2EL2W7fnEaNPZZYvFxV251QPSauG: 
Application error 0x0 (remote): conn-12230887: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.278+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWCpf6oidCA2Rm4sHmGdW19yMMGJhBgtHDtvfPuuXcNPC4: Application error 0x0 (remote): conn-1580671: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.341+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWCpf6oidCA2Rm4sHmGdW19yMMGJhBgtHDtvfPuuXcNPC4: Application error 0x0 (remote): conn-1580672: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.341+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWRnsmwaXVZCFq63A2rP8BsPoW72Brk5djjLx1rrUHHjAj: Application error 0x0 (remote): conn-3986494: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.345+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWM45dXHAgN5cnyUA7FEnEP7NqfiEuryccLNRBroYMhPMb: Application error 0x0 (remote): conn-4633132: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.403+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWNnT6E8WwYRD7Wm6LoqB4J2fGAebi7EKut5wLGBqJTiNx: Application error 0x0 (remote): conn-4172386: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.420+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWBYFEdCivUPxGjVXZyK2XNQK3XyAQJLRzXec27CjGi5g1: Application error 0x0 (remote): conn-540183: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.434+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWLGef9KMwgz1RNf3wdREH6YwgR9y7tXt4oApGtLQSDtdD: Application error 0x0 (remote): conn-2956154: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.442+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWKs8JKg9g9yZ7KVvEiukhJ1g7Dp9eVRAcqfDFwRtdUPvc: Application error 0x0 (remote): conn-3249545: system: cannot reserve connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.443+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWKMRP2WRHPucDrjKg2Ny3unafuHZVAsDocrSnMJ9zFCeY: Application error 0x0 (remote): conn-3848357: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.456+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWKDa3As3Pdyc37Jdtm7Nd1Yc3nK7Mf4j6yyU3zaPyWMVK: Application error 0x0 (remote): conn-863093: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.496+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWAu9ZdmMThYTb5zM7yFS7CC2f3h6Ke618ZCPeotLDq83t: Application error 0x0 (remote): conn-3168884: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.529+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWL2bxobj2yrS9akq3LA3oTqdG6LKTJrJUTbXA3WVWMmek: Application error 0x0 (remote): conn-3552504: system: 
cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.538+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWPhcrTjMrodw58AbjL1WTEvZf4DNVFF7BaK2WLmECr3mA: Application error 0x0 (remote): conn-4537665: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.602+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWByi5s2UntYLz6jV8fJaRq2jXd7KGQNgjjUWo6R9tz2BA: Application error 0x0 (remote): conn-2387092: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.607+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWGndGTgYheJv4aWj2m1xjixeYp9s9pMJdjoxoU8BXoSj1: Application error 0x0 (remote): conn-1104761: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.616+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWKTZQcDwZrzTqZZjf9whNLUU74wpnj9PKdpQptPKuQTEt: Application error 0x0 (remote): conn-2805574: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.660+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWGo7dXTVyDi1FfAmzpyq2tZu7H7bPaD5NMf6GEX4ae5kB: Application error 0x0 (remote): conn-18136598: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.675+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWLGef9KMwgz1RNf3wdREH6YwgR9y7tXt4oApGtLQSDtdD: Application error 0x0 (remote): conn-2956158: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.700+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWGo7dXTVyDi1FfAmzpyq2tZu7H7bPaD5NMf6GEX4ae5kB: Application error 0x0 (remote): conn-18136599: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.725+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWM45dXHAgN5cnyUA7FEnEP7NqfiEuryccLNRBroYMhPMb: Application error 0x0 (remote): conn-4633134: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.744+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWKs8JKg9g9yZ7KVvEiukhJ1g7Dp9eVRAcqfDFwRtdUPvc: Application error 0x0 (remote): conn-3249568: system: cannot reserve connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.772+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWAu9ZdmMThYTb5zM7yFS7CC2f3h6Ke618ZCPeotLDq83t: Application error 0x0 (remote): conn-3168888: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.775+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWBYFEdCivUPxGjVXZyK2XNQK3XyAQJLRzXec27CjGi5g1: Application error 0x0 (remote): conn-540187: system: cannot reserve inbound connection: resource limit exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.809+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWCvQpMzz2UGmioK7TfqCf8wpwbhQWkpqFQR5c7ba4wnP4: Application error 0x0 (remote): conn-5525804: system: cannot reserve inbound connection: resource limit 
exceeded Jan 23 20:40:47 infinistore ipfs[9783]: 2023-01-23T20:40:47.829+0100 WARN net/identify identify/id.go:334 failed to identify 12D3KooWGndGTgYheJv4aWj2m1xjixeYp9s9pMJdjoxoU8BXoSj1: Application error 0x0 (remote): conn-1104767: system: cannot reserve inbound connection: resource limit exceeded ```

My setup is a publicly routable node with a system inbound-connection limit of 5000, which is nowhere near being hit.
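
For reference, on 0.17/0.18 the configured resource manager limits and the current usage can be compared directly from the CLI; a minimal sketch using the two swarm commands that also appear later in this thread:

```
# Configured limits for every scope, including the system scope's ConnsInbound
ipfs swarm limit all

# Current usage for comparison; only print scopes at >=90% of any limit
ipfs swarm stats --min-used-limit-perc=90 all
```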

None of my limits are even close to being hit, and I'm also seeing the above immediately after the daemon has finished booting:

Jan 23 20:40:02 infinistore ipfs[9783]: API server listening on /ip4/127.0.0.1/tcp/5001
Jan 23 20:40:02 infinistore ipfs[9783]: WebUI: http://127.0.0.1:5001/webui
Jan 23 20:40:02 infinistore ipfs[9783]: 2023-01-23T20:40:02.637+0100        WARN        net/identify        identify/id.go:334        failed to identify 12D3KooWMzGDXiDayMjvYNqRcgpyixCYKrtT71ehoVEDt1VBQpwk: stream reset
Jan 23 20:40:02 infinistore ipfs[9783]: 2023-01-23T20:40:02.996+0100        WARN        net/identify        identify/id.go:334        failed to identify QmdwQTkGHb6ewS4A9XYtcWkuC9GvFGKBiPJ2EyrLeNAqWb: stream reset
Jan 23 20:40:03 infinistore ipfs[9783]: 2023-01-23T20:40:03.162+0100        WARN        net/identify        identify/id.go:334        failed to identify QmNyLtNKnLXDkLssibaKdZriVMjbsGajZfTL34pt23AzGL: stream reset
Jan 23 20:40:03 infinistore ipfs[9783]: 2023-01-23T20:40:03.202+0100        WARN        net/identify        identify/id.go:334        failed to identify Qmbut9Ywz9YEDrz8ySBSgWyJk41Uvm2QJPhwDJzJyGFsD6: Application error 0x0 (remote): conn-5019347: system: cannot reserve inbound connection: resource limit exceeded
Jan 23 20:40:03 infinistore ipfs[9783]: 2023-01-23T20:40:03.294+0100        WARN        net/identify        identify/id.go:334        failed to identify 12D3KooWKapSEuNYwxZVnWs9uJEQSmFprwuQwzfoByx1KUJeD1XA: Application error 0x0 (remote): conn-2480692: system: cannot reserve inbound connection: resource limit exceeded
Jan 23 20:40:03 infinistore ipfs[9783]: 2023-01-23T20:40:03.303+0100        WARN        net/identify        identify/id.go:334        failed to identify 12D3KooWG7HY6VLRQCoipwuhBNSB7mx4tHmHubLH6v1uRpTSgbnX: Application error 0x0 (remote): conn-1684852: system: cannot reserve inbound connection: resource limit exceeded
Jan 23 20:40:03 infinistore ipfs[9783]: 2023-01-23T20:40:03.338+0100        WARN        net/identify        identify/id.go:334        failed to identify 12D3KooWBN47Kk6J5CFGBLxNXm1jL8MEZitYLzco9be3pAfizwTp: Application error 0x0 (remote): conn-5283888: system: cannot reserve inbound connection: resource limit exceeded
Jan 23 20:40:03 infinistore ipfs[9783]: 2023-01-23T20:40:03.372+0100        WARN        net/identify        identify/id.go:334        failed to identify 12D3KooWMMWmcwP6DDfwHmFp1QYZH2GcFN3SGiAz7wyts9MTcFsZ: Application error 0x0 (remote): conn-4626232: system: cannot reserve connection: resource limit exceeded
Jan 23 20:40:03 infinistore ipfs[9783]: 2023-01-23T20:40:03.384+0100        WARN        net/identify        identify/id.go:334        failed to identify QmRHbKAb6HuVWGWjs3SiSCAQ2m87TLBdxwsFvnNxo45BDb: stream reset

I have my log level set to warn, and I haven't spotted any "we have suppressed 123 limit messages" notice or anything similar.

ShadowJonathan commented 1 year ago

I'm downgrading to 0.17 again, since https://github.com/ipfs-cluster/ipfs-cluster/issues/1835 makes it impossible for me to use 0.18.

sven-hash commented 1 year ago

I had the same problem, and it turned out to be caused by the ports not being opened correctly in ufw.

ShadowJonathan commented 1 year ago

I'm running on Debian 10. @sven-hash, could you elaborate on that? What do you mean by "not properly opened": do you mean that only some ports were open, or that some ports had specific filtering applied?

sven-hash commented 1 year ago

> I'm running on Debian 10. @sven-hash, could you elaborate on that? What do you mean by "not properly opened": do you mean that only some ports were open, or that some ports had specific filtering applied?

Port 4001/UDP wasn't open in the security group on the VPS cloud provider's side, so it wasn't accessible, and after some time the cluster crashed. Since I opened all the ports correctly for both UDP and TCP, I haven't had a crash.
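
For anyone checking their firewall for the same reason, opening the swarm port in ufw would look roughly like this; a sketch assuming the default swarm port 4001 from the configs in this thread, with the equivalent rules also needed in the cloud provider's security group:

```
# Allow the libp2p swarm port over both TCP and UDP (QUIC)
sudo ufw allow 4001/tcp
sudo ufw allow 4001/udp

# Confirm the rules are active
sudo ufw status verbose
```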

ShadowJonathan commented 1 year ago

I just checked; my VPS provider (Hetzner) doesn't seem to have any firewall rules or similar in place.

ShadowJonathan commented 1 year ago

I have since changed the log level to error, and after 9 hours my pinning process is going well, so for me the matter is more or less resolved.
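
For reference, this is roughly how the verbosity can be lowered; a minimal sketch, assuming the `all` subsystem selector and the go-log `GOLOG_LOG_LEVEL` environment variable both apply to this setup:

```
# Change the level of every subsystem on a running daemon
ipfs log level all error

# Or set it for the whole process at startup
GOLOG_LOG_LEVEL=error ipfs daemon
```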

ShadowJonathan commented 1 year ago

FWIW, yesterday I would occasionally `ipfs ping` my own node and observe the following:

jonathan@os-mule:~$ ipfs ping 12D3xxx
PING 12D3xxx.
Ping error: stream reset
Ping error: stream reset
Ping error: stream-12855: resource scope closed
Ping error: stream-12855: resource scope closed
Ping error: stream-12855: resource scope closed
Ping error: stream-12855: resource scope closed
Ping error: stream-12855: resource scope closed
Ping error: stream-12855: resource scope closed
Ping error: stream-12855: resource scope closed
Ping error: stream-12855: resource scope closed
Error: ping failed

This was while the system stats were well under the limits, so I don't know why the other side would close this stream early. It also happened at the same time as a staggered pinning process, so I'm not sure what exactly was going on there.

Pinging now seems to work fine. Inbound connections sit at about 800-1000, whereas they reached a high of 3000 yesterday.

I wonder whether restarting IPFS a bunch of times overloaded the Linux network buffers. I didn't get any kernel messages or the like, but for a while reaching the node was very unreliable.
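
As an aside, the inbound connection count mentioned above can be sampled over time; a rough sketch, assuming `jq` is available and that `ipfs swarm stats system` reports a `ConnsInbound` field laid out like the `ipfs swarm limit all` output further down in this thread:

```
# Print the inbound connection count every 30 seconds
while true; do
  printf '%s ' "$(date -Is)"
  ipfs swarm stats system | jq '.System.ConnsInbound'
  sleep 30
done
```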

2color commented 1 year ago

After seeing this problem with 0.18-RC1, I upgraded to 0.18 today and reproduced, on two different machines, the same problem that @ShadowJonathan and I reported earlier.


More information

Machine 1 (running on fly.io): 2 GB RAM, default 0.18 config

starting logs ``` 2023-01-24T13:56:19Z runner[d9e0298d] fra [info]Starting instance 2023-01-24T13:56:19Z runner[d9e0298d] fra [info]Configuring virtual machine 2023-01-24T13:56:20Z runner[d9e0298d] fra [info]Pulling container image 2023-01-24T13:56:21Z runner[d9e0298d] fra [info]Unpacking image 2023-01-24T13:56:22Z runner[d9e0298d] fra [info]Preparing kernel init 2023-01-24T13:56:22Z runner[d9e0298d] fra [info]Setting up volume 'ipfs_data' 2023-01-24T13:56:22Z runner[d9e0298d] fra [info]Opening encrypted volume 2023-01-24T13:56:22Z runner[d9e0298d] fra [info]Configuring firecracker 2023-01-24T13:56:22Z runner[d9e0298d] fra [info]Starting virtual machine 2023-01-24T13:56:23Z app[d9e0298d] fra [info]Starting init (commit: b8364bb)... 2023-01-24T13:56:23Z app[d9e0298d] fra [info]Mounting /dev/vdc at /data/ipfs w/ uid: 0, gid: 0 and chmod 0755 2023-01-24T13:56:23Z app[d9e0298d] fra [info]Preparing to run: `/sbin/tini -- /usr/local/bin/start_ipfs daemon --migrate=true --agent-version-suffix=docker` as root 2023-01-24T13:56:23Z app[d9e0298d] fra [info]2023/01/24 13:56:23 listening on [fdaa:0:20b9:a7b:b6:4:48a9:2]:22 (DNS: [fdaa::3]:53) 2023-01-24T13:56:23Z app[d9e0298d] fra [info][WARN tini (528)] Tini is not running as PID 1 and isn't registered as a child subreaper. 2023-01-24T13:56:23Z app[d9e0298d] fra [info]Zombie processes will not be re-parented to Tini, so zombie reaping won't work. 2023-01-24T13:56:23Z app[d9e0298d] fra [info]To fix the problem, use the -s option or set the environment variable TINI_SUBREAPER to register Tini as a child subreaper, or run Tini as PID 1. 2023-01-24T13:56:23Z app[d9e0298d] fra [info]Changing user to ipfs 2023-01-24T13:56:23Z app[d9e0298d] fra [info]ipfs version 0.18.0 2023-01-24T13:56:23Z app[d9e0298d] fra [info]Found IPFS fs-repo at /data/ipfs 2023-01-24T13:56:23Z app[d9e0298d] fra [info]Executing '/container-init.d/ipfs-config.sh'... 2023-01-24T13:56:24Z app[d9e0298d] fra [info]Initializing daemon... 2023-01-24T13:56:24Z app[d9e0298d] fra [info]Kubo version: 0.18.0-6750377 2023-01-24T13:56:24Z app[d9e0298d] fra [info]Repo version: 13 2023-01-24T13:56:24Z app[d9e0298d] fra [info]System version: amd64/linux 2023-01-24T13:56:24Z app[d9e0298d] fra [info]Golang version: go1.19.1 2023-01-24T13:56:24Z app[d9e0298d] fra [info]Computing default go-libp2p Resource Manager limits based on: 2023-01-24T13:56:24Z app[d9e0298d] fra [info] - 'Swarm.ResourceMgr.MaxMemory': "1.8 GB" 2023-01-24T13:56:24Z app[d9e0298d] fra [info] - 'Swarm.ResourceMgr.MaxFileDescriptors': 5120 2023-01-24T13:56:24Z app[d9e0298d] fra [info]Applying any user-supplied overrides on top. 2023-01-24T13:56:24Z app[d9e0298d] fra [info]Run 'ipfs swarm limit all' to see the resulting limits. 2023-01-24T13:56:24Z app[d9e0298d] fra [info]2023/01/24 13:56:24 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/lucas-clemente/quic-go/wiki/UDP-Receive-Buffer-Size for details. 
2023-01-24T13:56:24Z app[d9e0298d] fra [info]{"level":"warn","ts":"2023-01-24T13:56:24.252Z","logger":"swarm2","caller":"swarm/swarm_listen.go:29","msg":"listening failed","on":"/ip4/0.0.0.0/udp/4001/quic/webtransport","error":"cannot listen on non-WebTransport addr: /ip4/0.0.0.0/udp/4001/quic/webtransport"} 2023-01-24T13:56:24Z app[d9e0298d] fra [info]{"level":"warn","ts":"2023-01-24T13:56:24.252Z","logger":"swarm2","caller":"swarm/swarm_listen.go:29","msg":"listening failed","on":"/ip6/::/udp/4001/quic/webtransport","error":"cannot listen on non-WebTransport addr: /ip6/::/udp/4001/quic/webtransport"} 2023-01-24T13:56:24Z app[d9e0298d] fra [info]{"level":"info","ts":"2023-01-24T13:56:24.253Z","logger":"p2pnode","caller":"libp2p/addrs.go:132","msg":"Swarm listening at: [/p2p-circuit /ip4/127.0.0.1/tcp/4001 /ip4/172.19.64.242/tcp/4001 /ip4/172.19.64.243/tcp/4001 /ip4/127.0.0.1/tcp/4002/ws /ip4/172.19.64.242/tcp/4002/ws /ip4/172.19.64.243/tcp/4002/ws /ip6/::1/tcp/4001 /ip6/fdaa:0:20b9:a7b:b6:4:48a9:2/tcp/4001 /ip6/2604:1380:4091:360c:0:4:48a9:3/tcp/4001 /ip6/::1/tcp/4002/ws /ip6/fdaa:0:20b9:a7b:b6:4:48a9:2/tcp/4002/ws /ip6/2604:1380:4091:360c:0:4:48a9:3/tcp/4002/ws /ip4/127.0.0.1/udp/4001/quic /ip4/172.19.64.242/udp/4001/quic /ip4/172.19.64.243/udp/4001/quic /ip6/::1/udp/4001/quic /ip6/fdaa:0:20b9:a7b:b6:4:48a9:2/udp/4001/quic /ip6/2604:1380:4091:360c:0:4:48a9:3/udp/4001/quic]"} 2023-01-24T13:56:24Z app[d9e0298d] fra [info]{"level":"info","ts":"2023-01-24T13:56:24.259Z","logger":"peering","caller":"peering/peering.go:190","msg":"starting"} 2023-01-24T13:56:24Z app[d9e0298d] fra [info]{"level":"info","ts":"2023-01-24T13:56:24.260Z","logger":"dht/RtRefreshManager","caller":"rtrefresh/rt_refresh_manager.go:279","msg":"starting refreshing cpl 0 with key CIQAAACOYYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA (routing table size was 0)"} 2023-01-24T13:56:24Z app[d9e0298d] fra [info]{"level":"warn","ts":"2023-01-24T13:56:24.260Z","logger":"dht/RtRefreshManager","caller":"rtrefresh/rt_refresh_manager.go:136","msg":"failed when refreshing routing table2 errors occurred:\n\t* failed to query for self, err=failed to find any peer in table\n\t* failed to refresh cpl=0, err=failed to find any peer in table\n\n"} 2023-01-24T13:56:24Z app[d9e0298d] fra [info]{"level":"info","ts":"2023-01-24T13:56:24.303Z","logger":"dht/RtRefreshManager","caller":"rtrefresh/rt_refresh_manager.go:279","msg":"starting refreshing cpl 0 with key CIQAAAFV4UAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA (routing table size was 0)"} 2023-01-24T13:56:24Z app[d9e0298d] fra [info]{"level":"warn","ts":"2023-01-24T13:56:24.303Z","logger":"dht/RtRefreshManager","caller":"rtrefresh/rt_refresh_manager.go:199","msg":"failed when refreshing routing table","error":"2 errors occurred:\n\t* failed to query for self, err=failed to find any peer in table\n\t* failed to refresh cpl=0, err=failed to find any peer in table\n\n"} 2023-01-24T13:56:24Z app[d9e0298d] fra [info]{"level":"warn","ts":"2023-01-24T13:56:24.304Z","logger":"dht/RtRefreshManager","caller":"rtrefresh/rt_refresh_manager.go:199","msg":"failed when refreshing routing table","error":"2 errors occurred:\n\t* failed to query for self, err=failed to find any 2023-01-24T13:56:24Z app[d9e0298d] fra [info]{"level":"warn","ts":"2023-01-24T13:56:24.304Z","logger":"dht/RtRefreshManager","caller":"rtrefresh/rt_refresh_manager.go:199","msg":"failed when refreshing routing table","error":"2 errors occurred:\n\t* failed to query for self, err=failed to find any peer in table\n\t* failed to 
refresh cpl=0, err=failed to find any peer in table\n\n"} 2023-01-24T13:56:24Z app[d9e0298d] fra [info]{"level":"info","ts":"2023-01-24T13:56:24.655Z","logger":"bootstrap","caller":"bootstrap/bootstrap.go:178","msg":"bootstrapped with QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ"} 2023-01-24T13:56:25Z app[d9e0298d] fra [info]{"level":"info","ts":"2023-01-24T13:56:25.507Z","logger":"bootstrap","caller":"bootstrap/bootstrap.go:178","msg":"bootstrapped with QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa"} 2023-01-24T13:56:25Z app[d9e0298d] fra [info]{"level":"info","ts":"2023-01-24T13:56:25.629Z","logger":"bootstrap","caller":"bootstrap/bootstrap.go:178","msg":"bootstrapped with QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN"} 2023-01-24T13:56:25Z app[d9e0298d] fra [info]{"level":"info","ts":"2023-01-24T13:56:25.638Z","logger":"net/identify","caller":"identify/id.go:369","msg":"failed negotiate identify protocol with peer","peer":"QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt","error":"stream reset"} 2023-01-24T13:56:25Z app[d9e0298d] fra [info]{"level":"warn","ts":"2023-01-24T13:56:25.638Z","logger":"net/identify","caller":"identify/id.go:334","msg":"failed to identify QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt: stream reset"} 2023-01-24T13:56:25Z app[d9e0298d] fra [info]{"level":"info","ts":"2023-01-24T13:56:25.638Z","logger":"bootstrap","caller":"bootstrap/bootstrap.go:178","msg":"bootstrapped with QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt"} ```
Error Logs ``` 2023-01-24T13:42:59Z app[ee584281] fra [info]{"level":"info","ts":"2023-01-24T13:42:59.115Z","logger":"net/identify","caller":"identify/id.go:369","msg":"failed negotiate identify protocol with peer","peer":"12D3KooWBHMaminKyW7fnnXycxjCK1kMCGEE6qJybbCC8PLh9pmo","error":"Application error 0x0 (remote): conn-1331080: system: cannot reserve inbound connection: resource limit exceeded"} 2023-01-24T13:42:59Z app[ee584281] fra [info]{"level":"warn","ts":"2023-01-24T13:42:59.116Z","logger":"net/identify","caller":"identify/id.go:334","msg":"failed to identify 12D3KooWBHMaminKyW7fnnXycxjCK1kMCGEE6qJybbCC8PLh9pmo: Application error 0x0 (remote): conn-1331080: system: cannot reserve inbound connection: resource limit exceeded"} 2023-01-24T13:42:59Z app[ee584281] fra [info]{"level":"info","ts":"2023-01-24T13:42:59.139Z","logger":"canonical-log","caller":"swarm/swarm_dial.go:497","msg":"CANONICAL_PEER_STATUS: peer=12D3KooWFJRDAtqVtfruEz7og3Eum44WQndgPjv8pTDmqgA2eXJ9 addr=/ip4/18.118.37.174/udp/30002/quic sample_rate=100 connection_status=\"established\" dir=\"outbound\""} 2023-01-24T13:42:59Z app[ee584281] fra [info]{"level":"info","ts":"2023-01-24T13:42:59.142Z","logger":"net/identify","caller":"identify/id.go:369","msg":"failed negotiate identify protocol with peer","peer":"12D3KooWAtATqyEpyVG6gyJbyGs5K3M6ULY2wabjMCaAGQt92LRn","error":"Application error 0x0 (remote): conn-1143536: system: cannot reserve inbound connection: resource limit exceeded"} 2023-01-24T13:42:59Z app[ee584281] fra [info]{"level":"warn","ts":"2023-01-24T13:42:59.142Z","logger":"net/identify","caller":"identify/id.go:334","msg":"failed to identify 12D3KooWAtATqyEpyVG6gyJbyGs5K3M6ULY2wabjMCaAGQt92LRn: Application error 0x0 (remote): conn-1143536: system: cannot reserve inbound connection: resource limit exceeded"} 2023-01-24T13:42:59Z app[ee584281] fra [info]{"level":"info","ts":"2023-01-24T13:42:59.329Z","logger":"net/identify","caller":"identify/id.go:369","msg":"failed negotiate identify protocol with peer","peer":"12D3KooWBa2s1W5KAhQ8izfZfjxJS8N86FdafQMdK5PP41JMiuM8","error":"Application error 0x0 (remote): conn-1846746: system: cannot reserve inbound connection: resource limit exceeded"} 2023-01-24T13:42:59Z app[ee584281] fra [info]{"level":"warn","ts":"2023-01-24T13:42:59.330Z","logger":"net/identify","caller":"identify/id.go:334","msg":"failed to identify 12D3KooWBa2s1W5KAhQ8izfZfjxJS8N86FdafQMdK5PP41JMiuM8: Application error 0x0 (remote): conn-1846746: system: cannot reserve inbound connection: resource limit exceeded"} 2023-01-24T13:42:59Z app[ee584281] fra [info]{"level":"info","ts":"2023-01-24T13:42:59.391Z","logger":"net/identify","caller":"identify/id.go:369","msg":"failed negotiate identify protocol with peer","peer":"12D3KooWP6uoDtjs9NupN6RFq65KteG3PbgNky4NtuF1mSAGULrt","error":"Application error 0x0 (remote): conn-1540612: system: cannot reserve inbound connection: resource limit exceeded"} 2023-01-24T13:42:59Z app[ee584281] fra [info]{"level":"warn","ts":"2023-01-24T13:42:59.391Z","logger":"net/identify","caller":"identify/id.go:334","msg":"failed to identify 12D3KooWP6uoDtjs9NupN6RFq65KteG3PbgNky4NtuF1mSAGULrt: Application error 0x0 (remote): conn-1540612: system: cannot reserve inbound connection: resource limit exceeded"} 2023-01-24T13:42:59Z app[ee584281] fra [info]{"level":"info","ts":"2023-01-24T13:42:59.591Z","logger":"net/identify","caller":"identify/id.go:369","msg":"failed negotiate identify protocol with 
peer","peer":"12D3KooWDybPucQ6Ri1s5VBdoWBgmPpdhjiNLNH8MXAKDPAdD4aQ","error":"Application error 0x0 (remote): conn-588035: system: cannot reserve inbound connection: resource limit exceeded"} 2023-01-24T13:42:59Z app[ee584281] fra [info]{"level":"warn","ts":"2023-01-24T13:42:59.591Z","logger":"net/identify","caller":"identify/id.go:334","msg":"failed to identify 12D3KooWDybPucQ6Ri1s5VBdoWBgmPpdhjiNLNH8MXAKDPAdD4aQ: Application error 0x0 (remote): conn-588035: system: cannot reserve inbound connection: resource limit exceeded"} 2023-01-24T13:42:59Z app[ee584281] fra [info]{"level":"info","ts":"2023-01-24T13:42:59.628Z","logger":"net/identify","caller":"identify/id.go:369","msg":"failed negotiate identify protocol with peer","peer":"12D3KooWBa2s1W5KAhQ8izfZfjxJS8N86FdafQMdK5PP41JMiuM8","error":"Application error 0x0 (remote): conn-1846750: system: cannot reserve inbound connection: resource limit exceeded"} 2023-01-24T13:42:59Z app[ee584281] fra [info]{"level":"warn","ts":"2023-01-24T13:42:59.628Z","logger":"net/identify","caller":"identify/id.go:334","msg":"failed to identify 12D3KooWBa2s1W5KAhQ8izfZfjxJS8N86FdafQMdK5PP41JMiuM8: Application error 0x0 (remote): conn-1846750: system: cannot reserve inbound connection: resource limit exceeded"} 2023-01-24T13:42:59Z app[ee584281] fra [info]{"level":"info","ts":"2023-01-24T13:42:59.660Z","logger":"net/identify","caller":"identify/id.go:369","msg":"failed negotiate identify protocol with peer","peer":"12D3KooWCE4gsSoZQkjqLQ4SBa6ZLJTgKmY6uXLQJTRo5ifG9DYh","error":"Application error 0x0 (remote): conn-2079838: system: cannot reserve inbound connection: resource limit exceeded"} 2023-01-24T13:42:59Z app[ee584281] fra [info]{"level":"warn","ts":"2023-01-24T13:42:59.661Z","logger":"net/identify","caller":"identify/id.go:334","msg":"failed to identify 12D3KooWCE4gsSoZQkjqLQ4SBa6ZLJTgKmY6uXLQJTRo5ifG9DYh: Application error 0x0 (remote): conn-2079838: system: cannot reserve inbound connection: resource limit exceeded"} 2023-01-24T13:42:59Z app[ee584281] fra [info]{"level":"info","ts":"2023-01-24T13:42:59.753Z","logger":"net/identify","caller":"identify/id.go:369","msg":"failed negotiate identify protocol with peer","peer":"12D3KooWP6uoDtjs9NupN6RFq65KteG3PbgNky4NtuF1mSAGULrt","error":"Application error 0x0 (remote): conn-1540616: system: cannot reserve inbound connection: resource limit exceeded"} 2023-01-24T13:42:59Z app[ee584281] fra [info]{"level":"warn","ts":"2023-01-24T13:42:59.754Z","logger":"net/identify","caller":"identify/id.go:334","msg":"failed to identify 12D3KooWP6uoDtjs9NupN6RFq65KteG3PbgNky4NtuF1mSAGULrt: Application error 0x0 (remote): conn-1540616: system: cannot reserve inbound connection: resource limit exceeded"} 2023-01-24T13:43:00Z app[ee584281] fra [info]{"level":"info","ts":"2023-01-24T13:43:00.130Z","logger":"net/identify","caller":"identify/id.go:369","msg":"failed negotiate identify protocol with peer","peer":"12D3KooWCE4gsSoZQkjqLQ4SBa6ZLJTgKmY6uXLQJTRo5ifG9DYh","error":"Application error 0x0 (remote): conn-2079843: system: cannot reserve inbound connection: resource limit exceeded"} 2023-01-24T13:43:00Z app[ee584281] fra [info]{"level":"warn","ts":"2023-01-24T13:43:00.130Z","logger":"net/identify","caller":"identify/id.go:334","msg":"failed to identify 12D3KooWCE4gsSoZQkjqLQ4SBa6ZLJTgKmY6uXLQJTRo5ifG9DYh: Application error 0x0 (remote): conn-2079843: system: cannot reserve inbound connection: resource limit exceeded"} 2023-01-24T13:43:00Z app[ee584281] fra 
[info]{"level":"info","ts":"2023-01-24T13:43:00.157Z","logger":"net/identify","caller":"identify/id.go:369","msg":"failed negotiate identify protocol with peer","peer":"12D3KooWDybPucQ6Ri1s5VBdoWBgmPpdhjiNLNH8MXAKDPAdD4aQ","error":"Application error 0x0 (remote): conn-588039: system: cannot reserve inbound connection: resource limit exceeded"} 2023-01-24T13:43:00Z app[ee584281] fra [info]{"level":"warn","ts":"2023-01-24T13:43:00.158Z","logger":"net/identify","caller":"identify/id.go:334","msg":"failed to identify 12D3KooWDybPucQ6Ri1s5VBdoWBgmPpdhjiNLNH8MXAKDPAdD4aQ: Application error 0x0 (remote): conn-588039: system: cannot reserve inbound connection: resource limit exceeded"} ```
Kubo `ipfs config show` ``` / # ipfs config show { "API": { "HTTPHeaders": {} }, "Addresses": { "API": [ "/ip4/0.0.0.0/tcp/5001", "/ip6/::/tcp/5001" ], "Announce": [], "AppendAnnounce": [ "/ip4/149.248.221.175/tcp/4001", "/ip4/149.248.221.175/tcp/4002/ws", "/dns4/my-ipfs-node.fly.dev/tcp/443/wss" ], "Gateway": "/ip4/0.0.0.0/tcp/8080", "NoAnnounce": [ "/ip4/10.0.0.0/ipcidr/8", "/ip4/100.64.0.0/ipcidr/10", "/ip4/169.254.0.0/ipcidr/16", "/ip4/172.16.0.0/ipcidr/12", "/ip4/192.0.0.0/ipcidr/24", "/ip4/192.0.2.0/ipcidr/24", "/ip4/192.168.0.0/ipcidr/16", "/ip4/198.18.0.0/ipcidr/15", "/ip4/198.51.100.0/ipcidr/24", "/ip4/203.0.113.0/ipcidr/24", "/ip4/240.0.0.0/ipcidr/4", "/ip6/100::/ipcidr/64", "/ip6/2001:2::/ipcidr/48", "/ip6/2001:db8::/ipcidr/32", "/ip6/fc00::/ipcidr/7", "/ip6/fe80::/ipcidr/10" ], "Swarm": [ "/ip4/0.0.0.0/tcp/4001", "/ip4/0.0.0.0/tcp/4002/ws", "/ip4/0.0.0.0/udp/4001/quic/webtransport", "/ip6/::/tcp/4001", "/ip6/::/tcp/4002/ws", "/ip6/::/udp/4001/quic/webtransport", "/ip4/0.0.0.0/udp/4001/quic", "/ip6/::/udp/4001/quic" ] }, "AutoNAT": {}, "Bootstrap": [ "/dnsaddr/bootstrap.libp2p.io/p2p/QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN", "/dnsaddr/bootstrap.libp2p.io/p2p/QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa", "/dnsaddr/bootstrap.libp2p.io/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb", "/dnsaddr/bootstrap.libp2p.io/p2p/QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt", "/ip4/104.131.131.82/tcp/4001/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ", "/ip4/104.131.131.82/udp/4001/quic/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ" ], "DNS": { "Resolvers": {} }, "Datastore": { "BloomFilterSize": 0, "GCPeriod": "1h", "HashOnRead": false, "Spec": { "mounts": [ { "child": { "path": "blocks", "shardFunc": "/repo/flatfs/shard/v1/next-to-last/2", "sync": true, "type": "flatfs" }, "mountpoint": "/blocks", "prefix": "flatfs.datastore", "type": "measure" }, { "child": { "compression": "none", "path": "datastore", "type": "levelds" }, "mountpoint": "/", "prefix": "leveldb.datastore", "type": "measure" } ], "type": "mount" }, "StorageGCWatermark": 90, "StorageMax": "10GB" }, "Discovery": { "MDNS": { "Enabled": false } }, "Experimental": { "AcceleratedDHTClient": false, "FilestoreEnabled": false, "GraphsyncEnabled": false, "Libp2pStreamMounting": false, "P2pHttpProxy": false, "StrategicProviding": false, "UrlstoreEnabled": false }, "Gateway": { "APICommands": [], "HTTPHeaders": { "Access-Control-Allow-Headers": [ "X-Requested-With", "Range", "User-Agent" ], "Access-Control-Allow-Methods": [ "GET" ], "Access-Control-Allow-Origin": [ "*" ] }, "NoDNSLink": false, "NoFetch": false, "PathPrefixes": [], "PublicGateways": null, "RootRedirect": "", "Writable": false }, "Identity": { "PeerID": "12D3KooW9snnuzHgfzpBKWtZxU9tpPDqB7SG4qM9tGLA9eQgYpQh" }, "Internal": {}, "Ipns": { "RecordLifetime": "", "RepublishPeriod": "", "ResolveCacheSize": 128 }, "Migration": { "DownloadSources": [], "Keep": "" }, "Mounts": { "FuseAllowOther": false, "IPFS": "/ipfs", "IPNS": "/ipns" }, "Peering": { "Peers": null }, "Pinning": { "RemoteServices": {} }, "Plugins": { "Plugins": null }, "Provider": { "Strategy": "" }, "Pubsub": { "DisableSigning": false, "Router": "" }, "Reprovider": {}, "Routing": { "Methods": null, "Routers": null }, "Swarm": { "AddrFilters": [ "/ip4/10.0.0.0/ipcidr/8", "/ip4/100.64.0.0/ipcidr/10", "/ip4/169.254.0.0/ipcidr/16", "/ip4/172.16.0.0/ipcidr/12", "/ip4/192.0.0.0/ipcidr/24", "/ip4/192.0.2.0/ipcidr/24", "/ip4/192.168.0.0/ipcidr/16", "/ip4/198.18.0.0/ipcidr/15", 
"/ip4/198.51.100.0/ipcidr/24", "/ip4/203.0.113.0/ipcidr/24", "/ip4/240.0.0.0/ipcidr/4", "/ip6/100::/ipcidr/64", "/ip6/2001:2::/ipcidr/48", "/ip6/2001:db8::/ipcidr/32", "/ip6/fc00::/ipcidr/7", "/ip6/fe80::/ipcidr/10" ], "ConnMgr": {}, "DisableBandwidthMetrics": false, "DisableNatPortMap": true, "RelayClient": {}, "RelayService": {}, "ResourceMgr": { "MaxMemory": "1.8 GB" }, "Transports": { "Multiplexers": {}, "Network": {}, "Security": {} } } } ```

Kubo `--min-used-limit-perc=90` ``` / # ipfs swarm stats --min-used-limit-perc=90 all {} ```
Kubo `ipfs swarm limit all` ``` / # ipfs swarm limit all { REDACTED "QmfW6L87V3wLyaxPt9LZBg1HDKmAJGbEEMNudER2o7Y58v": { "Conns": 1, "ConnsInbound": 0, "ConnsOutbound": 1, "FD": 0, "Memory": 0, "Streams": 0, "StreamsInbound": 0, "StreamsOutbound": 0 } }, "Protocols": { "/ipfs/bitswap/1.2.0": { "Conns": 0, "ConnsInbound": 0, "ConnsOutbound": 0, "FD": 0, "Memory": 0, "Streams": 8, "StreamsInbound": 8, "StreamsOutbound": 0 }, "/ipfs/id/1.0.0": { "Conns": 0, "ConnsInbound": 0, "ConnsOutbound": 0, "FD": 0, "Memory": 0, "Streams": 0, "StreamsInbound": 0, "StreamsOutbound": 0 }, "/ipfs/kad/1.0.0": { "Conns": 0, "ConnsInbound": 0, "ConnsOutbound": 0, "FD": 0, "Memory": 0, "Streams": 68, "StreamsInbound": 0, "StreamsOutbound": 68 }, "/ipfs/lan/kad/1.0.0": { "Conns": 0, "ConnsInbound": 0, "ConnsOutbound": 0, "FD": 0, "Memory": 0, "Streams": 1, "StreamsInbound": 1, "StreamsOutbound": 0 }, "/ipfs/ping/1.0.0": { "Conns": 0, "ConnsInbound": 0, "ConnsOutbound": 0, "FD": 0, "Memory": 0, "Streams": 0, "StreamsInbound": 0, "StreamsOutbound": 0 }, "/libp2p/circuit/relay/0.1.0": { "Conns": 0, "ConnsInbound": 0, "ConnsOutbound": 0, "FD": 0, "Memory": 0, "Streams": 0, "StreamsInbound": 0, "StreamsOutbound": 0 } }, "Services": { "libp2p.autonat": { "Conns": 0, "ConnsInbound": 0, "ConnsOutbound": 0, "FD": 0, "Memory": 0, "Streams": 0, "StreamsInbound": 0, "StreamsOutbound": 0 }, "libp2p.identify": { "Conns": 0, "ConnsInbound": 0, "ConnsOutbound": 0, "FD": 0, "Memory": 0, "Streams": 0, "StreamsInbound": 0, "StreamsOutbound": 0 }, "libp2p.ping": { "Conns": 0, "ConnsInbound": 0, "ConnsOutbound": 0, "FD": 0, "Memory": 0, "Streams": 0, "StreamsInbound": 0, "StreamsOutbound": 0 } }, "System": { "Conns": 1965, "ConnsInbound": 0, "ConnsOutbound": 1965, "FD": 27, "Memory": 4718592, "Streams": 77, "StreamsInbound": 9, "StreamsOutbound": 68 }, "Transient": { "Conns": 0, "ConnsInbound": 0, "ConnsOutbound": 0, "FD": 0, "Memory": 0, "Streams": 0, "StreamsInbound": 0, "StreamsOutbound": 0 } } ```

ajnavarro commented 1 year ago

@2color @ShadowJonathan Please have a look at the "Theme 3: improve RM errors coming from other peers" section here: https://github.com/ipfs/kubo/issues/9442

TL;DR: These errors come from remote peers that are hitting their Resource Manager limits, not from the local node. Note the `(remote)` marker on those errors, right before the resource manager error itself.
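
One quick way to verify this on a node that logs through systemd (unit name `ipfs` assumed, matching the journal excerpts above) is to filter out the remote-originated entries:

```
# Resource manager errors reported by other peers carry the "(remote)" marker;
# anything left after this filter would be a genuinely local limit being hit
journalctl -u ipfs | grep 'resource limit exceeded' | grep -v '(remote)'
```

If that filter returns nothing, the local node never hit its own limits, which matches the explanation above.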

ShadowJonathan commented 1 year ago

Ah, I had that suspicion, but I wasn't entirely sure that this was indeed the case. Thanks for pointing out the `(remote)` marker; that makes those log entries a lot more understandable, as it was US daytime hours and the remote nodes were most likely more overloaded at that time 😅

That addresses all of my concerns personally, except the ping stalls during those hours; those are still a mystery to me.

The log entries could be made clearer if `(remote)` were turned into `(from remote)`; otherwise it's easy to misread it as referring to something remote rather than indicating that the error comes from the remote node. I'll echo this in the issue you linked as well.