ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Memory leak #10461

Closed: RubenKelevra closed this issue 1 month ago

RubenKelevra commented 1 month ago

Checklist

Installation method

built from source

Version

0.29.0

Config

{
  "API": {
    "HTTPHeaders": {}
  },
  "Addresses": {
    "API": "/ip4/127.0.0.1/tcp/5001",
    "Announce": null,
    "AppendAnnounce": null,
    "Gateway": "/ip4/127.0.0.1/tcp/8081",
    "NoAnnounce": [
      "/ip4/10.0.0.0/ipcidr/8",
      "/ip4/100.64.0.0/ipcidr/10",
      "/ip4/169.254.0.0/ipcidr/16",
      "/ip4/172.16.0.0/ipcidr/12",
      "/ip4/192.0.0.0/ipcidr/24",
      "/ip4/192.0.0.0/ipcidr/29",
      "/ip4/192.0.0.8/ipcidr/32",
      "/ip4/192.0.0.170/ipcidr/32",
      "/ip4/192.0.0.171/ipcidr/32",
      "/ip4/192.0.2.0/ipcidr/24",
      "/ip4/192.168.0.0/ipcidr/16",
      "/ip4/198.18.0.0/ipcidr/15",
      "/ip4/198.51.100.0/ipcidr/24",
      "/ip4/203.0.113.0/ipcidr/24",
      "/ip4/240.0.0.0/ipcidr/4",
      "/ip6/100::/ipcidr/64",
      "/ip6/2001:2::/ipcidr/48",
      "/ip6/2001:db8::/ipcidr/32",
      "/ip6/fc00::/ipcidr/7",
      "/ip6/fe80::/ipcidr/10"
    ],
    "Swarm": [
      "/ip4/0.0.0.0/tcp/443",
      "/ip6/::/tcp/443",
      "/ip4/0.0.0.0/udp/443/quic-v1",
      "/ip4/0.0.0.0/udp/443/quic-v1/webtransport",
      "/ip6/::/udp/443/quic-v1",
      "/ip6/::/udp/443/quic-v1/webtransport"
    ]
  },
  "AutoNAT": {},
  "Bootstrap": [
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt",
    "/ip4/104.131.131.82/tcp/4001/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ",
    "/ip4/104.131.131.82/udp/4001/quic-v1/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ"
  ],
  "DNS": {
    "Resolvers": null
  },
  "Datastore": {
    "BloomFilterSize": 0,
    "GCPeriod": "48h",
    "HashOnRead": false,
    "Spec": {
      "mounts": [
        {
          "child": {
            "path": "blocks",
            "shardFunc": "/repo/flatfs/shard/v1/next-to-last/2",
            "sync": false,
            "type": "flatfs"
          },
          "mountpoint": "/blocks",
          "prefix": "flatfs.datastore",
          "type": "measure"
        },
        {
          "child": {
            "compression": "none",
            "path": "datastore",
            "type": "levelds"
          },
          "mountpoint": "/",
          "prefix": "leveldb.datastore",
          "type": "measure"
        }
      ],
      "type": "mount"
    },
    "StorageGCWatermark": 90,
    "StorageMax": "500GB"
  },
  "Discovery": {
    "MDNS": {
      "Enabled": false,
      "Interval": 10
    }
  },
  "Experimental": {
    "FilestoreEnabled": true,
    "GraphsyncEnabled": false,
    "Libp2pStreamMounting": false,
    "P2pHttpProxy": false,
    "StrategicProviding": false,
    "UrlstoreEnabled": false
  },
  "Gateway": {
    "APICommands": [],
    "HTTPHeaders": {},
    "NoDNSLink": false,
    "NoFetch": false,
    "PathPrefixes": [],
    "PublicGateways": null,
    "RootRedirect": "",
    "Writable": false
  },
  "Identity": {
    "PeerID": "x"
  },
  "Internal": {
    "Bitswap": {
      "EngineBlockstoreWorkerCount": 32,
      "EngineTaskWorkerCount": 128,
      "MaxOutstandingBytesPerPeer": null,
      "ProviderSearchDelay": null,
      "TaskWorkerCount": 128
    }
  },
  "Ipns": {
    "RecordLifetime": "4h",
    "RepublishPeriod": "1h",
    "ResolveCacheSize": 2048,
    "UsePubsub": true
  },
  "Migration": {
    "DownloadSources": null,
    "Keep": ""
  },
  "Mounts": {
    "FuseAllowOther": false,
    "IPFS": "/ipfs",
    "IPNS": "/ipns"
  },
  "Peering": {
    "Peers": null
  },
  "Pinning": {},
  "Plugins": {
    "Plugins": null
  },
  "Provider": {
    "Strategy": ""
  },
  "Pubsub": {
    "DisableSigning": false,
    "Enabled": true,
    "Router": "gossipsub"
  },
  "Reprovider": {},
  "Routing": {
    "AcceleratedDHTClient": false,
    "Methods": null,
    "Routers": null
  },
  "Swarm": {
    "AddrFilters": [
      "/ip4/10.0.0.0/ipcidr/8",
      "/ip4/100.64.0.0/ipcidr/10",
      "/ip4/169.254.0.0/ipcidr/16",
      "/ip4/172.16.0.0/ipcidr/12",
      "/ip4/192.0.0.0/ipcidr/24",
      "/ip4/192.0.0.0/ipcidr/29",
      "/ip4/192.0.0.8/ipcidr/32",
      "/ip4/192.0.0.170/ipcidr/32",
      "/ip4/192.0.0.171/ipcidr/32",
      "/ip4/192.0.2.0/ipcidr/24",
      "/ip4/192.168.0.0/ipcidr/16",
      "/ip4/198.18.0.0/ipcidr/15",
      "/ip4/198.51.100.0/ipcidr/24",
      "/ip4/203.0.113.0/ipcidr/24",
      "/ip4/240.0.0.0/ipcidr/4",
      "/ip6/100::/ipcidr/64",
      "/ip6/2001:2::/ipcidr/48",
      "/ip6/2001:db8::/ipcidr/32",
      "/ip6/fc00::/ipcidr/7",
      "/ip6/fe80::/ipcidr/10"
    ],
    "ConnMgr": {
      "GracePeriod": "3m0s",
      "HighWater": 600,
      "LowWater": 500,
      "Type": "basic"
    },
    "DisableBandwidthMetrics": true,
    "DisableNatPortMap": true,
    "RelayClient": {},
    "RelayService": {},
    "ResourceMgr": {
      "Limits": {}
    },
    "Transports": {
      "Multiplexers": {},
      "Security": {}
    }
  }
}

Description

ipfs's memory usage increased over the uptime of the server (11 days, 16 hours, 40 minutes) until it reached 69% of my 32 GB of memory:

[Screenshot: memory usage graph, 2024-07-23]

RubenKelevra commented 1 month ago

Two minutes after restarting the service, ipfs uses 1% of the 32 GB of memory in my use case.
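
For anyone who wants to chart this growth without screenshots: kubo exposes Prometheus metrics on the RPC API, so a small poller can log heap-in-use over time and show whether it climbs steadily (a leak) or plateaus. A minimal sketch, assuming the daemon's API listens on 127.0.0.1:5001 (as in the config above) and exposes the standard Go collector metric go_memstats_heap_inuse_bytes at /debug/metrics/prometheus:

// memwatch.go: poll a running kubo daemon's Prometheus endpoint and log
// heap-in-use over time. Sketch only: the endpoint and metric name assume
// the default API address and the standard Go collector.
package main

import (
	"bufio"
	"log"
	"net/http"
	"strings"
	"time"
)

// heapInUse fetches the metrics page and returns the heap-in-use line.
func heapInUse() (string, error) {
	resp, err := http.Get("http://127.0.0.1:5001/debug/metrics/prometheus")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	sc := bufio.NewScanner(resp.Body)
	for sc.Scan() {
		if line := sc.Text(); strings.HasPrefix(line, "go_memstats_heap_inuse_bytes") {
			return line, nil
		}
	}
	return "", sc.Err()
}

func main() {
	for {
		if line, err := heapInUse(); err != nil {
			log.Println("poll failed:", err)
		} else {
			log.Println(line)
		}
		time.Sleep(10 * time.Minute)
	}
}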

Rashkae2 commented 1 month ago

It's really bad. I increased my IPFS VM from 8 GB to 12 GB of RAM, but with AcceleratedDHT on, it can't even make it past 24 hours.

RubenKelevra commented 1 month ago

Thanks for confirming @Rashkae2

aschmahmann commented 1 month ago

@RubenKelevra can you give a pprof dump (ipfs diag profile) or at least post the heap from the profile? Wondering if this is https://github.com/libp2p/go-libp2p/issues/2841 (which is fixed and will be in the next release, which should have an RC this week).
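
If the full ipfs diag profile is inconvenient, the heap alone can also be pulled straight off the daemon: kubo mounts Go's standard net/http/pprof handlers on the RPC API. A minimal sketch, assuming the default 127.0.0.1:5001 API address from the config above; inspect the output with "go tool pprof heap.pprof":

// heapdump.go: grab just the in-use heap profile from a running kubo
// daemon via the pprof endpoint on the RPC API.
package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	// A plain GET against the standard pprof heap handler.
	resp, err := http.Get("http://127.0.0.1:5001/debug/pprof/heap")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	out, err := os.Create("heap.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		log.Fatal(err)
	}
	log.Println("wrote heap.pprof")
}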

RubenKelevra commented 1 month ago

@aschmahmann sure, do I need to censor anything in the dump to protect my private key or my IPNS private keys?

mercxry commented 1 month ago

I'm also having the same memory leak on the latest version, 0.29.0. This is my server's memory over the last 15 days:

[Screenshot: server memory climbing over the last 15 days, 2024-07-30]

and then after restarting ipfs/kubo: [Screenshot: memory drops back down after restart, 2024-07-30]

RubenKelevra commented 1 month ago

I also got this warning when I shut ipfs down.

128 provides at an average of ~22 minutes per key is an atrocious rate. Wtf?

This server has a 2.5 Gbit/s link and an NVMe drive, and holds 500–600 connections. The rate should be a couple of orders of magnitude higher.

Jul 27 12:16:06 odin.pacman.store ipfs[608]: Daemon is ready
Jul 29 22:36:32 odin.pacman.store ipfs[608]: 2024/07/29 22:36:32 websocket: failed to close network connection: close tcp 45.83.104.156:39592->83.173.236.97:443: use of closed network connection
Jul 30 10:50:56 odin.pacman.store ipfs[608]: Received interrupt signal, shutting down...
Jul 30 10:50:56 odin.pacman.store ipfs[608]: (Hit ctrl-c again to force-shutdown the daemon.)
Jul 30 10:50:56 odin.pacman.store systemd[1]: Stopping InterPlanetary File System (IPFS) daemon...
Jul 30 10:50:58 odin.pacman.store ipfs[608]: 2024-07-30T10:50:58.454+0200        ERROR        core:constructor        node/provider.go:92
Jul 30 10:50:58 odin.pacman.store ipfs[608]: 🔔🔔🔔 YOU ARE FALLING BEHIND DHT REPROVIDES! 🔔🔔🔔
Jul 30 10:50:58 odin.pacman.store ipfs[608]: ⚠ Your system is struggling to keep up with DHT reprovides!
Jul 30 10:50:58 odin.pacman.store ipfs[608]: This means your content could partially or completely inaccessible on the network.
Jul 30 10:50:58 odin.pacman.store ipfs[608]: We observed that you recently provided 128 keys at an average rate of 22m46.337422021s per key.
Jul 30 10:50:58 odin.pacman.store ipfs[608]: 💾 Your total CID count is ~792130 which would total at 300643h34m22.10549473s reprovide process.
Jul 30 10:50:58 odin.pacman.store ipfs[608]: ⏰ The total provide time needs to stay under your reprovide interval (22h0m0s) to prevent falling behind!
Jul 30 10:50:58 odin.pacman.store ipfs[608]: 💡 Consider enabling the Accelerated DHT to enhance your reprovide throughput. See:
Jul 30 10:50:58 odin.pacman.store ipfs[608]: https://github.com/ipfs/kubo/blob/master/docs/config.md#routingaccelerateddhtclient
Jul 30 10:50:59 odin.pacman.store systemd[1]: ipfs@ipfs.service: Deactivated successfully.
Jul 30 10:50:59 odin.pacman.store systemd[1]: Stopped InterPlanetary File System (IPFS) daemon.
Jul 30 10:50:59 odin.pacman.store systemd[1]: ipfs@ipfs.service: Consumed 1d 54min 42.440s CPU time, 13.6G memory peak.
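
For scale, the warning's own numbers can be checked directly: at ~792k CIDs and a 22-hour reprovide interval, each provide has a budget of roughly 100 ms, so 22m46s per key is about four orders of magnitude over budget. A quick arithmetic sketch using only the figures from the log above:

// providebudget.go: back-of-the-envelope check of the reprovide warning,
// using only the numbers from the log. No kubo APIs involved.
package main

import (
	"fmt"
	"time"
)

func main() {
	const totalCIDs = 792130
	reprovideInterval := 22 * time.Hour
	perKeyObserved := 22*time.Minute + 46*time.Second // ~22m46s from the log

	// To finish a full reprovide pass within the interval, each provide
	// must average under interval/count: about 100ms per key here.
	fmt.Println("per-key budget:", reprovideInterval/totalCIDs)

	// At the observed rate the full pass takes ~300k hours, matching the
	// log's "300643h" estimate.
	fmt.Printf("full pass at observed rate: ~%.0fh\n", (perKeyObserved * totalCIDs).Hours())
}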
RubenKelevra commented 1 month ago

@aschmahmann sure, do I need to censor anything in the dump to protect my private key or my IPNS private keys?

Hey @aschmahmann, don't bother. I've now started ipfs with a fresh key and no keystore on the server, so I can provide a full dump without any concerns.

But for the future it would be nice to know how to do this safely, maybe with a how-to?

lidel commented 1 month ago

@RubenKelevra good news: the privacy notice exists under ipfs diag profile --help, and the dumps don't include your private keys:

Privacy Notice:

  The output file includes:

  - A list of running goroutines.
  - A CPU profile.
  - A heap inuse profile.
  - A heap allocation profile.
  - A mutex profile.
  - A block profile.
  - Your copy of go-ipfs.
  - The output of 'ipfs version --all'.

  It does not include:

  - Any of your IPFS data or metadata.
  - Your config or private key.
  - Your IP address.
  - The contents of your computer's memory, filesystem, etc.

If you could share the profile .zip (here or privately via message to https://discuss.ipfs.tech/u/lidel/), that would be helpful.

FYSA there will be a 0.30.0-rc1 next week, which includes some fixes (https://github.com/ipfs/kubo/issues/10436) that might help, or at least narrow down the number of remaining leaks.

RubenKelevra commented 1 month ago

@lidel thanks for the info! Will do ASAP :)

lidel commented 1 month ago

btw: if you want to improve the provide speed without running the accelerated DHT client, you may also experiment with https://github.com/ipfs/kubo/blob/master/docs/experimental-features.md#optimistic-provide

RubenKelevra commented 1 month ago

@lidel wrote:

btw: if you want to improve the provide speed without running the accelerated DHT client, you may also experiment with https://github.com/ipfs/kubo/blob/master/docs/experimental-features.md#optimistic-provide

Thanks, but I think this may be more related to the memory leak issue. 474 seconds for a single provide feels a bit too high. ;)

As soon as the issue is gone I'll look into that.

Filelink is out via PM.

gammazero commented 1 month ago

The pprof data you provided indicates that the memory consumption is primarily due to QUIC connections. There was at least one QUIC resource issue that has been fixed in a later version of libp2p than the one your version of kubo is using. That, and the settings in your config, may be responsible for this memory use. In your config you have

"GracePeriod": "3m0s",
"HighWater": 600,
"LowWater": 500,

The GracePeriod is set to 3 minutes, which is much longer than the default of 20 seconds. These settings could allow a large number of connections to accumulate, resulting in high memory consumption.

It would be informative to see whether using values closer to the defaults helps significantly. Results may also improve with the next version of kubo, which uses a newer go-libp2p with fixes that may affect this.
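
For reference, kubo's "basic" connection manager is go-libp2p's connmgr package. A hedged sketch of how those three config values map onto the library API (illustrative wiring, not kubo's actual constructor):

// connmgr sketch: how Swarm.ConnMgr's LowWater/HighWater/GracePeriod map
// onto go-libp2p's basic connection manager.
package main

import (
	"log"
	"time"

	"github.com/libp2p/go-libp2p"
	"github.com/libp2p/go-libp2p/p2p/net/connmgr"
)

func main() {
	// Values from the issue's config. Once the peer count exceeds
	// HighWater, the manager trims connections down toward LowWater, but
	// it never closes connections younger than GracePeriod -- so a 3m
	// grace period (vs the 20s default) can keep many short-lived peers
	// alive at once.
	cm, err := connmgr.NewConnManager(
		500, // LowWater
		600, // HighWater
		connmgr.WithGracePeriod(3*time.Minute),
	)
	if err != nil {
		log.Fatal(err)
	}

	h, err := libp2p.New(libp2p.ConnectionManager(cm))
	if err != nil {
		log.Fatal(err)
	}
	defer h.Close()
	log.Println("libp2p host up:", h.ID())
}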

RubenKelevra commented 1 month ago

Hey @gammazero,

I did adjust the default settings, but the defaults seem more suited to a client application, right?

I'm running a server with a 2.5 Gbit/s network card, 10 cores, and 32 GB of memory. Its only task is to seed into the IPFS network. Given this setup, the current configuration feels a bit conservative rather than excessive.

Do you know what settings ipfs.io infrastructure uses for their connection manager?

@gammazero wrote:

That, and the settings in your config, may be responsible for this memory use. In your config you have

I don't think that's the issue. I've been using these settings for 3 years without any memory problems until now. It seems unlikely that the settings are the cause, especially since the memory usage increases steadily over 18 days, rather than spiking within an hour.

gammazero commented 1 month ago

the current configuration feels a bit conservative

I was thinking that the 3-minute grace period was the setting that might have the biggest effect.

using these settings for 3 years without any memory problems until now

OK, that is a hint that it may be a libp2p/QUIC issue. Let's keep this issue open and see what it looks like once we have a kubo RC with the new libp2p and QUIC.

RubenKelevra commented 1 month ago

@gammazero the idea behind using 3 minutes was to avoid killing useful long-term connections due to an influx of single-request connections which end up stale afterwards.

Not sure how kubo has improved in the meantime, but in the beginning I had a lot of "stalls" while downloading from the server whenever it was doing other work. The switch from 20 seconds to 3 minutes fixed that.

RubenKelevra commented 1 month ago

@gammazero wrote:

The pprof data you provided indicates that the memory consumption is primarily due to QUIC connections. There was at least one QUIC resource issue that has been fixed in a later version of libp2p than the one your version of kubo is using.

@gammazero wrote:

OK, that is a hint that it may be a libp2p/QUIC issue. Let's keep this issue open and see what it looks like once we have a kubo RC with the new libp2p and QUIC.

Just started 749a61b; I guess this should contain the fix, right? I'll report back after a day or two on whether the issue persists.

If it persists, I'd be happy to run a bisect to find what broke it. :)

RubenKelevra commented 1 month ago

https://github.com/ipfs/kubo/commit/749a61bae21e4229ae5170a0713cc19a4124c4b9 has been running for 4 days straight now and still uses just 2% of memory. I call this fixed.

Thanks @gammazero @lidel and @aschmahmann!

lidel commented 1 month ago

Great news, thank you for reporting and testing @RubenKelevra :heart: This will ship in Kubo 0.30 (https://github.com/ipfs/kubo/issues/10436)