ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Daemon memory usage grows to 8 GB after 5 hours of file adding and pinning #9437

Closed pio2398 closed 1 year ago

pio2398 commented 1 year ago

Installation method

third-party binary

Version

Kubo version: 0.16.0
Repo version: 12
System version: amd64/linux
Golang version: go1.19.3

Config

{
  "API": {
    "HTTPHeaders": {}
  },
  "Addresses": {
    "API": "/ip4/127.0.0.1/tcp/5001",
    "Announce": [],
    "AppendAnnounce": [],
    "Gateway": "/ip4/127.0.0.1/tcp/34538",
    "NoAnnounce": [
      "/ip4/10.0.0.0/ipcidr/8",
      "/ip4/100.64.0.0/ipcidr/10",
      "/ip4/169.254.0.0/ipcidr/16",
      "/ip4/172.16.0.0/ipcidr/12",
      "/ip4/192.0.0.0/ipcidr/24",
      "/ip4/192.0.2.0/ipcidr/24",
      "/ip4/192.168.0.0/ipcidr/16",
      "/ip4/198.18.0.0/ipcidr/15",
      "/ip4/198.51.100.0/ipcidr/24",
      "/ip4/203.0.113.0/ipcidr/24",
      "/ip4/240.0.0.0/ipcidr/4",
      "/ip6/100::/ipcidr/64",
      "/ip6/2001:2::/ipcidr/48",
      "/ip6/2001:db8::/ipcidr/32",
      "/ip6/fc00::/ipcidr/7",
      "/ip6/fe80::/ipcidr/10"
    ],
    "Swarm": [
      "/ip4/0.0.0.0/tcp/4001",
      "/ip6/::/tcp/4001",
      "/ip4/0.0.0.0/udp/4001/quic",
      "/ip6/::/udp/4001/quic"
    ]
  },
  "AutoNAT": {},
 "Bootstrap": [
   "/ip4/104.131.131.82/tcp/4001/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ",
   "/ip4/104.131.131.82/udp/4001/quic/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ"
   "/dnsaddr/bootstrap.libp2p.io/p2p/QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN",
   "/dnsaddr/bootstrap.libp2p.io/p2p/QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa",
   "/dnsaddr/bootstrap.libp2p.io/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb",
   "/dnsaddr/bootstrap.libp2p.io/p2p/QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt"
 ],
 "DNS": {
   "Resolvers": {}
 },
 "Datastore": {
   "BloomFilterSize": 0,
   "GCPeriod": "1h",
   "HashOnRead": false,
   "Spec": {
     "mounts": [
       {
         "child": {
           "path": "blocks",
           "shardFunc": "/repo/flatfs/shard/v1/next-to-last/2",
           "sync": true,
           "type": "flatfs"
         },
         "mountpoint": "/blocks",
         "prefix": "flatfs.datastore",
         "type": "measure"
       },
       {
         "child": {
           "compression": "none",
           "path": "datastore",
           "type": "levelds"
         },
         "mountpoint": "/",
         "prefix": "leveldb.datastore",
         "type": "measure"
       }
     ],
     "type": "mount"
   },
   "StorageGCWatermark": 90,
   "StorageMax": "10GB"
 },
 "Discovery": {
   "MDNS": {
     "Enabled": false
   }
 },
 "Experimental": {
    "AcceleratedDHTClient": false,
    "FilestoreEnabled": false,
    "GraphsyncEnabled": false,
    "Libp2pStreamMounting": false,
    "P2pHttpProxy": false,
    "StrategicProviding": false,
    "UrlstoreEnabled": false
  },
  "Gateway": {
    "APICommands": [],
    "HTTPHeaders": {
      "Access-Control-Allow-Headers": [
        "X-Requested-With",
        "Range",
        "User-Agent"
      ],
      "Access-Control-Allow-Methods": [
        "GET"
      ],
      "Access-Control-Allow-Origin": [
        "*"
      ]
    },
    "NoDNSLink": false,
    "NoFetch": false,
    "PathPrefixes": [],
    "PublicGateways": null,
    "RootRedirect": "",
    "Writable": false
  },
  "Identity": {
    "PeerID": "12D3KooWEHGcBURfi9hMCv8zGyX3UMpKWqGMSfSNSDjho7J4fWqo"
  },
  "Internal": {},
  "Ipns": {
    "RecordLifetime": "",
    "RepublishPeriod": "",
    "ResolveCacheSize": 128
  },
  "Migration": {
    "DownloadSources": [],
    "Keep": ""
  },
  "Mounts": {
    "FuseAllowOther": false,
    "IPFS": "/ipfs",
    "IPNS": "/ipns"
  },
  "Peering": {
    "Peers": null
  },
  "Pinning": {
    "RemoteServices": {}
  },
 "Plugins": {
    "Plugins": null
  },
  "Provider": {
    "Strategy": ""
  },
  "Pubsub": {
    "DisableSigning": false,
    "Router": ""
  },
  "Reprovider": {
    "Interval": "12h",
    "Strategy": "all"
  },
  "Routing": {
    "Methods": null,
    "Routers": null,
    "Type": "dht"
  },
  "Swarm": {
    "AddrFilters": [
      "/ip4/10.0.0.0/ipcidr/8",
      "/ip4/100.64.0.0/ipcidr/10",
      "/ip4/169.254.0.0/ipcidr/16",
      "/ip4/172.16.0.0/ipcidr/12",
      "/ip4/192.0.0.0/ipcidr/24",
      "/ip4/192.0.2.0/ipcidr/24",
      "/ip4/192.168.0.0/ipcidr/16",
      "/ip4/198.18.0.0/ipcidr/15",
      "/ip4/198.51.100.0/ipcidr/24",
      "/ip4/203.0.113.0/ipcidr/24",
      "/ip4/240.0.0.0/ipcidr/4",
      "/ip6/100::/ipcidr/64",
      "/ip6/2001:2::/ipcidr/48",
      "/ip6/2001:db8::/ipcidr/32",
      "/ip6/fc00::/ipcidr/7",
      "/ip6/fe80::/ipcidr/10"
    ],
    "ConnMgr": {
      "GracePeriod": "20s",
      "HighWater": 900,
      "LowWater": 600,
      "Type": "basic"
    },
    "DisableBandwidthMetrics": false,
    "DisableNatPortMap": true,
    "RelayClient": {},
    "RelayService": {},
    "ResourceMgr": {},
    "Transports": {
      "Multiplexers": {},
      "Network": {},
      "Security": {}
    }
  }
}

Description

My IPFS node was unstable, so I decided to remove all config and data and start a fresh instance. I started by adding some local content and pinning some remote content, and the IPFS daemon was killed by oomd. The next attempt also ended with memory usage above 8 GB of RAM.

diag: ipfs/QmU3EWqCxYsMN3EkuuMgeMnSsvPGW55NPfn3i9jU7BAJ93

Jorropo commented 1 year ago

Thx, I have looked at your profile and only 3.8GiB of memory is alive on the heap. I guess you run with the default GOGC value of 100, which means Go only attempts a GC once the heap has grown to roughly twice the live heap left over from the previous GC run. However 3.8GiB * 2 = 7.6GiB, and I guess the remaining ~400MiB isn't enough to run the rest of your system. In other words, IPFS is only using half of the RAM; the other half is dead values that haven't been reclaimed by Go yet.
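
To make that pacing rule concrete, here is a tiny standalone sketch (a simplified model only; the real Go pacer also accounts for goroutine stacks and globals):

package main

import "fmt"

// Simplified model of Go's GC pacing: with GOGC=N, the next collection
// is triggered roughly when the heap reaches liveHeap * (1 + N/100).
func nextGCTarget(liveHeapGiB float64, gogc int) float64 {
    return liveHeapGiB * (1 + float64(gogc)/100)
}

func main() {
    // 3.8GiB alive with the default GOGC=100 gives a ~7.6GiB trigger
    // point, which is already more than an 8GiB machine can spare.
    fmt.Printf("next GC at ~%.1f GiB\n", nextGCTarget(3.8, 100))
}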

Go recently introduced https://pkg.go.dev/runtime/debug#SetMemoryLimit. You can set it with the GOMEMLIMIT environment variable when starting a Go program, e.g. GOMEMLIMIT=6GiB (6GiB because you have 8GiB of RAM, leaving some headroom for the rest of the OS). This forces a GC to run whenever the heap exceeds 6GiB. It can be a performance killer if you actually need more than 6GiB, because then you essentially run the GC permanently, but if you sit around 5.5GiB it just runs the GC more often to compensate (instead of OOMing). It's like dynamically reducing GOGC as you approach the memory limit.

My test to confirm this behaviour was:

package main

import (
    "os"
    "runtime"
    "runtime/debug"
)

var leak []byte // keep lots of memory alive to bias the GC toward a high heap target

func main() {
    // Update freeMemory to match the memory available on your system.
    const freeMemory = 40 * 1024 * 1024 * 1024
    const target = freeMemory / 3 * 2 // try to keep two thirds of the system's memory alive (with the default GOGC the next GC target is ~2x that, so it OOMs before GCing)

    const garbage = 1024 * 1024
    leak = make([]byte, target-garbage)
    for i := range leak {
        leak[i] = 1 // memset to force page commit
    }

    debug.SetMemoryLimit(freeMemory) // comment out this line to test the normal GOGC behaviour

    os.Stdout.WriteString("initial leak set up, now generating garbage!\n")

    var keepAlive []byte
    for i := freeMemory / garbage * 3; i != 0; i-- {
        // Run this for a while, generating roughly 3 times more garbage than the memory we have.
        keepAlive = make([]byte, garbage)
        for i := range keepAlive {
            keepAlive[i] = 1 // memset to force page commit
        }
        runtime.Gosched() // simulate some IO, give the GC a chance to run
    }
    leak = keepAlive
}

Using debug.SetMemoryLimit did fix the OOMs in this synthetic test by running the GC more often (observed by running with GODEBUG=gctrace=1).

For now the mitigation I'll recommend is manually setting the GOMEMLIMIT environment variable to slightly less than the available free memory when starting Kubo. In the future hopefully we can configure this automagically via https://github.com/ipfs/kubo/issues/8798/.
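
For reference, Go 1.19+ reads GOMEMLIMIT from the environment at startup. A minimal sketch (not part of Kubo) to confirm which soft memory limit the runtime actually picked up; passing a negative value to debug.SetMemoryLimit only queries the current limit without changing it:

package main

import (
    "fmt"
    "runtime/debug"
)

func main() {
    // A negative argument does not adjust the limit; it just returns the
    // currently effective soft memory limit in bytes (math.MaxInt64 means
    // no limit is set).
    limit := debug.SetMemoryLimit(-1)
    fmt.Printf("effective soft memory limit: %d bytes\n", limit)
}

Running it as GOMEMLIMIT=6GiB go run main.go should print 6442450944 bytes; setting the same environment variable on the ipfs daemon process is the mitigation described above.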