ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Prevent multiple instances of "ipfs bitswap reprovide" running at the same time #10513

Open LeDechaine opened 2 months ago

LeDechaine commented 2 months ago

Checklist

Installation method

ipfs-update or dist.ipfs.tech

Version

Kubo version: 0.29.0
Repo version: 15
System version: amd64/linux
Golang version: go1.22.4

Config

{
  "API": {
    "HTTPHeaders": {}
  },
  "Addresses": {
    "API": "/ip4/127.0.0.1/tcp/5001",
    "Announce": [],
    "AppendAnnounce": [],
    "Gateway": "/ip4/127.0.0.1/tcp/8080",
    "NoAnnounce": [
      "/ip4/10.0.0.0/ipcidr/8",
      "/ip4/100.64.0.0/ipcidr/10",
      "/ip4/169.254.0.0/ipcidr/16",
      "/ip4/172.16.0.0/ipcidr/12",
      "/ip4/192.0.0.0/ipcidr/24",
      "/ip4/192.0.2.0/ipcidr/24",
      "/ip4/192.168.0.0/ipcidr/16",
      "/ip4/198.18.0.0/ipcidr/15",
      "/ip4/198.51.100.0/ipcidr/24",
      "/ip4/203.0.113.0/ipcidr/24",
      "/ip4/240.0.0.0/ipcidr/4",
      "/ip6/100::/ipcidr/64",
      "/ip6/2001:2::/ipcidr/48",
      "/ip6/2001:db8::/ipcidr/32",
      "/ip6/fc00::/ipcidr/7",
      "/ip6/fe80::/ipcidr/10"
    ],
    "Swarm": [
      "/ip4/0.0.0.0/tcp/4001",
      "/ip6/::/tcp/4001",
      "/ip4/0.0.0.0/udp/4001/quic-v1",
      "/ip4/0.0.0.0/udp/4001/quic-v1/webtransport",
      "/ip6/::/udp/4001/quic-v1",
      "/ip6/::/udp/4001/quic-v1/webtransport"
    ]
  },
  "AutoNAT": {},
  "Bootstrap": [
    "/ip4/104.131.131.82/tcp/4001/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ",
    "/ip4/104.131.131.82/udp/4001/quic-v1/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt"
  ],
  "DNS": {
    "Resolvers": {}
  },
  "Datastore": {
    "BloomFilterSize": 0,
    "GCPeriod": "1h",
    "HashOnRead": false,
    "Spec": {
      "mounts": [
        {
          "child": {
            "path": "blocks",
            "shardFunc": "/repo/flatfs/shard/v1/next-to-last/2",
            "sync": true,
            "type": "flatfs"
          },
          "mountpoint": "/blocks",
          "prefix": "flatfs.datastore",
          "type": "measure"
        },
        {
          "child": {
            "compression": "none",
            "path": "datastore",
            "type": "levelds"
          },
          "mountpoint": "/",
          "prefix": "leveldb.datastore",
          "type": "measure"
        }
      ],
      "type": "mount"
    },
    "StorageGCWatermark": 90,
    "StorageMax": "10GB"
  },
  "Discovery": {
    "MDNS": {
      "Enabled": false
    }
  },
  "Experimental": {
    "FilestoreEnabled": false,
    "Libp2pStreamMounting": false,
    "OptimisticProvide": true,
    "OptimisticProvideJobsPoolSize": 120,
    "P2pHttpProxy": false,
    "StrategicProviding": false,
    "UrlstoreEnabled": false
  },
  "Gateway": {
    "DeserializedResponses": null,
    "DisableHTMLErrors": null,
    "ExposeRoutingAPI": null,
    "HTTPHeaders": {},
    "NoDNSLink": true,
    "NoFetch": false,
    "PublicGateways": {
      "k51qzi5uqu5dj4zil10lqlbtckpmozoxghycqhtksngn215toulwb3n8k9sv2k": {
        "NoDNSLink": false,
        "Paths": []
      }
    },
    "RootRedirect": ""
  },
  "Identity": {
    "PeerID": "12D3KooWA6HLX9ebnT91TktUzRNx3WJta6Ks1FZrVetyU7AY9Rjf"
  },
  "Import": {
    "CidVersion": null,
    "HashFunction": null,
    "UnixFSChunker": null,
    "UnixFSRawLeaves": null
  },
  "Internal": {},
  "Ipns": {
    "RecordLifetime": "",
    "RepublishPeriod": "",
    "ResolveCacheSize": 128,
    "UsePubsub": true
  },
  "Migration": {
    "DownloadSources": [],
    "Keep": ""
  },
  "Mounts": {
    "FuseAllowOther": false,
    "IPFS": "/ipfs",
    "IPNS": "/ipns"
  },
  "Peering": {
    "Peers": null
  },
  "Pinning": {
    "RemoteServices": {}
  },
  "Plugins": {
    "Plugins": null
  },
  "Provider": {
    "Strategy": ""
  },
  "Pubsub": {
    "DisableSigning": false,
    "Router": ""
  },
  "Reprovider": {},
  "Routing": {
    "Methods": null,
    "Routers": null
  },
  "Swarm": {
    "AddrFilters": [
      "/ip4/10.0.0.0/ipcidr/8",
      "/ip4/100.64.0.0/ipcidr/10",
      "/ip4/169.254.0.0/ipcidr/16",
      "/ip4/172.16.0.0/ipcidr/12",
      "/ip4/192.0.0.0/ipcidr/24",
      "/ip4/192.0.2.0/ipcidr/24",
      "/ip4/192.168.0.0/ipcidr/16",
      "/ip4/198.18.0.0/ipcidr/15",
      "/ip4/198.51.100.0/ipcidr/24",
      "/ip4/203.0.113.0/ipcidr/24",
      "/ip4/240.0.0.0/ipcidr/4",
      "/ip6/100::/ipcidr/64",
      "/ip6/2001:2::/ipcidr/48",
      "/ip6/2001:db8::/ipcidr/32",
      "/ip6/fc00::/ipcidr/7",
      "/ip6/fe80::/ipcidr/10"
    ],
    "ConnMgr": {},
    "DisableBandwidthMetrics": false,
    "DisableNatPortMap": true,
    "RelayClient": {},
    "RelayService": {},
    "ResourceMgr": {},
    "Transports": {
      "Multiplexers": {},
      "Network": {},
      "Security": {}
    }
  }
}

Description

A more accurate title for this is possibly: Prevent multiple instances of "ipfs bitswap reprovide" running at the same time.

This is similar to this, but with "Reprovider.Strategy" not set.

"ipfs bitswap reprovide" apparently "triggers reprovider to announce our data to network" and is a commonly recommended way on forums to make IPFS work better. Since I host only a 20 MB website (about 100 files) on IPFS, I set up a cron job on two different VPSes to run "ipfs bitswap reprovide" every hour. Running the command manually appeared to just hang, with no output whatsoever (which is probably a bug already), and I had to Ctrl+C out of it, but I added it to crontab anyway. TL;DR: don't.

Here's "journalctl -u ipfs" on server 1. (So yes, I found out my new VPS actually meets the minimum IPFS requirements: this is a quad-core with 8 GB of RAM, and the "ipfs config show" above is from this one.)

Sep 12 01:50:42 server systemd[1]: ipfs.service: Main process exited, code=killed, status=9/KILL
Sep 12 01:50:42 server systemd[1]: ipfs.service: Failed with result 'signal'.
Sep 12 01:50:42 server systemd[1]: ipfs.service: Consumed 1h 9min 48.690s CPU time.
Sep 12 01:50:42 server systemd[1]: ipfs.service: Scheduled restart job, restart counter is at 12.
(...)
Sep 12 11:18:06 server systemd[1]: ipfs.service: Consumed 1h 32min 30.556s CPU time.
Sep 12 11:18:06 server systemd[1]: ipfs.service: Scheduled restart job, restart counter is at 16.
Sep 12 11:18:06 server systemd[1]: Stopped ipfs.service - IPFS daemon.
Sep 12 11:18:06 server systemd[1]: ipfs.service: Consumed 1h 32min 30.556s CPU time.

"journalctl -u ipfs" on server 2 (a single-core with 512 MB of RAM, hosting one website and now running with "NoFetch"):

Sep 09 00:09:18 server2 systemd[1]: ipfs.service: Scheduled restart job, restart counter is at 1.
Sep 09 00:09:18 server2 systemd[1]: Stopped IPFS daemon.
Sep 09 00:09:18 server2 systemd[1]: ipfs.service: Consumed 9min 10.547s CPU time.
Sep 09 00:09:18 server2 systemd[1]: Started IPFS daemon.
Sep 09 02:45:53 server2 systemd[1]: ipfs.service: Scheduled restart job, restart counter is at 3.
(...)
Sep 09 13:35:12 server2 systemd[1]: ipfs.service: Main process exited, code=killed, status=9/KILL
Sep 09 13:35:12 server2 systemd[1]: ipfs.service: Failed with result 'signal'.
Sep 09 13:35:12 server2 systemd[1]: ipfs.service: Consumed 16min 24.878s CPU time.
Sep 09 13:35:12 server2 systemd[1]: ipfs.service: Scheduled restart job, restart counter is at 5.

But no restarts since 4 days ago on server2?

"ps aux | grep ipfs" on server2:

ledecha+ 16088  2.5 12.8 2294940 56084 ?  Ssl  Sep11  40:41 ipfs daemon --migrate=true --enable-gc --routing=dhtclient
ledecha+ 16194  0.0  0.0    2480     0 ?  Ss   Sep11   0:00 /bin/sh -c ipfs bitswap reprovide
ledecha+ 16195  0.0  1.6 1659356  7276 ?  Sl   Sep11   0:29 ipfs bitswap reprovide
ledecha+ 16479  0.0  0.0    2480     0 ?  Ss   Sep11   0:00 /bin/sh -c ipfs bitswap reprovide
ledecha+ 16480  0.0  0.0 1733088     0 ?  Sl   Sep11   0:31 ipfs bitswap reprovide
ledecha+ 16730  0.0  0.0    2480     0 ?  Ss   Sep11   0:00 /bin/sh -c ipfs bitswap reprovide
ledecha+ 16731  0.0  0.0 1659356     4 ?  Sl   Sep11   0:24 ipfs bitswap reprovide

...which came to 26 instances of "ipfs bitswap reprovide" running.

Long story short: running "ipfs bitswap reprovide" hourly, even for 20 MB (about 200 files), is too much, and will systematically crash your IPFS daemon, even on a quad-core with 8 GB of RAM. Big server or not, this is definitely not the intended result. IPFS worked fine, stable and without crashes, for multiple months without "ipfs bitswap reprovide" as a cron job, even on the VPS with one core and 512 MB of RAM.

Maybe I was an idiot for running "ipfs bitswap reprovide" every hour, and maybe that's why ipfs crashed. If that's the case, at a minimum I recommend preventing a reprovide from starting while another "reprovide" job is still ongoing.
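Until such a guard exists inside the daemon, the stacking can be avoided on the cron side. A minimal sketch, assuming util-linux flock(1) is available; the lock-file path here is an arbitrary choice, not anything Kubo defines:

```
# Illustrative crontab entry: flock -n exits immediately if a previous
# run still holds the lock, so reprovide jobs never pile up.
0 */6 * * * flock -n /tmp/ipfs-reprovide.lock ipfs bitswap reprovide
```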

lidel commented 2 months ago

The default Reprovider.Interval is once every 22 hours. Modern Amino DHT servers remember records for 48h (https://github.com/libp2p/go-libp2p-kad-dht/pull/793), old ones remembered for 24h. There should be no reason to provide more often than once a day.

Forcing a reprovide every hour via cron is definitely not doing you any good, especially if providing your CIDs takes longer than this shortened interval. You should just disable the cron job and rely on Reprovider.Interval.
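For reference, the "Reprovider" section in the config above is empty, which means the defaults apply; the same values can also be set explicitly. A sketch of what that section looks like when filled in (these are the documented Kubo defaults):

```json
"Reprovider": {
  "Interval": "22h",
  "Strategy": "all"
}
```

The interval can likewise be changed from the CLI with `ipfs config Reprovider.Interval 22h`.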

A more accurate title for this is possibly: Prevent multiple instances of "ipfs bitswap reprovide" running at the same time.

We don't have a global mutex on running ipfs bitswap reprovide (it is backed by Provider.Reprovide(req.Context) from boxo, which is always forced).
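Such a guard could be as small as a per-node mutex that is only try-locked. A minimal sketch of the idea; this is not the actual Kubo/boxo code, and the names (reprovideLock, tryReprovide) are hypothetical:

```go
package main

import (
	"fmt"
	"sync"
)

// reprovideLock is a hypothetical process-wide guard. Kubo does not
// currently have one (that is this bug); the sketch only shows the
// proposed behavior: reject a reprovide while another is running.
var reprovideLock sync.Mutex

func tryReprovide(run func()) error {
	if !reprovideLock.TryLock() { // TryLock requires Go 1.18+
		return fmt.Errorf("reprovide already in progress")
	}
	defer reprovideLock.Unlock()
	run()
	return nil
}

func main() {
	started := make(chan struct{})
	release := make(chan struct{})
	// First reprovide holds the lock until we close release.
	go tryReprovide(func() { close(started); <-release })
	<-started
	// A second attempt while the first is running is rejected.
	err := tryReprovide(func() {})
	fmt.Println(err) // prints "reprovide already in progress"
	close(release)
}
```

With TryLock, a concurrent caller gets an immediate error instead of queuing up behind the running job, which is what turns 26 stacked cron invocations into 25 fast failures.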

This is a sensible bug to fix as part of the reprovider work we plan to do (cc @gammazero)