RubenKelevra opened this issue 4 years ago
Okay, I found the reason:
Netdata polls the object count, the repo size, and the peer count from the IPFS node via the API. IPFS doesn't seem to cache these values and update them when they change (a write-through cache strategy); instead, it appears to recompute them on every request.
Since Netdata polls metrics quite often, this causes the issue. As a temporary workaround, the IPFS plugin can be configured with a larger data collection interval...
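A sketch of that workaround, assuming Netdata's python.d `ipfs` collector and its standard `update_every` setting (the path and the value are illustrative, not a recommendation):

```yaml
# /etc/netdata/python.d/ipfs.conf
# Poll the IPFS API every 30 seconds instead of every second,
# reducing how often the expensive repo/object stats are recomputed.
update_every: 30
```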
So this turns into an improvement request: polling the API for those metrics shouldn't cause high CPU load.
The repo size is memoized, but the number of objects is not. Try polling `ipfs repo stat --size-only`.
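For comparison, the two invocations side by side (the comments reflect the memoization behavior described above, which is an assumption about the implementation, not verified):

```console
$ ipfs repo stat               # includes NumObjects, which is recomputed per call
$ ipfs repo stat --size-only   # skips the object count; only the repo size is reported
```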
Polling for the peer count by requesting the full list of peers also makes little sense:

```js
// Fetches the entire peer list from the daemon just to count it —
// O(n) serialization work on every poll.
const peerInfos = await ipfs.swarm.peers({ timeout: 2500 })
return peerInfos.length
```
Version information:
Description:
I'm running ipfs on a new server with an SSD storage. I'm writing a lot of individual files with
ipfs add --chunker 'buzhash' --cid-version 1 --hash 'blake2b-256'
to the node, copy them to the right location in the MFS, and unpin them again (since `ipfs files write` doesn't support setting a non-standard chunker). Afterwards, the MFS folder CID is pinned on ipfs-cluster, which runs on the same node.
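The workflow above, sketched as shell commands (file names and the MFS target path are placeholders):

```console
$ CID=$(ipfs add -Q --chunker buzhash --cid-version 1 --hash blake2b-256 ./file.bin)
$ ipfs files cp "/ipfs/$CID" /my-folder/file.bin   # copy into MFS at the right location
$ ipfs pin rm "$CID"                               # drop the direct pin from the add again
```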
ipfs-cluster shows that all cluster pins that are part of the pinset are pinned locally.
Another remote server also has all pins of the cluster set pinned; two other servers are still catching up, so they are receiving blocks from the local node.
The low bandwidth use, while it should be sending a somewhat large folder to two other nodes, brought a possible issue to my attention: the outgoing network speed was shown as around 4 MBit/s, which is extremely slow for a server that is doing basically nothing else.
The CPU usage (around 200%) is extremely high relative to the network usage, so I thought it might still be publishing CIDs, and went to sleep.
System specs: 4 dedicated cores for the VM from an AMD EPYC 7702P 64-Core Processor; 16 GB of memory.
There are no background tasks running, just ipfs and ipfs-cluster. ipfs-cluster uses almost no CPU resources at all.
I tried changing the DHT type to `dhtclient`, but this resulted in no change. Restarting the service also made no difference; the CPU usage just jumps back up to around 200%. Attached are the debug data (I forgot to collect the last ones) and the binary, since it's built from master. If I read the CPU profile right, a lot of CPU time is being used by go-ds-badger and go-ipfs-blockstore and functions called by them (flame graph). The debug data was collected some minutes after a restart of the IPFS daemon, while the ipfs-cluster-service was turned off.
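For reference, the routing change can be made via the config command and a restart (assuming a systemd unit named `ipfs.service`; adjust to your setup):

```console
$ ipfs config Routing.Type dhtclient
$ sudo systemctl restart ipfs.service
```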
debug.tar.gz
Here are some performance numbers collected on the system, which show basically no difference in load while there is only very low network traffic.
Config

`DisableBandwidthMetrics` and `DisableNatPortMap` are true; `EnableAutoRelay` and `EnableRelayHop` are false. I use the server profile, and `Routing.Type` is `dhtclient`. I use badgerds; `StorageGCWatermark` is 90 and `StorageMax` is 280GB. I use the systemd-hardening.service file from the repo, but changed the ExecStart to
/usr/bin/ipfs daemon --enable-gc --enable-pubsub-experiment --enable-namesys-pubsub
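For reference, the settings described above correspond to a go-ipfs config fragment along these lines (paraphrased from the prose, not a complete config file):

```json
{
  "Swarm": {
    "DisableBandwidthMetrics": true,
    "DisableNatPortMap": true,
    "EnableAutoRelay": false,
    "EnableRelayHop": false
  },
  "Routing": {
    "Type": "dhtclient"
  },
  "Datastore": {
    "StorageGCWatermark": 90,
    "StorageMax": "280GB"
  }
}
```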