iotaledger / iri

IOTA Reference Implementation

Increased RPC response times & CPU usage correlate with datadir disk usage increase over time #1692

Open c0deright opened 4 years ago

c0deright commented 4 years ago

Bug description

We use an iri daemon solely to query it via RPC with commands getNodeInfo & findTransactions.

IRI version

docker image v1.8.2-RELEASE (sha256:2ab9f91228576a3307f6cc30b77cda8f964f07dc60850d31ed389feb4bca30a6)

Hardware Spec

AWS ec2 instance type t3.large (2 cores, 8GB of RAM)

Additional info

Besides the Docker engine, nothing else (1) is running on this machine, just one container running iotaledger/iri. (1) snmpd, nrpe, sshd and similar low-level idle daemons are running, but no other processes.

Steps To Reproduce

  1. install docker
  2. download and extract snapshot from https://db.iota.partners/iri-mainnet-snapshot.tar.gz in datadir
  3. run iotaledger/iri:v1.8.2-RELEASE with ~12 connected peers and use iri.ini:
    [IRI]
    PORT = 14265
    NEIGHBORING_SOCKET_PORT = 15600
    MAX_FIND_TRANSACTIONS = 500
    DEBUG = false
    LOCAL_SNAPSHOTS_PRUNING_ENABLED = true
    MAX_NEIGHBORS = 20
    NEIGHBORS = ...
  4. execute curl -sH 'X-IOTA-API-Version: 1' http://hostname:14265 -d '{"command": "getNodeInfo"}' every 20 seconds
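For anyone trying to reproduce this, the polling in step 4 can be sketched in Python so that the response time of each call is logged alongside the request. This is only a rough sketch: `hostname` is the placeholder from the curl command above, the helper names are made up for illustration, and the explicit JSON content type is an assumption (the curl call above does not set one).

```python
import json
import time
import urllib.request

NODE_URL = "http://hostname:14265"  # placeholder host, as in the curl command


def build_request(url, command):
    """Build the POST request equivalent to the curl call in step 4."""
    body = json.dumps({"command": command}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "X-IOTA-API-Version": "1",
            # Assumption: the curl call does not set this header explicitly.
            "Content-Type": "application/json",
        },
        method="POST",
    )


def timed(fn):
    """Call fn() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def poll(url=NODE_URL, interval=20.0, samples=3):
    """Send getNodeInfo `samples` times, `interval` seconds apart, and print each latency."""
    for _ in range(samples):
        req = build_request(url, "getNodeInfo")
        stamp = time.strftime("%H:%M:%S")
        try:
            _, elapsed = timed(lambda: urllib.request.urlopen(req, timeout=30).read())
            print(f"{stamp} getNodeInfo {elapsed:.3f}s")
        except OSError as exc:  # urllib.error.URLError is a subclass of OSError
            print(f"{stamp} getNodeInfo failed: {exc}")
        time.sleep(interval)
```

Calling `poll()` against a real node prints one latency line per request; the output can then be compared with the CPU graphs over the same period.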

Expected behaviour

Actual behaviour

Tried

Workaround

  1. stop iri daemon
  2. delete everything in datadir
  3. download and extract snapshot from https://db.iota.partners/iri-mainnet-snapshot.tar.gz in datadir
  4. start iri
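The four workaround steps could be scripted roughly as follows. This sketch only builds the shell commands and does not execute anything, since step 2 is destructive. The container name and datadir path are hypothetical arguments, and it assumes the container was not started with `--rm`, so that `docker start` can bring it back; `find -mindepth 1 -delete` assumes GNU find.

```python
import shlex

SNAPSHOT_URL = "https://db.iota.partners/iri-mainnet-snapshot.tar.gz"


def workaround_commands(container, datadir):
    """Return the shell commands for the four workaround steps, in order.

    `container` and `datadir` are hypothetical: use the name of your IRI
    container and the host path mounted as its data directory.
    """
    c = shlex.quote(container)
    d = shlex.quote(datadir)
    return [
        f"docker stop {c}",                                # 1. stop iri daemon
        f"find {d} -mindepth 1 -delete",                   # 2. delete everything in datadir
        f"curl -fsSL {SNAPSHOT_URL} | tar -xzf - -C {d}",  # 3. download and extract snapshot
        f"docker start {c}",                               # 4. start iri
    ]


# Print the plan for review before running any of it by hand.
for cmd in workaround_commands("iri", "/path/to/datadir"):
    print(cmd)
```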

CPU usage over last 3 months

cpu_iri

CPU usage over last 12 hours (Workaround applied at 09:00)

cpu_12h_iri

GalRogozinski commented 4 years ago

Question for you: Were you running it with LOCAL_SNAPSHOTS_PRUNING_ENABLED on?

c0deright commented 4 years ago

@GalRogozinski My bad, with my latest edit I somehow reverted to an old revision. I re-added the info above. Yes, LOCAL_SNAPSHOTS_PRUNING_ENABLED = true is set.

c0deright commented 4 years ago

Not only did CPU usage drop after deleting the datadir and extracting the snapshot, disk I/O went down, too.

last 3 days

metrics_iri

last 3 months

metrics_over_time
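To put numbers on the datadir growth next to these graphs, the directory size could be sampled periodically with something like the sketch below. The function name is made up for illustration, and the path passed in would be wherever the datadir is mounted on the host.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files below `path` (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                # The file disappeared between listing and stat, which can
                # happen while the node is writing to its database; skip it.
                pass
    return total
```

Logging `dir_size_bytes(datadir)` with a timestamp, for example once an hour from cron, gives a series that can be laid over the CPU and disk I/O metrics.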

GalRogozinski commented 4 years ago

We suspect that the disk I/O and the CPU usage are related. Once we are done with #935 (WIP), we will test how it affects CPU usage as well.

c0deright commented 4 years ago

Can we - as a workaround - disable local snapshots?

I honestly have no idea what these local snapshots are used for, but you suggest that pruning them might be the issue. So if we don't create them in the first place, and this has no negative impact on the getNodeInfo & findTransactions RPC commands, we could disable taking local snapshots.

Edit: OK, I just read "Concepts: Local snapshots", and from the looks of it, disabling local snapshots is not an option.