iotaledger / hornet

HORNET is a powerful IOTA fullnode software
Apache License 2.0

Hornet 2.0.1 -- Kubernetes -- OOMKilled #1931

Open MLStoltzenburg opened 1 year ago

MLStoltzenburg commented 1 year ago

Hi, I am migrating the one-click-tangle setup to Hornet 2.0.1. The migration itself went well, but Hornet 2.0.1 is unstable on Kubernetes. I had to increase the memory of my nodes, but the problem still occurs.

The nodes had 16Gi each and now have 32Gi each.

I had to set vm.overcommit_ratio=90, and Hornet was stable for a while, but afterwards memory consumption was erratic until I got OOM-killed again.
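
For reference, vm.overcommit_ratio is a node-level kernel setting, so it has to be applied on each Kubernetes node rather than inside the pod; roughly like this (assumes root on the node):

sysctl -w vm.overcommit_ratio=90
# persist across reboots
echo "vm.overcommit_ratio=90" > /etc/sysctl.d/99-overcommit.conf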

Event error: "Memory cgroup out of memory: Killed process 180904 (hornet) total-vm:12642108kB, anon-rss:9701844kB, file-rss:43452kB, shmem-rss:0kB, UID:65532 pgtables:19416kB oom_score_adj:984"

P.S.: The problem happens after the spammer connects to Hornet 2.0.1
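
To watch the growth while the spammer is connected, a per-container view is enough (requires metrics-server; the Prometheus endpoint on :9311 shows the same picture):

kubectl top pod -l app=hornet-1 --containers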

Expected behavior: stable memory consumption from Hornet 2.0.1.

Environment information:

Additional context

Hornet Deployment

apiVersion: v1
kind: Service
metadata:
  name: hornet-1
  labels:
    app: hornet-1
spec:
  ports:
    - name: gossip
      port: 15600
    - name: autopeering
      port: 14626
    - name: rest
      port: 14265
    - name: tcp-9311
      port: 9311
    - name: tcp-9029
      port: 9029
    - name: tcp-6060
      port: 6060
  selector:
    app: hornet-1
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-hornet-1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hornet-1
  labels:
    source: stolzlabs
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hornet-1
  template:
    metadata:
      labels:
        app: hornet-1
    spec:
      # restartPolicy: OnFailure
      terminationGracePeriodSeconds: 300
      initContainers:
        - name: create-volumes
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          image: busybox:latest
          command:
            - sh
            - -c
          args:
            - >-
              echo "Initializing..." &&
              if [ ! -d "/app/initial" ]; then
                echo "Building GENESIS...";
                mkdir -p /app/snapshots/;
                mkdir -p /app/db/;
                mkdir -p /app/p2pstore/peers;
                mkdir -p /app/initial;
                mkdir -p /app/data;
                mkdir -p /app/data/grafana;
                mkdir -p /app/data/prometheus;
                mkdir -p /app/data/dashboard;
                mkdir -p /app/data/database_legacy;
                mkdir -p /app/data/database_chrysalis;
                mkdir -p /app/data/wasp;
                cd /app/initial/;
                tar zxpvf /genesis/genesis.tgz;
                cd ./db/hornet-1;
                cp -dpR * /app/db/.;
                ls -la /app/db;
                cd /app/initial/;
                cp snapshots/full_snapshot.bin /app/snapshots/.;
                cp /private-keys/identity.key /app/p2pstore/identity.key;
                cp /peering/peering.json /app/.;
                echo "Ok!"
              fi &&
              chown -R 65532:65532 /app &&
              cd /app && 
              chown 65532:65532 .. &&
              ls -la * &&
              echo "End!"
          volumeMounts:
            - mountPath: /app
              name: pvc-hornet-1
            - name: private-key
              mountPath: /private-keys
              readOnly: true
            - name: genesis
              mountPath: /genesis
              readOnly: true
            - name: peering
              mountPath: /peering/peering.json
              readOnly: false
              subPath: peering-hornet-1.json
      containers:
        - name: hornet
          image: iotaledger/hornet:2.0.1
          args: ["-c", "config.json"]
          securityContext:
            runAsUser: 65532
            runAsGroup: 65532
          workingDir: /app
          resources:
            limits:
              memory: "20000Mi"
              cpu: "4"
            requests:
              memory: "4096Mi"
              cpu: "1"
          ports:
            - name: gossip
              protocol: TCP
              containerPort: 15600
            - name: rest
              protocol: TCP
              containerPort: 14265
            - name: autopeering
              protocol: UDP
              containerPort: 14626
            - name: tcp-9311
              protocol: TCP
              containerPort: 9311
            - name: tcp-9029
              protocol: TCP
              containerPort: 9029
            - name: tcp-6060
              protocol: TCP
              containerPort: 6060
          volumeMounts:
            - name: configuration
              mountPath: /app/config.json
              readOnly: true
              subPath: config-hornet-1.json
            - name: pvc-hornet-1
              subPath: p2pstore
              mountPath: /app/p2pstore
            - name: pvc-hornet-1
              subPath: db
              mountPath: /app/db
            - name: pvc-hornet-1
              subPath: snapshots
              mountPath: /app/snapshots
            - name: pvc-hornet-1
              subPath: data
              mountPath: /app/data
            - name: pvc-hornet-1
              subPath: "peering.json"
              mountPath: "/app/peering.json"
      volumes:
        - name: configuration
          configMap:
            name: hornet-config-hornet-1
            items:
              - key: config-hornet-1.json
                path: config-hornet-1.json
        - name: private-key
          secret:
            secretName: hornet-1-key
        - name: genesis
          secret:
            secretName: genesis-hornet-1-secret
        - name: peering
          configMap:
            name: hornet-peering-hornet-1
            items:
              - key: peering-hornet-1.json
                path: peering-hornet-1.json
        - name: pvc-hornet-1
          persistentVolumeClaim:
            claimName: pvc-hornet-1
            readOnly: false
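
A note on mitigation, not part of the original manifest: Hornet is a Go program, so a soft runtime memory limit (GOMEMLIMIT, available since Go 1.19) makes the garbage collector work harder before the cgroup limit is reached. A sketch of the extra env entry for the hornet container above; the 16GiB value is an assumption, chosen to sit below the 20000Mi cgroup limit:

          env:
            - name: GOMEMLIMIT   # soft limit for the Go heap; value is an assumption, kept below the cgroup limit
              value: "16GiB"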

Config.json

{
    "inx": {
        "enabled": true,
        "bindAddress": "0.0.0.0:9029"
    },
    "debug": {
        "enabled": true
    },
    "prometheus": {
        "enabled": true,
        "bindAddress": "0.0.0.0:9311"
    },
    "profiling": {
        "enabled": true,
        "bindAddress": "0.0.0.0:6060"
    },
    "app": {
        "checkForUpdates": true,
        "stopGracePeriod": "5m"
    },
    "db": {
        "path": "db",
        "autoRevalidation": true,
        "checkLedgerStateOnStartup": true
    },
    "snapshots": {
        "fullPath": "snapshots/full_snapshot.bin",
        "deltaPath": "snapshots/delta_snapshot.bin",
        "downloadURLs": []
    },
    "protocol": {
        "targetNetworkName": "stolzlabs",
        "milestonePublicKeyCount": 1,
        "baseToken": {
            "name": "StolzCoin",
            "tickerSymbol": "stolz",
            "unit": "stolz",
            "subunit": "stolzies",
            "decimals": 6,
            "useMetricPrefix": false
        },
        "publicKeyRanges": [
            {
                "key": "b4ec3ab665c4b599d745f9063bf89c0aa3a39e3b5327990a26a82c43c7718cc3",
                "start": 0,
                "end": 0
            }
        ]
    },
    "node": {
        "alias": "hornet-1"
    },
    "p2p": {
        "db": {
            "path": "p2pstore"
        }
    },
    "restAPI": {
        "publicRoutes": [
            "/health",
            "/api/*"
        ],
        "protectedRoutes": [],
        "pow": {
            "enabled": true
        },
        "debugRequestLoggerEnabled": false
    }
}
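
Since profiling.enabled is true, the node already exposes Go's standard pprof handlers on the 6060 bind address, so a heap profile is the quickest way to see where the memory is going. A sketch, assuming kubectl access to the cluster and a local Go toolchain:

kubectl port-forward deploy/hornet-1 6060:6060
go tool pprof http://localhost:6060/debug/pprof/heap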

Spammer

apiVersion: v1
kind: Service
metadata:
  name: spammer
  labels:
    app: spammer
spec:
  ports:
    - name: rest
      port: 14265
    - name: tcp-9311
      port: 9311
    - name: tcp-6060
      port: 6060
  selector:
    app: spammer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spammer
  labels:
    source: stolzlabs
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spammer
  template:
    metadata:
      labels:
        app: spammer
    spec:
      # restartPolicy: OnFailure
      terminationGracePeriodSeconds: 60
      containers:
        - name: hornet
          image: iotaledger/inx-spammer:1.0-rc
          args: ["-c", "config.json"]
          env:
            - name: SPAMMER_MNEMONIC 
              value: "later clinic garbage defense level wrap amused prefer hedgehog dice vapor deer hedgehog symptom reward man motor fan height shine humble amount radar cement"
          workingDir: /app
          resources:
            limits:
              memory: "2048Mi"
              cpu: "2"
            requests:
              memory: "256Mi"
              cpu: "250m"
          ports:
            - name: rest
              protocol: TCP
              containerPort: 14265
            - name: tcp-9311
              protocol: TCP
              containerPort: 9311
            - name: tcp-6060
              protocol: TCP
              containerPort: 6060
          volumeMounts:
            - name: configuration
              mountPath: /app/config.json
              readOnly: true
              subPath: config-spammer.json
      volumes:
        - name: configuration
          configMap:
            name: hornet-config-spammer
            items:
              - key: config-spammer.json
                path: config-spammer.json

Config Spammer

{
    "inx": {
        "address": "XXXXX:9029",
        "targetNetworkName": "stolzlabs"
    },
    "prometheus": {
        "enabled": true,
        "bindAddress": "0.0.0.0:9311",
        "spammerMetrics": true,
        "goMetrics": false,
        "processMetrics": false,
        "promhttpMetrics": false
    },
    "profiling": {
        "enabled": true,
        "bindAddress": "0.0.0.0:6060"
    },
    "spammer": {
        "message": "Spam!",
        "tag": "Stolz Spammer",
        "tagSemiLazy": "Stolz Spammer Semi-Lazy",
        "cpuMaxUsage": 2,
        "bpsRateLimit": "0.1",
        "workers": 0,
        "autostart": true,
        "valueSpam": {
            "enabled": false,
            "sendBasicOutput": true,
            "collectBasicOutput": true,
            "createAlias": true,
            "destroyAlias": true,
            "createFoundry": true,
            "destroyFoundry": true,
            "mintNativeToken": true,
            "meltNativeToken": true,
            "createNFT": true,
            "destroyNFT": true
        }
    },
    "restAPI": {
        "enabled": true,
        "bindAddress": "0.0.0.0:14265"
    }
}
muXxer commented 1 year ago

How much TPS do you produce? And why is that a bug? So far everything works as expected. There is no known memory leak. If you are creating more load than your server/node can handle, then that's your problem ;)

MLStoltzenburg commented 1 year ago

The spammer just connects and memory consumption starts to increase. It didn't send any messages.

Yes! It's not necessarily a bug in Hornet! I assumed it was a bug because everything works very well under Docker. Sorry! :-) I used it for days on Docker before I started refactoring the scripts.

If you want more evidence, I'll be happy to help.

I have been using Hornet 1.2 on Kubernetes for a long time and it works great in my environment.

MLStoltzenburg commented 1 year ago

I made a short video showing the behavior! Hope this helps!

https://github.com/iotaledger/hornet/assets/2595026/beaf6456-5259-4c99-b2be-d86008d666f3

MLStoltzenburg commented 1 year ago

Hi @muXxer

I found the problem: I entered the wrong address in the indexer. I had configured restAPI.bindAddress as 0.0.0.0, so the spammer tried to connect to the indexer at 0.0.0.0, which is a wrong endpoint. After setting the correct endpoint, the memory consumption problem did not occur anymore.
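
For anyone else hitting this: the takeaway seems to be that a 0.0.0.0 bind address can be advertised to other INX components as the literal address to connect to. A hedged sketch of the kind of indexer setting involved, with placeholder host and port (check your own service name and the indexer's actual port):

{
    "restAPI": {
        "bindAddress": "hornet-1:9091"
    }
}

The exact config key and default port of your indexer may differ; the point is to bind/advertise a resolvable address instead of 0.0.0.0.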

I corrected my script, but in your opinion, could this be a problem in Hornet?

Thanks!