Closed by DavidNix 1 year ago
Finally remembered to copy one.
apiVersion: v1
kind: Pod
metadata:
annotations:
app.kubernetes.io/ordinal: "0"
seccomp.security.alpha.kubernetes.io/pod: runtime/default
creationTimestamp: "2023-01-30T18:04:30Z"
labels:
app.kubernetes.io/component: CosmosFullNode
app.kubernetes.io/created-by: cosmos-operator
app.kubernetes.io/instance: juno-mainnet-fullnode-0
app.kubernetes.io/name: juno-mainnet-fullnode
app.kubernetes.io/revision: 4f3ef332
app.kubernetes.io/version: v11.0.0
cosmos.strange.love/network: mainnet
name: juno-mainnet-fullnode-0
namespace: strangelove
ownerReferences:
- apiVersion: cosmos.strange.love/v1
blockOwnerDeletion: true
controller: true
kind: CosmosFullNode
name: juno-mainnet-fullnode
uid: aa1fd035-e04e-47a6-93ba-8ab1c4b83801
resourceVersion: "102236387"
uid: 9681c76a-8907-4029-85ce-92fc8f1ed08f
spec:
containers:
- args:
- start
- --home
- /home/operator/cosmos
- --x-crisis-skip-assert-invariants
command:
- junod
env:
- name: HOME
value: /home/operator
- name: CHAIN_HOME
value: /home/operator/cosmos
- name: GENESIS_FILE
value: /home/operator/cosmos/config/genesis.json
- name: CONFIG_DIR
value: /home/operator/cosmos/config
- name: DATA_DIR
value: /home/operator/cosmos/data
image: ghcr.io/strangelove-ventures/heighliner/juno:v11.0.0
imagePullPolicy: IfNotPresent
name: node
ports:
- containerPort: 1317
name: api
protocol: TCP
- containerPort: 8080
name: rosetta
protocol: TCP
- containerPort: 9090
name: grpc
protocol: TCP
- containerPort: 26660
name: prometheus
protocol: TCP
- containerPort: 26656
name: p2p
protocol: TCP
- containerPort: 26657
name: rpc
protocol: TCP
- containerPort: 9091
name: grpc-web
protocol: TCP
readinessProbe:
failureThreshold: 5
httpGet:
path: /health
port: 26657
scheme: HTTP
initialDelaySeconds: 1
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 10
resources:
requests:
cpu: "1"
memory: 12Gi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /home/operator/cosmos
name: vol-chain-home
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-qklwb
readOnly: true
workingDir: /home/operator
- command:
- ihc
image: ghcr.io/strangelove-ventures/ignite-health-check:v0.0.1
imagePullPolicy: IfNotPresent
name: healthcheck
ports:
- containerPort: 1251
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /
port: 1251
scheme: HTTP
initialDelaySeconds: 1
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 10
resources:
requests:
cpu: 5m
memory: 16Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /home/operator/cosmos
name: vol-chain-home
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-qklwb
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
initContainers:
- args:
- -c
- "\nset -eu\nif [ ! -d \"$CHAIN_HOME/data\" ]; then\n\techo \"Initializing chain...\"\n\tjunod
init juno-mainnet-fullnode-0 --chain-id juno-1 --home \"$CHAIN_HOME\"\n\t# Remove
because downstream containers check the presence of this file.\n\trm \"$GENESIS_FILE\"\nelse\n\techo
\"Skipping chain init; already initialized.\"\nfi\n\necho \"Initializing into
tmp dir for downstream processing...\"\njunod init juno-mainnet-fullnode-0 --chain-id
juno-1 --home \"$HOME/.tmp\"\n"
command:
- sh
env:
- name: HOME
value: /home/operator
- name: CHAIN_HOME
value: /home/operator/cosmos
- name: GENESIS_FILE
value: /home/operator/cosmos/config/genesis.json
- name: CONFIG_DIR
value: /home/operator/cosmos/config
- name: DATA_DIR
value: /home/operator/cosmos/data
image: ghcr.io/strangelove-ventures/heighliner/juno:v11.0.0
imagePullPolicy: IfNotPresent
name: chain-init
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /home/operator/cosmos
name: vol-chain-home
- mountPath: /home/operator/.tmp
name: vol-tmp
- mountPath: /home/operator/.config
name: vol-config
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-qklwb
readOnly: true
workingDir: /home/operator
- args:
- -c
- "if [ -f \"$GENESIS_FILE\" ]; then\n\techo \"Genesis file $GENESIS_FILE already
exists; skipping initialization.\"\n\texit 0\nfi\n\nset -eu\n\n# $GENESIS_FILE
and $CONFIG_DIR already set via pod env vars.\n\nGENESIS_URL=\"$1\"\n\necho
\"Downloading genesis file $GENESIS_URL to $GENESIS_FILE...\"\n\ndownload_json()
{\n echo \"Downloading plain json...\"\n wget -c -O \"$GENESIS_FILE\" \"$GENESIS_URL\"\n}\n\ndownload_jsongz()
{\n echo \"Downloading json.gz...\"\n wget -c -O - \"$GENESIS_URL\" | gunzip
-c > \"$GENESIS_FILE\"\n}\n\ndownload_tar() {\n echo \"Downloading and extracting
tar...\"\n wget -c -O - \"$GENESIS_URL\" | tar -x -C \"$CONFIG_DIR\"\n}\n\ndownload_targz()
{\n echo \"Downloading and extracting compressed tar...\"\n wget -c -O - \"$GENESIS_URL\"
| tar -xz -C \"$CONFIG_DIR\"\n}\n\ndownload_zip() {\n echo \"Downloading and
extracting zip...\"\n wget -c -O tmp_genesis.zip \"$GENESIS_URL\"\n unzip
tmp_genesis.zip\n rm tmp_genesis.zip\n mv genesis.json \"$GENESIS_FILE\"\n}\n\nrm
-f \"$GENESIS_FILE\"\n\ncase \"$GENESIS_URL\" in\n *.json.gz) download_jsongz
;;\n *.json) download_json ;;\n *.tar.gz) download_targz ;;\n *.tar.gzip)
download_targz ;;\n *.tar) download_tar ;;\n *.zip) download_zip ;;\n *)
echo \"Unable to handle file extension for $GENESIS_URL\"; exit 1 ;;\nesac\n\necho
\"Saved genesis file to $GENESIS_FILE.\"\necho \"Download genesis file complete.\"\n\necho
\"Genesis $GENESIS_FILE initialized.\"\n"
- -s
- https://download.dimi.sh/juno-phoenix2-genesis.tar.gz
command:
- sh
env:
- name: HOME
value: /home/operator
- name: CHAIN_HOME
value: /home/operator/cosmos
- name: GENESIS_FILE
value: /home/operator/cosmos/config/genesis.json
- name: CONFIG_DIR
value: /home/operator/cosmos/config
- name: DATA_DIR
value: /home/operator/cosmos/data
image: ghcr.io/strangelove-ventures/infra-toolkit:v0.0.1
imagePullPolicy: IfNotPresent
name: genesis-init
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /home/operator/cosmos
name: vol-chain-home
- mountPath: /home/operator/.tmp
name: vol-tmp
- mountPath: /home/operator/.config
name: vol-config
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-qklwb
readOnly: true
workingDir: /home/operator
- args:
- -c
- |2
set -eu
CONFIG_DIR="$CHAIN_HOME/config"
TMP_DIR="$HOME/.tmp/config"
OVERLAY_DIR="$HOME/.config"
echo "Merging config..."
set -x
config-merge -f toml "$TMP_DIR/config.toml" "$OVERLAY_DIR/config-overlay.toml" > "$CONFIG_DIR/config.toml"
config-merge -f toml "$TMP_DIR/app.toml" "$OVERLAY_DIR/app-overlay.toml" > "$CONFIG_DIR/app.toml"
command:
- sh
env:
- name: HOME
value: /home/operator
- name: CHAIN_HOME
value: /home/operator/cosmos
- name: GENESIS_FILE
value: /home/operator/cosmos/config/genesis.json
- name: CONFIG_DIR
value: /home/operator/cosmos/config
- name: DATA_DIR
value: /home/operator/cosmos/data
image: ghcr.io/strangelove-ventures/infra-toolkit:v0.0.1
imagePullPolicy: IfNotPresent
name: config-merge
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /home/operator/cosmos
name: vol-chain-home
- mountPath: /home/operator/.tmp
name: vol-tmp
- mountPath: /home/operator/.config
name: vol-config
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-qklwb
readOnly: true
workingDir: /home/operator
- args:
- -c
- "set -eu\nif test -n \"$(find $DATA_DIR -maxdepth 1 -name '*.db' -print -quit)\";
then\n\techo \"Databases in $DATA_DIR already exists; skipping initialization.\"\n\texit
0\nfi\n\nset -eu\n\n# $CHAIN_HOME already set via pod env vars.\n\nSNAPSHOT_URL=\"$1\"\n\necho
\"Downloading snapshot archive $SNAPSHOT_URL to $CHAIN_HOME...\"\n\ndownload_tar()
{\n echo \"Downloading and extracting tar...\"\n wget -c -O - \"$SNAPSHOT_URL\"
| tar -x -C \"$CHAIN_HOME\"\n}\n\ndownload_targz() {\n echo \"Downloading and
extracting compressed tar...\"\n wget -c -O - \"$SNAPSHOT_URL\" | tar -xz -C
\"$CHAIN_HOME\"\n}\n\ndownload_lz4() {\n echo \"Downloading and extracting
lz4...\"\n wget -c -O - \"$SNAPSHOT_URL\" | lz4 -c -d | tar -x -C \"$CHAIN_HOME\"\n}\n\ncase
\"$SNAPSHOT_URL\" in\n *.tar.lz4) download_lz4 ;;\n *.tar.gzip) download_targz
;;\n *.tar.gz) download_targz ;;\n *.tar) download_tar ;;\n *) echo \"Unable
to handle file extension for $SNAPSHOT_URL\"; exit 1 ;;\nesac\n\necho \"Download
and extract snapshot complete.\"\n\necho \"$DATA_DIR initialized.\"\n"
- -s
- https://snapshots.polkachu.com/snapshots/juno/juno_5373433.tar.lz4
command:
- sh
env:
- name: HOME
value: /home/operator
- name: CHAIN_HOME
value: /home/operator/cosmos
- name: GENESIS_FILE
value: /home/operator/cosmos/config/genesis.json
- name: CONFIG_DIR
value: /home/operator/cosmos/config
- name: DATA_DIR
value: /home/operator/cosmos/data
image: ghcr.io/strangelove-ventures/infra-toolkit:v0.0.1
imagePullPolicy: IfNotPresent
name: snapshot-restore
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /home/operator/cosmos
name: vol-chain-home
- mountPath: /home/operator/.tmp
name: vol-tmp
- mountPath: /home/operator/.config
name: vol-config
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-qklwb
readOnly: true
workingDir: /home/operator
nodeName: gke-juno-mainnet-full-chain-node-pool-b2985e35-w530
preemptionPolicy: PreemptLowerPriority
priority: 0
readinessGates:
- conditionType: cloud.google.com/load-balancer-neg-ready
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 1025
fsGroupChangePolicy: OnRootMismatch
runAsGroup: 1025
runAsNonRoot: true
runAsUser: 1025
seccompProfile:
type: RuntimeDefault
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: vol-chain-home
persistentVolumeClaim:
claimName: pvc-juno-mainnet-fullnode-0
- emptyDir: {}
name: vol-tmp
- configMap:
defaultMode: 420
items:
- key: config-overlay.toml
path: config-overlay.toml
- key: app-overlay.toml
path: app-overlay.toml
name: juno-mainnet-fullnode-0
name: vol-config
- name: kube-api-access-qklwb
projected:
defaultMode: 420
sources:
- serviceAccountToken:
expirationSeconds: 3607
path: token
- configMap:
items:
- key: ca.crt
path: ca.crt
name: kube-root-ca.crt
- downwardAPI:
items:
- fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
path: namespace
status:
conditions:
- lastProbeTime: null
lastTransitionTime: null
message: 'Pod has become Healthy in NEG "Key{\"k8s1-d906fa4e-strangelove-juno-mainnet-fullnode-r-2665-e6f07b34\",
zone: \"us-east1-d\"}" attached to BackendService "Key{\"k8s1-d906fa4e-strangelove-juno-mainnet-fullnode-r-2665-e6f07b34\"}".
Marking condition "cloud.google.com/load-balancer-neg-ready" to True.'
reason: LoadBalancerNegReady
status: "True"
type: cloud.google.com/load-balancer-neg-ready
- lastProbeTime: null
lastTransitionTime: "2023-01-30T18:04:40Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2023-01-30T22:34:27Z"
reason: PodFailed
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2023-01-30T22:34:27Z"
reason: PodFailed
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2023-01-30T18:04:30Z"
status: "True"
type: PodScheduled
containerStatuses:
- image: ghcr.io/strangelove-ventures/ignite-health-check:v0.0.1
imageID: ""
lastState:
terminated:
exitCode: 137
finishedAt: null
message: The container could not be located when the pod was deleted. The
container used to be Running
reason: ContainerStatusUnknown
startedAt: null
name: healthcheck
ready: false
restartCount: 1
started: false
state:
terminated:
exitCode: 137
finishedAt: null
message: The container could not be located when the pod was terminated
reason: ContainerStatusUnknown
startedAt: null
- containerID: containerd://5508864cbe02118c152283f3755a4f0bf6dfa23e4b1d46be0994b28d743a6bf1
image: ghcr.io/strangelove-ventures/heighliner/juno:v11.0.0
imageID: ghcr.io/strangelove-ventures/heighliner/juno@sha256:f65390b4383bdde4ae37e9b712e42b595dea98cdf9b1622450cd564c1b544ebf
lastState: {}
name: node
ready: false
restartCount: 1
started: false
state:
terminated:
containerID: containerd://5508864cbe02118c152283f3755a4f0bf6dfa23e4b1d46be0994b28d743a6bf1
exitCode: 137
finishedAt: "2023-01-30T22:34:27Z"
reason: OOMKilled
startedAt: "2023-01-30T19:47:18Z"
hostIP: 192.168.5.3
initContainerStatuses:
- containerID: containerd://f5a895b64fe59e598cdaed578782dc2edfe214d3b0e78efd80f57c998bc56829
image: ghcr.io/strangelove-ventures/heighliner/juno:v11.0.0
imageID: ghcr.io/strangelove-ventures/heighliner/juno@sha256:f65390b4383bdde4ae37e9b712e42b595dea98cdf9b1622450cd564c1b544ebf
lastState: {}
name: chain-init
ready: true
restartCount: 0
state:
terminated:
containerID: containerd://f5a895b64fe59e598cdaed578782dc2edfe214d3b0e78efd80f57c998bc56829
exitCode: 0
finishedAt: "2023-01-30T18:04:36Z"
reason: Completed
startedAt: "2023-01-30T18:04:36Z"
- containerID: containerd://5f001f9bae66df709d99ced2f05a6e009823c70d2dc0d64388e300e8026ddc14
image: ghcr.io/strangelove-ventures/infra-toolkit:v0.0.1
imageID: ghcr.io/strangelove-ventures/infra-toolkit@sha256:3aecfa18d9f0d730fd8821a7f556ba89dbb8cb3683a5e6ac11ec272884db8776
lastState: {}
name: genesis-init
ready: true
restartCount: 0
state:
terminated:
containerID: containerd://5f001f9bae66df709d99ced2f05a6e009823c70d2dc0d64388e300e8026ddc14
exitCode: 0
finishedAt: "2023-01-30T18:04:37Z"
reason: Completed
startedAt: "2023-01-30T18:04:37Z"
- containerID: containerd://a46fa8e1651059d56cab72ccaab800dbe356f2666767610a8d8154180e3f348e
image: ghcr.io/strangelove-ventures/infra-toolkit:v0.0.1
imageID: ghcr.io/strangelove-ventures/infra-toolkit@sha256:3aecfa18d9f0d730fd8821a7f556ba89dbb8cb3683a5e6ac11ec272884db8776
lastState: {}
name: config-merge
ready: true
restartCount: 0
state:
terminated:
containerID: containerd://a46fa8e1651059d56cab72ccaab800dbe356f2666767610a8d8154180e3f348e
exitCode: 0
finishedAt: "2023-01-30T18:04:39Z"
reason: Completed
startedAt: "2023-01-30T18:04:38Z"
- containerID: containerd://dc8fac97f26ae0f039f921d037d280051ef5f6ad4771ff051d613b4069166ee8
image: ghcr.io/strangelove-ventures/infra-toolkit:v0.0.1
imageID: ghcr.io/strangelove-ventures/infra-toolkit@sha256:3aecfa18d9f0d730fd8821a7f556ba89dbb8cb3683a5e6ac11ec272884db8776
lastState: {}
name: snapshot-restore
ready: true
restartCount: 0
state:
terminated:
containerID: containerd://dc8fac97f26ae0f039f921d037d280051ef5f6ad4771ff051d613b4069166ee8
exitCode: 0
finishedAt: "2023-01-30T18:04:39Z"
reason: Completed
startedAt: "2023-01-30T18:04:39Z"
message: 'The node was low on resource: memory. Container node was using 35505620Ki,
which exceeds its request of 12Gi. '
phase: Failed
podIP: 10.7.0.103
podIPs:
- ip: 10.7.0.103
qosClass: Burstable
reason: Evicted
startTime: "2023-01-30T18:04:30Z"
Another example, just in case there are differences.
apiVersion: v1
kind: Pod
metadata:
annotations:
app.kubernetes.io/ordinal: "5"
seccomp.security.alpha.kubernetes.io/pod: runtime/default
creationTimestamp: "2023-01-31T21:03:10Z"
labels:
app.kubernetes.io/component: CosmosFullNode
app.kubernetes.io/created-by: cosmos-operator
app.kubernetes.io/instance: juno-mainnet-fullnode-5
app.kubernetes.io/name: juno-mainnet-fullnode
app.kubernetes.io/revision: 4f3ef332
app.kubernetes.io/version: v11.0.0
cosmos.strange.love/network: mainnet
name: juno-mainnet-fullnode-5
namespace: strangelove
ownerReferences:
- apiVersion: cosmos.strange.love/v1
blockOwnerDeletion: true
controller: true
kind: CosmosFullNode
name: juno-mainnet-fullnode
uid: aa1fd035-e04e-47a6-93ba-8ab1c4b83801
resourceVersion: "105075022"
uid: 6fd95328-264d-4730-b47c-3bf808e89850
spec:
containers:
- args:
- start
- --home
- /home/operator/cosmos
- --x-crisis-skip-assert-invariants
command:
- junod
env:
- name: HOME
value: /home/operator
- name: CHAIN_HOME
value: /home/operator/cosmos
- name: GENESIS_FILE
value: /home/operator/cosmos/config/genesis.json
- name: CONFIG_DIR
value: /home/operator/cosmos/config
- name: DATA_DIR
value: /home/operator/cosmos/data
image: ghcr.io/strangelove-ventures/heighliner/juno:v11.0.0
imagePullPolicy: IfNotPresent
name: node
ports:
- containerPort: 1317
name: api
protocol: TCP
- containerPort: 8080
name: rosetta
protocol: TCP
- containerPort: 9090
name: grpc
protocol: TCP
- containerPort: 26660
name: prometheus
protocol: TCP
- containerPort: 26656
name: p2p
protocol: TCP
- containerPort: 26657
name: rpc
protocol: TCP
- containerPort: 9091
name: grpc-web
protocol: TCP
readinessProbe:
failureThreshold: 5
httpGet:
path: /health
port: 26657
scheme: HTTP
initialDelaySeconds: 1
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 10
resources:
requests:
cpu: "1"
memory: 12Gi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /home/operator/cosmos
name: vol-chain-home
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-shnpw
readOnly: true
workingDir: /home/operator
- command:
- ihc
image: ghcr.io/strangelove-ventures/ignite-health-check:v0.0.1
imagePullPolicy: IfNotPresent
name: healthcheck
ports:
- containerPort: 1251
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /
port: 1251
scheme: HTTP
initialDelaySeconds: 1
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 10
resources:
requests:
cpu: 5m
memory: 16Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /home/operator/cosmos
name: vol-chain-home
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-shnpw
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
initContainers:
- args:
- -c
- "\nset -eu\nif [ ! -d \"$CHAIN_HOME/data\" ]; then\n\techo \"Initializing chain...\"\n\tjunod
init juno-mainnet-fullnode-5 --chain-id juno-1 --home \"$CHAIN_HOME\"\n\t# Remove
because downstream containers check the presence of this file.\n\trm \"$GENESIS_FILE\"\nelse\n\techo
\"Skipping chain init; already initialized.\"\nfi\n\necho \"Initializing into
tmp dir for downstream processing...\"\njunod init juno-mainnet-fullnode-5 --chain-id
juno-1 --home \"$HOME/.tmp\"\n"
command:
- sh
env:
- name: HOME
value: /home/operator
- name: CHAIN_HOME
value: /home/operator/cosmos
- name: GENESIS_FILE
value: /home/operator/cosmos/config/genesis.json
- name: CONFIG_DIR
value: /home/operator/cosmos/config
- name: DATA_DIR
value: /home/operator/cosmos/data
image: ghcr.io/strangelove-ventures/heighliner/juno:v11.0.0
imagePullPolicy: IfNotPresent
name: chain-init
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /home/operator/cosmos
name: vol-chain-home
- mountPath: /home/operator/.tmp
name: vol-tmp
- mountPath: /home/operator/.config
name: vol-config
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-shnpw
readOnly: true
workingDir: /home/operator
- args:
- -c
- "if [ -f \"$GENESIS_FILE\" ]; then\n\techo \"Genesis file $GENESIS_FILE already
exists; skipping initialization.\"\n\texit 0\nfi\n\nset -eu\n\n# $GENESIS_FILE
and $CONFIG_DIR already set via pod env vars.\n\nGENESIS_URL=\"$1\"\n\necho
\"Downloading genesis file $GENESIS_URL to $GENESIS_FILE...\"\n\ndownload_json()
{\n echo \"Downloading plain json...\"\n wget -c -O \"$GENESIS_FILE\" \"$GENESIS_URL\"\n}\n\ndownload_jsongz()
{\n echo \"Downloading json.gz...\"\n wget -c -O - \"$GENESIS_URL\" | gunzip
-c > \"$GENESIS_FILE\"\n}\n\ndownload_tar() {\n echo \"Downloading and extracting
tar...\"\n wget -c -O - \"$GENESIS_URL\" | tar -x -C \"$CONFIG_DIR\"\n}\n\ndownload_targz()
{\n echo \"Downloading and extracting compressed tar...\"\n wget -c -O - \"$GENESIS_URL\"
| tar -xz -C \"$CONFIG_DIR\"\n}\n\ndownload_zip() {\n echo \"Downloading and
extracting zip...\"\n wget -c -O tmp_genesis.zip \"$GENESIS_URL\"\n unzip
tmp_genesis.zip\n rm tmp_genesis.zip\n mv genesis.json \"$GENESIS_FILE\"\n}\n\nrm
-f \"$GENESIS_FILE\"\n\ncase \"$GENESIS_URL\" in\n *.json.gz) download_jsongz
;;\n *.json) download_json ;;\n *.tar.gz) download_targz ;;\n *.tar.gzip)
download_targz ;;\n *.tar) download_tar ;;\n *.zip) download_zip ;;\n *)
echo \"Unable to handle file extension for $GENESIS_URL\"; exit 1 ;;\nesac\n\necho
\"Saved genesis file to $GENESIS_FILE.\"\necho \"Download genesis file complete.\"\n\necho
\"Genesis $GENESIS_FILE initialized.\"\n"
- -s
- https://download.dimi.sh/juno-phoenix2-genesis.tar.gz
command:
- sh
env:
- name: HOME
value: /home/operator
- name: CHAIN_HOME
value: /home/operator/cosmos
- name: GENESIS_FILE
value: /home/operator/cosmos/config/genesis.json
- name: CONFIG_DIR
value: /home/operator/cosmos/config
- name: DATA_DIR
value: /home/operator/cosmos/data
image: ghcr.io/strangelove-ventures/infra-toolkit:v0.0.1
imagePullPolicy: IfNotPresent
name: genesis-init
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /home/operator/cosmos
name: vol-chain-home
- mountPath: /home/operator/.tmp
name: vol-tmp
- mountPath: /home/operator/.config
name: vol-config
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-shnpw
readOnly: true
workingDir: /home/operator
- args:
- -c
- |2
set -eu
CONFIG_DIR="$CHAIN_HOME/config"
TMP_DIR="$HOME/.tmp/config"
OVERLAY_DIR="$HOME/.config"
echo "Merging config..."
set -x
config-merge -f toml "$TMP_DIR/config.toml" "$OVERLAY_DIR/config-overlay.toml" > "$CONFIG_DIR/config.toml"
config-merge -f toml "$TMP_DIR/app.toml" "$OVERLAY_DIR/app-overlay.toml" > "$CONFIG_DIR/app.toml"
command:
- sh
env:
- name: HOME
value: /home/operator
- name: CHAIN_HOME
value: /home/operator/cosmos
- name: GENESIS_FILE
value: /home/operator/cosmos/config/genesis.json
- name: CONFIG_DIR
value: /home/operator/cosmos/config
- name: DATA_DIR
value: /home/operator/cosmos/data
image: ghcr.io/strangelove-ventures/infra-toolkit:v0.0.1
imagePullPolicy: IfNotPresent
name: config-merge
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /home/operator/cosmos
name: vol-chain-home
- mountPath: /home/operator/.tmp
name: vol-tmp
- mountPath: /home/operator/.config
name: vol-config
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-shnpw
readOnly: true
workingDir: /home/operator
- args:
- -c
- "set -eu\nif test -n \"$(find $DATA_DIR -maxdepth 1 -name '*.db' -print -quit)\";
then\n\techo \"Databases in $DATA_DIR already exists; skipping initialization.\"\n\texit
0\nfi\n\nset -eu\n\n# $CHAIN_HOME already set via pod env vars.\n\nSNAPSHOT_URL=\"$1\"\n\necho
\"Downloading snapshot archive $SNAPSHOT_URL to $CHAIN_HOME...\"\n\ndownload_tar()
{\n echo \"Downloading and extracting tar...\"\n wget -c -O - \"$SNAPSHOT_URL\"
| tar -x -C \"$CHAIN_HOME\"\n}\n\ndownload_targz() {\n echo \"Downloading and
extracting compressed tar...\"\n wget -c -O - \"$SNAPSHOT_URL\" | tar -xz -C
\"$CHAIN_HOME\"\n}\n\ndownload_lz4() {\n echo \"Downloading and extracting
lz4...\"\n wget -c -O - \"$SNAPSHOT_URL\" | lz4 -c -d | tar -x -C \"$CHAIN_HOME\"\n}\n\ncase
\"$SNAPSHOT_URL\" in\n *.tar.lz4) download_lz4 ;;\n *.tar.gzip) download_targz
;;\n *.tar.gz) download_targz ;;\n *.tar) download_tar ;;\n *) echo \"Unable
to handle file extension for $SNAPSHOT_URL\"; exit 1 ;;\nesac\n\necho \"Download
and extract snapshot complete.\"\n\necho \"$DATA_DIR initialized.\"\n"
- -s
- https://snapshots.polkachu.com/snapshots/juno/juno_5373433.tar.lz4
command:
- sh
env:
- name: HOME
value: /home/operator
- name: CHAIN_HOME
value: /home/operator/cosmos
- name: GENESIS_FILE
value: /home/operator/cosmos/config/genesis.json
- name: CONFIG_DIR
value: /home/operator/cosmos/config
- name: DATA_DIR
value: /home/operator/cosmos/data
image: ghcr.io/strangelove-ventures/infra-toolkit:v0.0.1
imagePullPolicy: IfNotPresent
name: snapshot-restore
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /home/operator/cosmos
name: vol-chain-home
- mountPath: /home/operator/.tmp
name: vol-tmp
- mountPath: /home/operator/.config
name: vol-config
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-shnpw
readOnly: true
workingDir: /home/operator
nodeName: gke-juno-mainnet-full-chain-node-pool-b2985e35-w530
preemptionPolicy: PreemptLowerPriority
priority: 0
readinessGates:
- conditionType: cloud.google.com/load-balancer-neg-ready
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 1025
fsGroupChangePolicy: OnRootMismatch
runAsGroup: 1025
runAsNonRoot: true
runAsUser: 1025
seccompProfile:
type: RuntimeDefault
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: vol-chain-home
persistentVolumeClaim:
claimName: pvc-juno-mainnet-fullnode-5
- emptyDir: {}
name: vol-tmp
- configMap:
defaultMode: 420
items:
- key: config-overlay.toml
path: config-overlay.toml
- key: app-overlay.toml
path: app-overlay.toml
name: juno-mainnet-fullnode-5
name: vol-config
- name: kube-api-access-shnpw
projected:
defaultMode: 420
sources:
- serviceAccountToken:
expirationSeconds: 3607
path: token
- configMap:
items:
- key: ca.crt
path: ca.crt
name: kube-root-ca.crt
- downwardAPI:
items:
- fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
path: namespace
status:
conditions:
- lastProbeTime: null
lastTransitionTime: null
message: 'Pod has become Healthy in NEG "Key{\"k8s1-d906fa4e-strangelove-juno-mainnet-fullnode-rp-909-3bd6d742\",
zone: \"us-east1-d\"}" attached to BackendService "Key{\"k8s1-d906fa4e-strangelove-juno-mainnet-fullnode-rp-909-3bd6d742\"}".
Marking condition "cloud.google.com/load-balancer-neg-ready" to True.'
reason: LoadBalancerNegReady
status: "True"
type: cloud.google.com/load-balancer-neg-ready
- lastProbeTime: null
lastTransitionTime: "2023-01-31T21:03:25Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2023-02-02T17:48:32Z"
reason: PodFailed
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2023-02-02T17:48:32Z"
reason: PodFailed
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2023-01-31T21:03:10Z"
status: "True"
type: PodScheduled
containerStatuses:
- image: ghcr.io/strangelove-ventures/ignite-health-check:v0.0.1
imageID: ""
lastState:
terminated:
exitCode: 137
finishedAt: null
message: The container could not be located when the pod was deleted. The
container used to be Running
reason: ContainerStatusUnknown
startedAt: null
name: healthcheck
ready: false
restartCount: 1
started: false
state:
terminated:
exitCode: 137
finishedAt: null
message: The container could not be located when the pod was terminated
reason: ContainerStatusUnknown
startedAt: null
- containerID: containerd://25229be205aae5f8008bb9bf0b0731176663c495bfe04460492a72a61db33724
image: ghcr.io/strangelove-ventures/heighliner/juno:v11.0.0
imageID: ghcr.io/strangelove-ventures/heighliner/juno@sha256:f65390b4383bdde4ae37e9b712e42b595dea98cdf9b1622450cd564c1b544ebf
lastState: {}
name: node
ready: false
restartCount: 1
started: false
state:
terminated:
containerID: containerd://25229be205aae5f8008bb9bf0b0731176663c495bfe04460492a72a61db33724
exitCode: 137
finishedAt: "2023-02-02T17:48:31Z"
reason: OOMKilled
startedAt: "2023-02-01T12:36:15Z"
hostIP: 192.168.5.3
initContainerStatuses:
- containerID: containerd://cd099063efc6fc6c07c8242b467d6fbd55aca853b4c470274be5353527aee93a
image: ghcr.io/strangelove-ventures/heighliner/juno:v11.0.0
imageID: ghcr.io/strangelove-ventures/heighliner/juno@sha256:f65390b4383bdde4ae37e9b712e42b595dea98cdf9b1622450cd564c1b544ebf
lastState: {}
name: chain-init
ready: true
restartCount: 0
state:
terminated:
containerID: containerd://cd099063efc6fc6c07c8242b467d6fbd55aca853b4c470274be5353527aee93a
exitCode: 0
finishedAt: "2023-01-31T21:03:20Z"
reason: Completed
startedAt: "2023-01-31T21:03:20Z"
- containerID: containerd://fba829126cbb0edeade19e43740a3fbf601e27dfc3a9d0781a8abdc04c4cb723
image: ghcr.io/strangelove-ventures/infra-toolkit:v0.0.1
imageID: ghcr.io/strangelove-ventures/infra-toolkit@sha256:3aecfa18d9f0d730fd8821a7f556ba89dbb8cb3683a5e6ac11ec272884db8776
lastState: {}
name: genesis-init
ready: true
restartCount: 0
state:
terminated:
containerID: containerd://fba829126cbb0edeade19e43740a3fbf601e27dfc3a9d0781a8abdc04c4cb723
exitCode: 0
finishedAt: "2023-01-31T21:03:21Z"
reason: Completed
startedAt: "2023-01-31T21:03:21Z"
- containerID: containerd://7c8f483695363bc7ca7b4710b8a162a0a5ee233a320944bf52ae0f7f98da647e
image: ghcr.io/strangelove-ventures/infra-toolkit:v0.0.1
imageID: ghcr.io/strangelove-ventures/infra-toolkit@sha256:3aecfa18d9f0d730fd8821a7f556ba89dbb8cb3683a5e6ac11ec272884db8776
lastState: {}
name: config-merge
ready: true
restartCount: 0
state:
terminated:
containerID: containerd://7c8f483695363bc7ca7b4710b8a162a0a5ee233a320944bf52ae0f7f98da647e
exitCode: 0
finishedAt: "2023-01-31T21:03:23Z"
reason: Completed
startedAt: "2023-01-31T21:03:22Z"
- containerID: containerd://66c6157d18d5f6f427afeaead27ea52a012b51ba778d9fc12cd9099b2c922d34
image: ghcr.io/strangelove-ventures/infra-toolkit:v0.0.1
imageID: ghcr.io/strangelove-ventures/infra-toolkit@sha256:3aecfa18d9f0d730fd8821a7f556ba89dbb8cb3683a5e6ac11ec272884db8776
lastState: {}
name: snapshot-restore
ready: true
restartCount: 0
state:
terminated:
containerID: containerd://66c6157d18d5f6f427afeaead27ea52a012b51ba778d9fc12cd9099b2c922d34
exitCode: 0
finishedAt: "2023-01-31T21:03:24Z"
reason: Completed
startedAt: "2023-01-31T21:03:24Z"
message: 'The node was low on resource: memory. Container node was using 30684Mi,
which exceeds its request of 12Gi. '
phase: Failed
podIP: 10.7.0.107
podIPs:
- ip: 10.7.0.107
qosClass: Burstable
reason: Evicted
startTime: "2023-01-31T21:03:10Z"
Juno, specifically, may be under-resourced.
I feel adding a feature to the operator is treating the symptom and not the cause. This indicates an issue with the cluster (not the cosmos node).
The only helpful advice I found was from https://github.com/kubernetes/kubernetes/issues/43279. This thread indicates the k8s node could become unresponsive to the kubelet if the node exhausts its memory.
Through working with the GCP support team, we figured out that the issue was triggered by not having memory limits on the pods, which was causing the oomkiller to run on the servers, sometimes killing processes it shouldn't. Even worse, the scheduler rescheduled these troublesome pods on other nodes, effectually poisoning the entire cluster. This is definitely something that should be prevented, but can at least be mitigated by setting default memory limits and making sure the limits on your pods are not too high.
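As a rough sketch of the "default memory limits" mitigation mentioned above, a namespace-level `LimitRange` can give every container that declares no limit a default one. The object name and the memory values below are assumptions chosen for illustration, not settings taken from this cluster; only the `strangelove` namespace comes from the pod dumps.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-memory-limits   # hypothetical name
  namespace: strangelove        # namespace from the pod dumps above
spec:
  limits:
    - type: Container
      default:
        memory: 1Gi             # illustrative default limit for containers that set none
      defaultRequest:
        memory: 256Mi           # illustrative default request for containers that set none
```

Such a default would also apply to the init containers and the healthcheck sidecar, which currently declare no limits. A container whose request is larger than the default limit (the `node` container requests 12Gi) would most likely need its own explicit limit, since a request cannot exceed its limit.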
So we'll test a change with our Juno deployment config and observe.
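A minimal sketch of what such a change might look like on the `node` container, assuming the fix is to add a memory limit next to the existing 12Gi request. The limit value is a placeholder, not a tested recommendation. For reference, the evicted pods above reported usage of 35505620Ki (about 34Gi) and 30684Mi (about 30Gi).

```yaml
containers:
  - name: node
    resources:
      requests:
        cpu: "1"
        memory: 12Gi
      limits:
        memory: 24Gi   # illustrative value; an assumption, tune from observed usage
```

With a limit in place, the container should be OOM-killed and restarted once it crosses the limit, rather than growing until the whole node runs low on memory and the kubelet evicts the pod.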
Juno seems to run fine at around 12GB of memory. Perhaps it spikes during a restart. For us, this problem only occurs with Juno.
I've seen this a few times now. Rebooting the pod fixes the issue. I'm not sure of the root cause.