Open ZakMiller opened 3 years ago
I originally mentioned that the logs might point to the problem.
We were seeing this:
{"@timestamp":"2021-10-18T19:30:20Z","@service":"benthos","component":"benthos.input","level":"DEBUG","message":"Starting consumer group"}
rather than this:
{"@timestamp":"2021-10-22T19:06:25Z","@service":"benthos","component":"benthos.input","level":"DEBUG","message":"Starting consumer group"}
{"@timestamp":"2021-10-22T19:06:25Z","@service":"benthos","component":"benthos.input","level":"DEBUG","message":"Consuming messages from topic 'cluster_logs' partition '0'"}
I now think that's a red herring. We have a few benthos instances acting as redpanda consumers on different topics and I'm seeing different behavior with the logging (with the same log levels). When I deleted the redpanda pod and it restarted I saw log messages without the "Consuming messages from topic..." and yet it was still consuming messages. So I'm not sure if that's a separate bug or what, but I don't think it's connected.
Thanks for the writeup @ZakMiller! Just to clarify, have you tried to reproduce it and now you're seeing different behaviour? Also, would you mind sharing a simplified Benthos config as well as the config you're using for the RedPanda docker container, so I can try to reproduce this locally using the same setup you have?
I tried to reproduce it by simply shutting off redpanda for a while (rather than what originally happened, which was it being down for several days because the disk filled up). Here are the redpanda cluster spec and the Benthos deployment/config:
```yaml
apiVersion: redpanda.vectorized.io/v1alpha1
kind: Cluster
metadata:
  name: redpanda
  namespace: redpanda
spec:
  image: <redpanda image>
  version: "latest"
  replicas: 1
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      cpu: '2'
      memory: 4Gi
  configuration:
    rpcServer:
      port: 33145
    kafkaApi:
      - port: 9092
    pandaproxyApi:
      - port: 8082
    adminApi:
      - port: 9644
    autoCreateTopics: true
```
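One way to shut the broker off for that test is to scale the Cluster resource above down and back up. A rough sketch of what I mean (I haven't verified that the operator accepts scaling to zero, so treat the exact mechanism as an assumption):

```yaml
# Hypothetical patch file, e.g. applied with
#   kubectl patch cluster redpanda -n redpanda --type merge --patch-file scale-down.yaml
# and reverted to replicas: 1 once the simulated outage window is over.
spec:
  replicas: 0
```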
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: benthos-stream-events
  namespace: dekn-app
  labels:
    app: benthos-stream-events
spec:
  replicas: 1
  selector:
    matchLabels:
      app: benthos-stream-events
  template:
    metadata:
      labels:
        app: benthos-stream-events
    spec:
      containers:
        - name: benthos-stream-events
          image: <benthos image>
          volumeMounts:
            - name: benthos-stream-events-conf
              mountPath: /benthos.yaml
              subPath: benthos.yaml
              readOnly: true
      volumes:
        - name: benthos-stream-events-conf
          configMap:
            name: benthos-stream-events-config
            items:
              - key: benthos.yaml
                path: benthos.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: benthos-stream-events-config
# https://www.benthos.dev/docs/components/outputs/http_client
data:
  benthos.yaml: |-
    logger:
      level: ALL
    input:
      kafka:
        addresses:
          - redpanda.redpanda.svc.cluster.local:9092
        topics: [ events ]
        consumer_group: benthos_stream_http_events
    output:
      try:
        - http_client:
            url: <event-processor-url>
            verb: POST
            retries: 3
            oauth2:
              enabled: true
              client_key: "${KEYCLOAK_CLIENT_ID}"
              client_secret: "${KEYCLOAK_CLIENT_SECRET}"
              token_url: <token-url>
            headers:
              Content-Type: application/json
            rate_limit: ""
            timeout: 5s
            max_in_flight: 1
            retry_period: 1s
        - kafka:
            addresses:
              - redpanda.redpanda.svc.cluster.local:9092
            topic: async_events_dead
        - stdout:
            codec: lines
```
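If it helps with reproducing this locally, here's a stripped-down sketch of the same consumer path (assuming a reachable broker at localhost:9092 and an existing 'events' topic; the real config above just layers the http_client and dead-letter outputs on top of this):

```yaml
# Minimal repro sketch: same kafka input and consumer group, output swapped for stdout.
logger:
  level: DEBUG

input:
  kafka:
    addresses:
      - localhost:9092
    topics: [ events ]
    consumer_group: benthos_stream_http_events

output:
  stdout:
    codec: lines
```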
I appreciate the help @mihaitodor
We're using benthos as a redpanda consumer and we recently saw a situation where redpanda ran out of space on disk, causing the (k8s) pod to become unhealthy. It was in this state for a while, I think a few days.
We resolved the issue by resizing the PVC and restarting the redpanda pod, which fixed redpanda, but we noticed that benthos wasn't consuming anymore. We fixed it by restarting the benthos pod, but that shouldn't be necessary, right?
I took a look at the docs but didn't see anything that would explain the behavior in a situation like this.
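In the meantime, one mitigation I'm considering (an assumption on my part, not something I've confirmed against the docs end to end) is pointing Kubernetes probes at Benthos's built-in HTTP server so the pod gets restarted automatically if the input stays disconnected:

```yaml
# Hypothetical probes for the benthos container above, assuming the default Benthos HTTP
# server on 0.0.0.0:4195 and that /ready only returns 200 once the input and output are
# connected. Pointing the liveness probe at /ready should make Kubernetes restart the
# container if the input stays disconnected for too long.
livenessProbe:
  httpGet:
    path: /ready
    port: 4195
  periodSeconds: 15
  failureThreshold: 8   # tolerate a couple of minutes of broker downtime before restarting
readinessProbe:
  httpGet:
    path: /ready
    port: 4195
  periodSeconds: 10
```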
I opened an issue because @mihaitodor suggested it.
Any help would be appreciated!