Open maris-jurgenbergs opened 3 years ago
@maris-jurgenbergs I'd like to dive a little deeper into this. Can you give us some more information:
@robbavey Regarding regularity: I checked the logs, and some errors no longer appear while others still recur (the logs cover 29th March ~20:00 until 30th March ~07:12).
The "throwing away zombie pump" error was not found anymore, so it seems to be one of the non-regular errors.
The "Transient exceptions" error was not found anymore either, so it also seems to be one of the non-regular errors.
The "Failure updating checkpoint" warning is regular:
Line 61: 2021-03-29T20:04:19.474444995Z [2021-03-29T20:04:19,473][WARN ][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][some-pipeline][be5beb4e66e8dfc17d1aa030b49d1865b73ad30bc4db671ddfd5119aa2125959] host logstash-some-guid: 1: Failure updating checkpoint
Line 209: 2021-03-30T00:26:10.226013250Z [2021-03-30T00:26:10,225][WARN ][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][some-pipeline][be5beb4e66e8dfc17d1aa030b49d1865b73ad30bc4db671ddfd5119aa2125959] host logstash-some-guid: 0: Failure updating checkpoint
Line 357: 2021-03-30T00:26:13.250360156Z [2021-03-30T00:26:13,249][WARN ][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][some-pipeline][be5beb4e66e8dfc17d1aa030b49d1865b73ad30bc4db671ddfd5119aa2125959] host logstash-some-guid: 0: Failure updating checkpoint
Line 505: 2021-03-30T00:26:16.254747600Z [2021-03-30T00:26:16,253][WARN ][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][some-pipeline][be5beb4e66e8dfc17d1aa030b49d1865b73ad30bc4db671ddfd5119aa2125959] host logstash-some-guid: 1: Failure updating checkpoint
Line 653: 2021-03-30T03:00:21.106969931Z [2021-03-30T03:00:21,106][WARN ][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][some-pipeline][be5beb4e66e8dfc17d1aa030b49d1865b73ad30bc4db671ddfd5119aa2125959] host logstash-some-guid: 1: Failure updating checkpoint
Line 801: 2021-03-30T03:00:24.145414175Z [2021-03-30T03:00:24,144][WARN ][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][some-pipeline][be5beb4e66e8dfc17d1aa030b49d1865b73ad30bc4db671ddfd5119aa2125959] host logstash-some-guid: 1: Failure updating checkpoint
Line 949: 2021-03-30T03:00:27.150838231Z [2021-03-30T03:00:27,149][WARN ][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][some-pipeline][be5beb4e66e8dfc17d1aa030b49d1865b73ad30bc4db671ddfd5119aa2125959] host logstash-some-guid: 0: Failure updating checkpoint
Line 1097: 2021-03-30T03:10:52.876518821Z [2021-03-30T03:10:52,875][WARN ][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][some-pipeline][be5beb4e66e8dfc17d1aa030b49d1865b73ad30bc4db671ddfd5119aa2125959] host logstash-some-guid: 0: Failure updating checkpoint
Line 1393: 2021-03-30T03:10:55.921582021Z [2021-03-30T03:10:55,920][WARN ][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][some-pipeline][be5beb4e66e8dfc17d1aa030b49d1865b73ad30bc4db671ddfd5119aa2125959] host logstash-some-guid: 0: Failure updating checkpoint
Line 1541: 2021-03-30T03:10:58.926171658Z [2021-03-30T03:10:58,925][WARN ][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][some-pipeline][be5beb4e66e8dfc17d1aa030b49d1865b73ad30bc4db671ddfd5119aa2125959] host logstash-some-guid: 1: Failure updating checkpoint
Line 1689: 2021-03-30T03:31:20.932470064Z [2021-03-30T03:31:20,930][WARN ][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][some-pipeline][be5beb4e66e8dfc17d1aa030b49d1865b73ad30bc4db671ddfd5119aa2125959] host logstash-some-guid: 1: Failure updating checkpoint
Line 1837: 2021-03-30T03:31:23.957576919Z [2021-03-30T03:31:23,956][WARN ][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][some-pipeline][be5beb4e66e8dfc17d1aa030b49d1865b73ad30bc4db671ddfd5119aa2125959] host logstash-some-guid: 1: Failure updating checkpoint
Line 1985: 2021-03-30T03:31:26.961625942Z [2021-03-30T03:31:26,960][WARN ][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][some-pipeline][be5beb4e66e8dfc17d1aa030b49d1865b73ad30bc4db671ddfd5119aa2125959] host logstash-some-guid: 0: Failure updating checkpoint
Line 2133: 2021-03-30T04:01:20.913197990Z [2021-03-30T04:01:20,912][WARN ][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][some-pipeline][be5beb4e66e8dfc17d1aa030b49d1865b73ad30bc4db671ddfd5119aa2125959] host logstash-some-guid: 0: Failure updating checkpoint
Line 2281: 2021-03-30T04:01:23.938102725Z [2021-03-30T04:01:23,936][WARN ][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][some-pipeline][be5beb4e66e8dfc17d1aa030b49d1865b73ad30bc4db671ddfd5119aa2125959] host logstash-some-guid: 0: Failure updating checkpoint
Line 2429: 2021-03-30T04:01:26.941896427Z [2021-03-30T04:01:26,941][WARN ][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][some-pipeline][be5beb4e66e8dfc17d1aa030b49d1865b73ad30bc4db671ddfd5119aa2125959] host logstash-some-guid: 1: Failure updating checkpoint
Line 2577: 2021-03-30T04:17:50.999614850Z [2021-03-30T04:17:50,984][WARN ][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][some-pipeline][be5beb4e66e8dfc17d1aa030b49d1865b73ad30bc4db671ddfd5119aa2125959] host logstash-some-guid: 1: Failure updating checkpoint
Line 2725: 2021-03-30T04:17:54.045863465Z [2021-03-30T04:17:54,045][WARN ][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][some-pipeline][be5beb4e66e8dfc17d1aa030b49d1865b73ad30bc4db671ddfd5119aa2125959] host logstash-some-guid: 1: Failure updating checkpoint
Line 3021: 2021-03-30T04:17:57.051245297Z [2021-03-30T04:17:57,050][WARN ][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][some-pipeline][be5beb4e66e8dfc17d1aa030b49d1865b73ad30bc4db671ddfd5119aa2125959] host logstash-some-guid: 0: Failure updating checkpoint
Line 3169: 2021-03-30T04:45:13.444860276Z [2021-03-30T04:45:13,444][WARN ][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][some-pipeline][be5beb4e66e8dfc17d1aa030b49d1865b73ad30bc4db671ddfd5119aa2125959] host logstash-some-guid: 0: Failure updating checkpoint
Line 3317: 2021-03-30T04:45:16.509612584Z [2021-03-30T04:45:16,508][WARN ][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][some-pipeline][be5beb4e66e8dfc17d1aa030b49d1865b73ad30bc4db671ddfd5119aa2125959] host logstash-some-guid: 1: Failure updating checkpoint
Line 3465: 2021-03-30T04:45:19.534149947Z [2021-03-30T04:45:19,533][WARN ][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][some-pipeline][be5beb4e66e8dfc17d1aa030b49d1865b73ad30bc4db671ddfd5119aa2125959] host logstash-some-guid: 1: Failure updating checkpoint
Line 3613: 2021-03-30T06:54:13.524280222Z [2021-03-30T06:54:13,523][WARN ][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][some-pipeline][be5beb4e66e8dfc17d1aa030b49d1865b73ad30bc4db671ddfd5119aa2125959] host logstash-some-guid: 0: Failure updating checkpoint
Line 3761: 2021-03-30T06:54:16.568499221Z [2021-03-30T06:54:16,567][WARN ][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][some-pipeline][be5beb4e66e8dfc17d1aa030b49d1865b73ad30bc4db671ddfd5119aa2125959] host logstash-some-guid: 0: Failure updating checkpoint
Line 3909: 2021-03-30T06:54:19.572635984Z [2021-03-30T06:54:19,571][WARN ][com.microsoft.azure.eventprocessorhost.AzureStorageCheckpointLeaseManager][some-pipeline][be5beb4e66e8dfc17d1aa030b49d1865b73ad30bc4db671ddfd5119aa2125959] host logstash-some-guid: 1: Failure updating checkpoint
The update error is caused by StorageException timeouts:
Caused by: com.microsoft.azure.eventprocessorhost.ExceptionWithAction: com.microsoft.azure.storage.StorageException: The client could not finish the operation within specified maximum execution timeout.
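Since the plugin stores checkpoints as blobs in the configured Azure Storage account, every checkpoint update is an HTTP write that can hit the storage client's execution timeout. One mitigation worth trying is raising the plugin's `checkpoint_interval` so mid-batch checkpoint writes happen less often. A minimal sketch, reusing the connection strings from the config below; the value `300` is illustrative, and (if I recall the plugin's behavior correctly) a checkpoint is still written at the end of each batch, so this only reduces mid-batch writes:

```ruby
input {
  azure_event_hubs {
    event_hub_connections => ["Endpoint=sb://someeventhub.servicebus.windows.net/;SharedAccessKeyName=Listen;SharedAccessKey=${KEY};EntityPath=hub-name"]
    storage_connection => "DefaultEndpointsProtocol=https;AccountName=somestorage;AccountKey=${SOMEKEY};EndpointSuffix=core.windows.net"
    threads => 2
    max_batch_size => 300
    # Seconds between mid-batch checkpoint writes; a higher value means fewer
    # blob writes (fewer chances to time out) at the cost of more re-delivered
    # events if Logstash restarts mid-batch.
    checkpoint_interval => 300
  }
}
```

The trade-off is purely between storage write frequency and how many already-processed events get replayed after a crash, since Event Hubs itself never deletes events on read.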
The "higher epoch" error was not found anymore, so it seems to be one of the non-regular errors as well.
We are using Elastic Cloud; this is the Kubernetes YAML with the Logstash config and the other parts:
```yaml
replicas: 1

logstashConfig:
  logstash.yml: |
    node.name: somenodename
    http.host: 0.0.0.0
    http.port: 9600
    xpack.monitoring.enabled: true
    xpack.monitoring.elasticsearch.hosts: ['https://somehost.westeurope.azure.elastic-cloud.com:someport/']
    xpack.monitoring.elasticsearch.username: '${USR}'
    xpack.monitoring.elasticsearch.password: '${PW}'
    ###xpack.monitoring.elasticsearch.ssl.certificate_authority: /usr/some.crt
  pipelines.yml: |
    - pipeline.id: some-pipeline-id
      path.config: "/somepath/config.conf"
      pipeline.workers: 2
      pipeline.batch.size: 300

# Allows you to add any pipeline files in /usr/share/logstash/pipeline/
### ***warn*** there is a hardcoded logstash.conf in the image, override it first
logstashPipeline:
  some-pipeline.conf: |
    input {
      azure_event_hubs {
        event_hub_connections => ["Endpoint=sb://someeventhub.servicebus.windows.net/;SharedAccessKeyName=Listen;SharedAccessKey=${KEY};EntityPath=hub-name"]
        threads => 2
        storage_connection => "DefaultEndpointsProtocol=https;AccountName=somestorage;AccountKey=${SOMEKEY};EndpointSuffix=core.windows.net"
        checkpoint_interval => 60
        max_batch_size => 300
      }
    }
    filter {
      json {
        source => "message"
      }
      date {
        match => ["[header][timestamp]", "ISO8601"]
        remove_field => ["[header][timestamp]"]
      }
      if [header][pri][severity] == 7 {
        mutate { add_field => { "[@metadata][es_suffix]" => "-debug" } }
      } else {
        mutate { add_field => { "[@metadata][es_suffix]" => "" } }
      }
      if [header][pri][severity] == 8 {
        drop {}
      }
      mutate {
        remove_field => [ "message" ]
      }
    }
    output {
      elasticsearch {
        hosts => 'https://somehost.westeurope.azure.elastic-cloud.com:someport/'
        ssl => true
        user => '${USR}'
        password => '${PW}'
        index => 'indexname'
      }
    }

image: "someimagename"
imageTag: "7.10.0"
imagePullPolicy: "IfNotPresent"
imagePullSecrets: []

logstashJavaOpts: "-Xmx1g -Xms1g"

resources:
  requests:
    cpu: "100m"
    memory: "1536Mi"
  limits:
    cpu: "1000m"
    memory: "1536Mi"

volumeClaimTemplate:
  accessModes: [ "ReadWriteOnce" ]
  resources:
    requests:
      storage: 1Gi

rbac:
  create: false
  serviceAccountAnnotations: {}
  serviceAccountName: ""

podSecurityPolicy:
  create: false
  name: ""
  spec:
    privileged: true
    fsGroup:
      rule: RunAsAny
    runAsUser:
      rule: RunAsAny
    seLinux:
      rule: RunAsAny
    supplementalGroups:
      rule: RunAsAny
    volumes:
      - secret
      - configMap
      - persistentVolumeClaim

persistence:
  enabled: false
  annotations: {}

extraVolumes: ""
  # - name: extras
  #   emptyDir: {}

extraVolumeMounts: ""
  # - name: extras
  #   mountPath: /usr/share/extras
  #   readOnly: true

extraContainers: ""
  # - name: do-something
  #   image: busybox
  #   command: ['do', 'something']

extraInitContainers: ""
  # - name: do-something
  #   image: busybox
  #   command: ['do', 'something']

# This is the PriorityClass settings as defined in
# https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
priorityClassName: ""

# By default this will make sure two pods don't end up on the same node
# Changing this to a region would allow you to spread pods across regions
antiAffinityTopologyKey: "kubernetes.io/hostname"

# Hard means that by default pods will only be scheduled if there are enough nodes for them
# and that they will never end up on the same node. Setting this to soft will do this "best effort"
antiAffinity: "hard"

# This is the node affinity settings as defined in
# https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity-beta-feature
nodeAffinity: {}

# The default is to deploy all pods serially. By setting this to parallel all pods are started at
# the same time when bootstrapping the cluster
podManagementPolicy: "Parallel"

httpPort: 9600

# Custom ports to add to logstash
extraPorts: []
  # - name: beats
  #   containerPort: 5001

updateStrategy: RollingUpdate

# This is the max unavailable setting for the pod disruption budget
# The default value of 1 will make sure that kubernetes won't allow more than 1
# of your pods to be unavailable during maintenance
maxUnavailable: 1

podSecurityContext:
  fsGroup: 1000
  runAsUser: 1000

securityContext:
  capabilities:
    drop:
      - ALL
  # readOnlyRootFilesystem: true
  runAsNonRoot: true
  runAsUser: 1000

# How long to wait for logstash to stop gracefully
terminationGracePeriod: 120

# Probes
# Default probes are using `httpGet` which requires that `http.host: 0.0.0.0` is part of
# `logstash.yml`. If needed probes can be disabled or overridden using the following syntaxes:
#
# disable livenessProbe
# livenessProbe: null
#
# replace httpGet default readinessProbe by some exec probe
# readinessProbe:
#   httpGet: null
#   exec:
#     command:
#       - curl
#       - localhost:9600

livenessProbe:
  httpGet:
    path: /
    port: http
  initialDelaySeconds: 300
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
  successThreshold: 1

readinessProbe:
  httpGet:
    path: /
    port: http
  initialDelaySeconds: 60
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
  successThreshold: 3

## Use an alternate scheduler.
## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
##
schedulerName: ""

nodeSelector: {}
tolerations: []

nameOverride: ""
fullnameOverride: ""

lifecycle: {}
  # preStop:
  #   exec:
  #     command: ["/bin/sh", "-c", "echo '10.162.34.205' >> /etc/hosts"]
  # postStart:
  #   exec:
  #     command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]

service:
  annotations: {}
  type: ClusterIP
  ports:
    - name: logstash-logstash
      port: 9600
      protocol: TCP
      targetPort: 9600
    # - name: http
    #   port: 8080
    #   protocol: TCP
    #   targetPort: 8080

ingress:
  enabled: true
  annotations:
    kubernetes.io/ingress.class: internal-nginx
  hosts:
    - host: somehost
      paths:
        - path: /
          servicePort: 9600
      #- path: /logs
      #  servicePort: 8080
  tls:
    - hosts:
        - somehost
```
One instance of logstash is running.
Hi,
We are also getting the same warnings from our Logstash Event Hub input. Any idea how to fix this?
Hi,
We are noticing frequent link-detach errors and connection-inactive timeouts in the Logstash logs, which cause Logstash to fail to read from specific Event Hub partitions. Can you please suggest a workaround or fix for this issue?
Error logs:
```
errorDescription[The connection was inactive for more than the allowed 60000 milliseconds and is closed by container 'LinkTracker'. TrackingId:c1bd39c330e64cc5a2489bdc72efcc6d_G5, SystemTracker:gateway7, Timestamp:2022-06-28T23:53:57]
[2022-06-28T23:50:17,290][WARN ][com.microsoft.azure.eventhubs.impl.MessageReceiver][pipeline][alfred-logger] clientId[PR_339be5_1656001447373_MF_45774c_1656001447210-InternalReceiver], receiverPath[alfred-logging/ConsumerGroups/logstash/Partitions/9], linkName[LN_ec3fc7_1656458802212_b68_G21], onError: com.microsoft.azure.eventhubs.EventHubException: com.microsoft.azure.eventhubs.impl.AmqpException: The link 'LN_ec3fc7_1656458802212_b68_G21' is force detached. Code: RenewToken. Details: Unauthorized access. 'Listen' claim(s) are required to perform this operation. Resource: 'sb://ehn-test-01.servicebus.windows.net/alfred-logging/consumergroups/logstash/partitions/9'.. TrackingId:ad27470d0fc34ebaaa4f8826c23d6b68_G21, SystemTracker:gateway7, Timestamp:2022-06-28T23:51:41
```
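The `Code: RenewToken ... 'Listen' claim(s) are required` part of that detach suggests the problem is authorization at token-renewal time rather than the plugin itself: it usually points at a shared access policy that lacks the Listen right, or a key that was rotated while the connection was alive. A sketch of what the connection should look like, assuming the policy name `listen-policy` is illustrative and the entity path matches your hub:

```ruby
input {
  azure_event_hubs {
    # The SharedAccessKeyName named here must belong to a policy that has the
    # "Listen" claim on the hub (or namespace). A Send-only policy, or a key
    # rotated out from under a live connection, produces RenewToken /
    # Unauthorized force-detaches like the one logged above.
    event_hub_connections => ["Endpoint=sb://ehn-test-01.servicebus.windows.net/;SharedAccessKeyName=listen-policy;SharedAccessKey=${KEY};EntityPath=alfred-logging"]
    consumer_group => "logstash"
  }
}
```

If the key was rotated, restarting Logstash so it picks up the new `${KEY}` value may be enough; the 60000 ms inactivity close is a separate, service-side idle timeout that the client normally recovers from by reconnecting.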
We get various errors and warnings that bloat our Logstash logs. Getting rid of them would be best, but I am troubleshooting their cause here, because the Logstash plugin is the one handling these errors/warnings.
I understand these come from the Microsoft library, but doesn't that mean the azure_event_hubs plugin is not being run or set up correctly?
We get transient storage failures, but Logstash logs them as errors even though it should ignore them, since they are only info level.
We are getting these very periodically. It would be nice either to increase the timeout or to skip logs like this; they really pollute the log files.
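Until the underlying timeouts are fixed, the noise itself can be suppressed from Logstash's side: Logstash uses log4j2, so the chatty Azure client loggers can be raised above WARN in `config/log4j2.properties`. A sketch; the logger-key names `azure_eph`/`azure_eh` are arbitrary labels I chose, only the `.name` values matter:

```properties
# Raise the threshold for the Azure Event Processor Host / Event Hubs client
# loggers so their recurring WARN lines (checkpoint failures, link detaches,
# idle-connection closes) are dropped; ERROR-level problems still get through.
logger.azure_eph.name = com.microsoft.azure.eventprocessorhost
logger.azure_eph.level = error
logger.azure_eh.name = com.microsoft.azure.eventhubs
logger.azure_eh.level = error
```

The same change can be made without a restart through Logstash's logging API, e.g. `curl -XPUT 'localhost:9600/_node/logging' -H 'Content-Type: application/json' -d '{"logger.com.microsoft.azure.eventprocessorhost": "ERROR"}'`, though that form does not survive a restart.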
We also get these epoch errors. Is this error somehow caused by multiple threads in the azure_event_hubs plugin?
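For context on the epoch question: a "higher epoch" receiver disconnect means another receiver in the *same consumer group* opened the same partition with a higher epoch and evicted the existing one. Within a single Logstash instance this can happen transiently while the Event Processor Host rebalances leases across its threads; if it is persistent, it usually means two separate consumers (another pipeline or another Logstash deployment) share a consumer group and lease store. A sketch of isolating one pipeline, where the `consumer_group` and `storage_container` names are hypothetical examples, not values from the original config:

```ruby
input {
  azure_event_hubs {
    event_hub_connections => ["Endpoint=sb://someeventhub.servicebus.windows.net/;SharedAccessKeyName=Listen;SharedAccessKey=${KEY};EntityPath=hub-name"]
    storage_connection => "DefaultEndpointsProtocol=https;AccountName=somestorage;AccountKey=${SOMEKEY};EndpointSuffix=core.windows.net"
    # Give each independent consumer of the same hub its own consumer group
    # AND its own lease/checkpoint container, so their receivers never steal
    # partitions from each other with escalating epochs.
    consumer_group => "logstash-some-pipeline"
    storage_container => "logstash-some-pipeline"
    threads => 2
  }
}
```

If only one Logstash instance and one pipeline read the hub, occasional epoch warnings during startup or lease renewal are expected rebalancing noise rather than a misconfiguration.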