Open hexiang1219 opened 2 months ago
@hexiang1219 Did you see any logs like -
Alert notification sent, org: {the_org}, module_key: {alert_name}
The above should be printed in the logs in case alert notifications were successfully sent for all the destinations.
If there were any errors sending notifications to any destination, you should see -
Error sending notification for {alert_name} for destination {destination_name} err: {the error itself}
In case notifications could not be sent to any of the destinations, you should see -
Error sending alert notification: org: {}, module_key: {}
Or this, in case notifications were sent to only some of the destinations -
Some notifications for alert {alert_name} could not be sent: {error message}
Can you please confirm if there were such error messages?
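To make it easier to check a log dump for the messages listed above, here is a minimal sketch that scans log text for those exact phrases. The sample log lines and org/alert names are made up for illustration; only the message patterns come from this thread.

```python
import re

# Hypothetical excerpt of alertmanager logs; the org/alert/destination
# names are invented, only the message formats follow this thread.
sample_logs = """\
[INFO] Alert condition satisfied, org: default, module_key: logs/default/test
[INFO] Alert notification sent, org: default, module_key: logs/default/test
[ERROR] Error sending notification for test for destination slack err: timeout
"""

# Patterns taken from the log messages quoted above.
patterns = {
    "sent_ok": re.compile(r"Alert notification sent"),
    "dest_error": re.compile(r"Error sending notification for .+ for destination"),
    "all_failed": re.compile(r"Error sending alert notification"),
    "partial": re.compile(r"could not be sent"),
}

# Report which of the known messages appear in the excerpt.
for line in sample_logs.splitlines():
    for name, pat in patterns.items():
        if pat.search(line):
            print(name, "->", line.strip())
```

Running the same patterns over real pod logs (e.g. the output of `kubectl logs`) would show which of the three outcomes, if any, the alertmanager reached.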
alertmanager logs:
message received by destination:
There is no notification log, not even `Alert notification sent`, and it feels like it is no longer running. Under normal circumstances, with the detection period configured to 1 minute, there should be a relevant alert log every minute.
@hexiang1219 This seems to be strange. So you are saying the alert manager is stuck at
Alert condition satisfied, org: cloudnatie, module_key: logs/k8s_apiserver_logs/test123
and there are no more logs after that?
message received by destination:
Sorry, didn't get this part, did your destination receive the alert?
Also, can you try running in debug mode and see if there are any more logs? You can also set -
ZO_USAGE_REPORTING_ENABLED = true
ZO_USAGE_PUBLISH_INTERVAL=120 # The interval in seconds after the usage reporting will be ingested into triggers stream in the _meta org
This will create a `_meta` org, and there you should see a `triggers` stream. You will find the events regarding alerts in that stream.
Also it would be great if you can mention the db you are using i.e. sqlite/postgres/mysql.
@hexiang1219 Can you try v0.11.0-rc3? It will print logs:

* send success: `Alert notification sent`
* send failed: `could not be sent`
* other error: `Error sending alert notification`
I have upgraded to v0.11.0-rc3; still no effect. I found that the destination only receives a few messages each time the alertmanager component is restarted, and after a while nothing is received.
In addition, I configured two alarm rules and they are always triggered, but every time I look at the log there is only one entry.
Part of the configuration is as follows (I don't know whether it is related to the issue):

```yaml
OTEL_OTLP_HTTP_ENDPOINT: ''
RUST_BACKTRACE: '0'
RUST_LOG: info
ZO_ACTIX_KEEP_ALIVE: '30'
ZO_ACTIX_REQ_TIMEOUT: '30'
ZO_ACTIX_SHUTDOWN_TIMEOUT: '10'
ZO_APP_NAME: openobserve
ZO_BASE_URI: ''
ZO_BLOOM_FILTER_DEFAULT_FIELDS: ''
ZO_BLOOM_FILTER_ENABLED: 'true'
ZO_BLOOM_FILTER_ON_ALL_FIELDS: 'true'
ZO_CLUSTER_COORDINATOR: etcd
ZO_CLUSTER_NAME: o2
ZO_COLS_PER_RECORD_LIMIT: '600'
ZO_COLUMN_TIMESTAMP: _timestamp
ZO_COMPACT_BLOCKED_ORGS: ''
ZO_COMPACT_DATA_RETENTION_DAYS: '14'
ZO_COMPACT_DELETE_FILES_DELAY_HOURS: '2'
ZO_COMPACT_ENABLED: 'true'
ZO_COMPACT_INTERVAL: '60'
ZO_COMPACT_LOOKBACK_HOURS: '0'
ZO_COMPACT_MAX_FILE_SIZE: '256'
ZO_COMPACT_STEP_SECS: '3600'
ZO_COMPACT_SYNC_TO_DB_INTERVAL: '1800'
ZO_COOKIE_MAX_AGE: '2592000'
ZO_COOKIE_SAME_SITE_LAX: 'true'
ZO_COOKIE_SECURE_ONLY: 'false'
ZO_DATA_CACHE_DIR: ''
ZO_DATA_DB_DIR: ''
ZO_DATA_DIR: ./data/
ZO_DATA_IDX_DIR: ''
ZO_DATA_STREAM_DIR: ''
ZO_DATA_WAL_DIR: ''
ZO_DISK_CACHE_ENABLED: 'false'
ZO_DISK_CACHE_GC_INTERVAL: '0'
ZO_DISK_CACHE_GC_SIZE: '100'
ZO_DISK_CACHE_MAX_SIZE: '0'
ZO_DISK_CACHE_RELEASE_SIZE: '0'
ZO_DISK_CACHE_SKIP_SIZE: '0'
ZO_DISK_CACHE_STRATEGY: lru
ZO_DISTINCT_VALUES_HOURLY: 'false'
ZO_DISTINCT_VALUES_INTERVAL: '10'
ZO_ENABLE_INVERTED_INDEX: 'false'
ZO_ENRICHMENT_TABLE_LIMIT: '10'
ZO_ENTRY_PER_SCHEMA_VERSION_ENABLED: 'true'
ZO_ETCD_ADDR: xxx.svc.cluster.local:2379
ZO_ETCD_CERT_FILE: ''
ZO_ETCD_CLIENT_CERT_AUTH: 'false'
ZO_ETCD_COMMAND_TIMEOUT: '10'
ZO_ETCD_CONNECT_TIMEOUT: '10'
ZO_ETCD_DOMAIN_NAME: ''
ZO_ETCD_KEY_FILE: ''
ZO_ETCD_LOAD_PAGE_SIZE: '100'
ZO_ETCD_LOCK_WAIT_TIMEOUT: '600'
ZO_ETCD_PASSWORD: ''
ZO_ETCD_PREFIX: /zinc/observe/
ZO_ETCD_TRUSTED_CA_FILE: ''
ZO_ETCD_USER: ''
ZO_FEATURE_DISTINCT_EXTRA_FIELDS: ''
ZO_FEATURE_FILELIST_DEDUP_ENABLED: 'false'
ZO_FEATURE_FULLTEXT_EXTRA_FIELDS: ''
ZO_FEATURE_PER_THREAD_LOCK: 'true'
ZO_FEATURE_QUERY_INFER_SCHEMA: 'false'
ZO_FEATURE_QUERY_PARTITION_STRATEGY: file_num
ZO_FEATURE_QUERY_QUEUE_ENABLED: 'true'
ZO_FEATURE_QUICK_MODE_FIELDS: ''
ZO_FILE_MOVE_THREAD_NUM: '0'
ZO_FILE_PUSH_INTERVAL: '10'
ZO_FILE_PUSH_LIMIT: '10000'
ZO_GRPC_ADDR: ''
ZO_GRPC_MAX_MESSAGE_SIZE: '16'
ZO_GRPC_ORG_HEADER_KEY: organization
ZO_GRPC_PORT: '5081'
ZO_GRPC_STREAM_HEADER_KEY: stream-name
ZO_GRPC_TIMEOUT: '600'
ZO_HTTP_ADDR: ''
ZO_HTTP_IPV6_ENABLED: 'false'
ZO_HTTP_PORT: '5080'
ZO_HTTP_WORKER_MAX_BLOCKING: '0'
ZO_HTTP_WORKER_NUM: '0'
ZO_IGNORE_FILE_RETENTION_BY_STREAM: 'false'
ZO_INGESTER_SERVICE_URL: ''
ZO_INGEST_ALLOWED_UPTO: '24'
ZO_INGEST_FLATTEN_LEVEL: '3'
ZO_INSTANCE_NAME: ''
ZO_INTERNAL_GRPC_TOKEN: ''
ZO_INVERTED_INDEX_SPLITCHARS: .,;:|/# =-+*^&%$@!~`
ZO_JSON_LIMIT: '209715200'
ZO_LOCAL_MODE: 'false'
ZO_LOCAL_MODE_STORAGE: disk
ZO_LOGS_FILE_RETENTION: hourly
ZO_MAX_FILE_RETENTION_TIME: '600'
ZO_MAX_FILE_SIZE_IN_MEMORY: '256'
ZO_MAX_FILE_SIZE_ON_DISK: '64'
ZO_MEMORY_CACHE_CACHE_LATEST_FILES: 'false'
ZO_MEMORY_CACHE_DATAFUSION_MAX_SIZE: '0'
ZO_MEMORY_CACHE_DATAFUSION_MEMORY_POOL: ''
ZO_MEMORY_CACHE_ENABLED: 'true'
ZO_MEMORY_CACHE_GC_INTERVAL: '0'
ZO_MEMORY_CACHE_GC_SIZE: '50'
ZO_MEMORY_CACHE_MAX_SIZE: '0'
ZO_MEMORY_CACHE_RELEASE_SIZE: '0'
ZO_MEMORY_CACHE_SKIP_SIZE: '0'
ZO_MEMORY_CACHE_STRATEGY: lru
ZO_MEM_PERSIST_INTERVAL: '5'
ZO_MEM_TABLE_MAX_SIZE: '0'
ZO_META_CONNECTION_POOL_MAX_SIZE: '0'
ZO_META_CONNECTION_POOL_MIN_SIZE: '0'
ZO_META_STORE: mysql
ZO_META_TRANSACTION_LOCK_TIMEOUT: '600'
ZO_META_TRANSACTION_RETRIES: '3'
ZO_METRICS_DEDUP_ENABLED: 'true'
ZO_METRICS_FILE_RETENTION: daily
ZO_METRICS_LEADER_ELECTION_INTERVAL: '30'
ZO_METRICS_LEADER_PUSH_INTERVAL: '15'
ZO_NATS_ADDR: my_custom_host.com:4222
ZO_NATS_COMMAND_TIMEOUT: '10'
ZO_NATS_CONNECT_TIMEOUT: '5'
ZO_NATS_LOCK_WAIT_TIMEOUT: '600'
ZO_NATS_PASSWORD: ''
ZO_NATSPREFIX: o2
ZO_NATS_QUEUE_MAX_AGE: '60'
ZO_NATS_REPLICAS: '3'
ZO_NATS_USER: ''
ZO_NODE_ROLE: all
ZO_PARQUET_COMPRESSION: zstd
ZO_PARQUET_MAX_ROW_GROUP_SIZE: '0'
ZO_PAYLOAD_LIMIT: '209715200'
ZO_PRINT_KEY_CONFIG: 'false'
ZO_PRINT_KEY_EVENT: 'false'
ZO_PRINT_KEY_SQL: 'false'
ZO_PROF_PYROSCOPE_ENABLED: 'false'
ZO_PROF_PYROSCOPE_PROJECT_NAME: openobserve
ZO_PROF_PYROSCOPE_SERVER_URL: http://localhost:4040
ZO_PROMETHEUS_HA_CLUSTER: cluster
ZO_PROMETHEUS_HA_REPLICA: replica
ZO_QUERY_ON_STREAM_SELECTION: 'true'
ZO_QUERY_OPTIMIZATION_NUM_FIELDS: '0'
ZO_QUERY_THREAD_NUM: '0'
ZO_QUERY_TIMEOUT: '600'
ZO_QUEUE_STORE: ''
ZO_QUICK_MODE_FILE_LIST_ENABLED: 'false'
ZO_QUICK_MODE_FILE_LIST_INTERVAL: '300'
ZO_QUICK_MODE_NUM_FIELDS: '500'
ZO_QUICK_MODE_STRATEGY: ''
ZO_ROUTE_MAX_CONNECTIONS: '1024'
ZO_ROUTE_TIMEOUT: '600'
ZO_RUM_API_VERSION: v1
ZO_RUM_APPLICATION_ID: ''
ZO_RUM_CLIENT_TOKEN: ''
ZO_RUM_ENABLED: 'false'
ZO_RUM_ENV: ''
ZO_RUM_INSECURE_HTTP: 'false'
ZO_RUM_ORGANIZATION_IDENTIFIER: default
ZO_RUM_SERVICE: ''
ZO_RUM_SITE: ''
ZO_RUM_VERSION: 0.9.1
ZO_S3_ALLOW_INVALID_CERTIFICATES: 'false'
ZO_S3_BUCKET_NAME: openobserve-online
ZO_S3_BUCKET_PREFIX: ''
ZO_S3_CONNECT_TIMEOUT: '10'
ZO_S3_FEATURE_FORCE_HOSTED_STYLE: 'false'
ZO_S3_FEATURE_FORCE_PATH_STYLE: 'false'
ZO_S3_FEATURE_HTTP1_ONLY: 'false'
ZO_S3_FEATURE_HTTP2_ONLY: 'false'
ZO_S3_PROVIDER: s3
ZO_S3_REGION_NAME: ''
ZO_S3_REQUEST_TIMEOUT: '3600'
ZO_S3_SERVER_URL: http://xxx.svc.cluster.local:9000
ZO_S3_SYNC_TO_CACHE_INTERVAL: '600'
ZO_SKIP_SCHEMA_VALIDATION: 'false'
ZO_TCP_PORT: '5514'
ZO_TELEMETRY: 'false'
ZO_TELEMETRY_HEARTBEAT: '1800'
ZO_TELEMETRY_URL: https://e1.zinclabs.dev
ZO_TRACES_FILE_RETENTION: hourly
ZO_TRACING_ENABLED: 'false'
ZO_TRACING_HEADER_KEY: Authorization
ZO_TRACING_HEADER_VALUE: ''
ZO_TRACING_SEARCH_ENABLED: 'false'
ZO_UDP_PORT: '5514'
ZO_UI_ENABLED: 'true'
ZO_UI_SQL_BASE64_ENABLED: 'false'
ZO_USAGE_BATCH_SIZE: '2000'
ZO_USAGE_ORG: _meta
ZO_USAGE_REPORTING_CREDS: ''
ZO_USAGE_REPORTING_ENABLED: 'false'
ZO_USAGE_REPORTING_MODE: local
ZO_USAGE_REPORTING_URL: http://localhost:5080/api/_meta/usage/_json
ZO_WAL_LINE_MODE_ENABLED: 'true'
ZO_WAL_MEMORY_MODE_ENABLED: 'false'
ZO_WEB_URL: ''
ZO_WIDENING_SCHEMA_EVOLUTION: 'true'
```
Thank you for the information, we will look into it.
@hexiang1219 We are trying to find out what could be the possible reasons. Meanwhile, there is one way you can try debugging: check the `scheduled_jobs` table. Whenever an alert is created, a corresponding record is created in the `scheduled_jobs` table. For example, if there is an alert named `test` for the `default` log stream in an organization named `default`, a record must be present in the `scheduled_jobs` table with `org` = `default`, `module` = `1`, and `module_key` = `logs/default/test` (if no such record is present, then the alert cannot be scheduled and there is possibly a bug). In the `scheduled_jobs` table you can also check the value of `next_run_at`. It is a UNIX timestamp in microseconds; you can use it to check when the alert is supposed to run again. If the `next_run_at` time has already passed and the alert has not been triggered yet, then there is definitely a bug.
@Subhra264 Yes, I am thinking the same; maybe the trigger was removed from `scheduled_jobs`.
@hexiang1219 Can you connect with me in our WeChat group?
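The `scheduled_jobs` check described above can be sketched as a query for overdue records. Here is a minimal, self-contained simulation using sqlite3 (the actual deployment in this thread uses MySQL, and the table schema below is a guessed simplification; only the column names `org`, `module`, `module_key`, and `next_run_at` come from the discussion):

```python
import sqlite3
import time

# Simplified stand-in for the scheduled_jobs table; the real schema
# likely has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scheduled_jobs ("
    "org TEXT, module INTEGER, module_key TEXT, next_run_at INTEGER)"
)

# next_run_at is a UNIX timestamp in microseconds, per the thread.
now_us = int(time.time() * 1_000_000)
conn.execute(
    "INSERT INTO scheduled_jobs VALUES (?, ?, ?, ?)",
    ("default", 1, "logs/default/test", now_us - 60_000_000),  # due 60s ago
)

# An alert whose next_run_at is already in the past but which never
# fired would show up here; per the discussion, that indicates a bug.
overdue = conn.execute(
    "SELECT org, module_key FROM scheduled_jobs WHERE next_run_at < ?",
    (now_us,),
).fetchall()
print(overdue)  # [('default', 'logs/default/test')]
```

Running the equivalent `SELECT` against the real metadata database (MySQL here) would show whether the alert's trigger record still exists and whether it is overdue.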
Already added, who should I contact?
@hexiang1219 Just talk in the group.
@hexiang1219 did you join our Wechat Group? We can discuss in there.
@hexiang1219 We released v0.13.0-rc1; you can try it, the behavior should be improved.
Which OpenObserve functionalities are the source of the bug?
alerts
Is this a regression?
No
Description
The following alarm rules are configured.
By checking the log of the alarm component, it was found that the alarm conditions have been met,
but nothing was received at the destination. The configuration of the destination is as follows. Sending messages directly to the destination address works normally.
Please provide a link to a minimal reproduction of the bug
No response
Please provide the exception or error you saw
No response
Please provide the version you discovered this bug in (check about page for version information)
Anything else?
No response