vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Vector locks up in some pipelines configurations. #11633

Closed: blt closed this issue 2 years ago

blt commented 2 years ago


### Problem

As part of my work on #10144 I've recently started to notice that Vector -- see config below -- reliably locks up under load. When I say "lock" I mean that load generation from lading halts as Vector signals back-off, and the blackhole sink no longer receives events from upstream in the topology. Something important somewhere in Vector fills up and the program stops doing work.
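To make the failure mode concrete, here is a minimal sketch -- emphatically not Vector's actual topology code -- of how a chain of bounded channels "locks" once any stage stops draining its input: every upstream `send().await` eventually parks and the source stops accepting requests, which is exactly the back-off lading observes.

```rust
// Toy topology: source -> transform -> sink over bounded channels.
// When the sink stalls, both channels fill, the source's send() parks
// forever, and the program prints a few dozen lines and then hangs.
use tokio::sync::mpsc;
use tokio::time::{sleep, Duration};

#[tokio::main]
async fn main() {
    let (source_tx, mut transform_rx) = mpsc::channel::<String>(16);
    let (transform_tx, mut sink_rx) = mpsc::channel::<String>(16);

    // "Sink": drains a handful of events, then stalls forever, standing
    // in for whatever important thing in Vector stops making progress.
    tokio::spawn(async move {
        let mut seen = 0usize;
        while let Some(event) = sink_rx.recv().await {
            seen += 1;
            println!("sink received: {event}");
            if seen == 4 {
                std::future::pending::<()>().await; // never completes
            }
        }
    });

    // "Transform": forwards events until the sink channel fills, then parks.
    tokio::spawn(async move {
        while let Some(event) = transform_rx.recv().await {
            transform_tx.send(event).await.unwrap();
        }
    });

    // "Source": once both channels are full this send() never returns,
    // which upstream load generation experiences as back-off.
    for i in 0.. {
        source_tx.send(format!("event {i}")).await.unwrap();
        println!("source accepted event {i}");
        sleep(Duration::from_millis(10)).await;
    }
}
```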

Using the configuration files below you may reproduce this like so:

> VECTOR_THREADS=8 firejail --noprofile --cpu=2,3,4,5,6,7,8,9  ./target/release/vector --config ~/http_pipeline/http_pipelines_no_grok_blackhole.toml 

From the lading root:

> firejail --noprofile --cpu=1 ./target/release/http_gen --config-path ~/http_pipeline/http_gen.yaml

Vector's blackhole sink will print throughput for a while and then taper off to nothing.
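Since the config below enables the API and ships internal metrics to a prometheus_exporter sink on port 9598, you can also watch the lock-up from the metrics side; once the soft-lock hits, the counters stop advancing between scrapes:

> curl -s http://localhost:9598/metrics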

### Configuration

Relevant vector config:

```toml
data_dir = "/var/lib/vector"

##
## Sources
##

[api]
enabled = true

[sources.internal_metrics]
type = "internal_metrics"

[sources.logs]
type = "http"
address = "0.0.0.0:8282"
encoding = "text"

##
## Transforms
##

[transforms.preprocessing]
type = "remap"
inputs = ["logs"]
# Parse the JSON request body into the event; `err` captures parse
# failures so a bad line doesn't abort the program.
source = '''
., err = parse_json(.message)
.custom = {}
'''

[transforms.processing]
type = "pipelines"
inputs = ["preprocessing"]

[transforms.processing.logs]
order = [
    "nginx",
    "redis",
    "consul",
    "python",
    "rabbitmq",
    "zookeeper",
    "elasticsearch",
    "kafka",
    "couchdb",
    "docker",
    "datadog_agent",
    "ruby",
    "vault",
    "nginx_ingress_controller",
    "mysql",
    "kubernetes_cluster_autoscaler",
    "aws_alb_ingress_controller",
    "proxysql",
    "azure",
    "azure_web",
    "azure_storage",
    "azure_network",
    "azure_compute",
    "etcd",
    "glog_pipeline",
    "auth0",
    "kube_scheduler__glog_",
    "aws_ecs_agent",
    "nodejs",
    "postgresql",
    "cassandra",
    "apache_httpd",
    "azure_recovery_services",
    "c_",
    "web_browser_logs",
]

[transforms.processing.logs.pipelines.nginx]
name = "nginx"
filter.type = "datadog_search"
filter.source = "source:nginx"

[[transforms.processing.logs.pipelines.nginx.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.redis]
name = "redis"
filter.type = "datadog_search"
filter.source = "source:redis"

[[transforms.processing.logs.pipelines.redis.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.consul]
name = "consul"
filter.type = "datadog_search"
filter.source = "source:consul"

[[transforms.processing.logs.pipelines.consul.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.python]
name = "python"
filter.type = "datadog_search"
filter.source = "source:python"

[[transforms.processing.logs.pipelines.python.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.rabbitmq]
name = "rabbitmq"
filter.type = "datadog_search"
filter.source = "source:rabbitmq"

[[transforms.processing.logs.pipelines.rabbitmq.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.zookeeper]
name = "zookeeper"
filter.type = "datadog_search"
filter.source = "source:zookeeper"

[[transforms.processing.logs.pipelines.zookeeper.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.elasticsearch]
name = "elasticsearch"
filter.type = "datadog_search"
filter.source = "source:elasticsearch"

[[transforms.processing.logs.pipelines.elasticsearch.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.kafka]
name = "kafka"
filter.type = "datadog_search"
filter.source = "source:kafka"

[[transforms.processing.logs.pipelines.kafka.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.couchdb]
name = "couchdb"
filter.type = "datadog_search"
filter.source = "source:couchdb"

[[transforms.processing.logs.pipelines.couchdb.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.docker]
name = "docker"
filter.type = "datadog_search"
filter.source = "source:docker"

[[transforms.processing.logs.pipelines.docker.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.datadog_agent]
name = "datadog_agent"
filter.type = "datadog_search"
filter.source = "source:(agent OR datadog-agent OR datadog-agent-cluster-worker OR datadog-cluster-agent OR process-agent OR security-agent OR system-probe OR trace-agent)"

[[transforms.processing.logs.pipelines.datadog_agent.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.ruby]
name = "ruby"
filter.type = "datadog_search"
filter.source = "source:ruby"

[[transforms.processing.logs.pipelines.ruby.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.vault]
name = "vault"
filter.type = "datadog_search"
filter.source = "source:vault"

[[transforms.processing.logs.pipelines.vault.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.nginx_ingress_controller]
name = "nginx_ingress_controller"
filter.type = "datadog_search"
filter.source = "source:nginx-ingress-controller"

[[transforms.processing.logs.pipelines.nginx_ingress_controller.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.mysql]
name = "mysql"
filter.type = "datadog_search"
filter.source = "source:mysql"

[[transforms.processing.logs.pipelines.mysql.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.kubernetes_cluster_autoscaler]
name = "kubernetes_cluster_autoscaler"
filter.type = "datadog_search"
filter.source = "source:cluster-autoscaler"

[[transforms.processing.logs.pipelines.kubernetes_cluster_autoscaler.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.aws_alb_ingress_controller]
name = "aws_alb_ingress_controller"
filter.type = "datadog_search"
filter.source = "source:aws-alb-ingress-controller"

[[transforms.processing.logs.pipelines.aws_alb_ingress_controller.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.proxysql]
name = "proxysql"
filter.type = "datadog_search"
filter.source = "source:proxysql"

[[transforms.processing.logs.pipelines.proxysql.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.azure]
name = "azure"
filter.type = "datadog_search"
filter.source = "source:(azure OR azure.alertsmanagement OR azure.analysisservices OR azure.apiconfiguration OR azure.apimanagement OR azure.authorization OR azure.automation OR azure.batchai OR azure.batchazure.cache OR azure.blockchain OR azure.cache OR azure.cdn OR azure.classiccompute OR azure.classicstorage OR azure.cognitiveservices OR azure.containerinstance OR azure.containerregistry OR azure.containerservice OR azure.datafactory OR azure.datalakestore OR azure.dbformariadb OR azure.dbformysql OR azure.dbforpostgresql OR azure.devices OR azure.documentdb OR azure.enterpriseknowledgegraph OR azure.eventgrid OR azure.eventhub OR azure.hdinsight OR azure.insights OR azure.iotcentral OR azure.keyvault OR azure.kusto OR azure.logic OR azure.machinelearningservices OR azure.managedidentity OR azure.operationalinsights OR azure.operationsmanagement OR azure.peering OR azure.relay OR azure.resourcegroup OR azure.resources OR azure.search OR azure.security OR azure.servicebus OR azure.servicefabric OR azure.streamanalytics OR azure.subscription OR azure.synapse)"

[[transforms.processing.logs.pipelines.azure.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.azure_web]
name = "azure_web"
filter.type = "datadog_search"
filter.source = "source:azure.web"

[[transforms.processing.logs.pipelines.azure_web.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.azure_storage]
name = "azure_storage"
filter.type = "datadog_search"
filter.source = "source:azure.storage"

[[transforms.processing.logs.pipelines.azure_storage.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.azure_network]
name = "azure_network"
filter.type = "datadog_search"
filter.source = "source:azure.network"

[[transforms.processing.logs.pipelines.azure_network.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.azure_compute]
name = "azure_compute"
filter.type = "datadog_search"
filter.source = "source:azure.compute"

[[transforms.processing.logs.pipelines.azure_compute.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.etcd]
name = "etcd"
filter.type = "datadog_search"
filter.source = "source:etcd"

[[transforms.processing.logs.pipelines.etcd.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.glog_pipeline]
name = "glog_pipeline"
filter.type = "datadog_search"
filter.source = "source:(admission-webhook OR api-server OR cert-manager-acmesolver OR cert-manager-cainjector OR cert-manager-controller OR cert-manager-webhook OR cluster-proportional-autoscaler-amd64 OR hyperkube OR ip-masq-agent OR k8s-prometheus-adapter-amd64 OR kube-apiserver OR kube-controller-manager OR kube-proxy OR kube-state-metrics OR metacontroller OR metrics-server-amd64 OR prometheus-operator OR vpa-admission-controller OR vpa-recommender OR vpa-updater)"

[[transforms.processing.logs.pipelines.glog_pipeline.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.auth0]
name = "auth0"
filter.type = "datadog_search"
filter.source = "source:auth0"

[[transforms.processing.logs.pipelines.auth0.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.kube_scheduler__glog_]
name = "kube_scheduler__glog_"
filter.type = "datadog_search"
filter.source = "source:(kube_scheduler OR kube-scheduler)"

[[transforms.processing.logs.pipelines.kube_scheduler__glog_.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.aws_ecs_agent]
name = "aws_ecs_agent"
filter.type = "datadog_search"
filter.source = "source:amazon-ecs-agent"

[[transforms.processing.logs.pipelines.aws_ecs_agent.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.nodejs]
name = "nodejs"
filter.type = "datadog_search"
filter.source = "source:nodejs"

[[transforms.processing.logs.pipelines.nodejs.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.postgresql]
name = "postgresql"
filter.type = "datadog_search"
filter.source = "source:postgresql"

[[transforms.processing.logs.pipelines.postgresql.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.cassandra]
name = "cassandra"
filter.type = "datadog_search"
filter.source = "source:cassandra"

[[transforms.processing.logs.pipelines.cassandra.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.apache_httpd]
name = "apache_httpd"
filter.type = "datadog_search"
filter.source = "source:httpd"

[[transforms.processing.logs.pipelines.apache_httpd.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.azure_recovery_services]
name = "azure_recovery_services"
filter.type = "datadog_search"
filter.source = "source:azure.recoveryservices"

[[transforms.processing.logs.pipelines.azure_recovery_services.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.c_]
name = "c_"
filter.type = "datadog_search"
filter.source = "source:csharp"

[[transforms.processing.logs.pipelines.c_.transforms]]
type = "remap"
source = "true"

[transforms.processing.logs.pipelines.web_browser_logs]
name = "web_browser_logs"
filter.type = "datadog_search"
filter.source = "source:browser"

[[transforms.processing.logs.pipelines.web_browser_logs.transforms]]
type = "remap"
source = "true"

##
## Sinks
##

[sinks.prometheus]
type = "prometheus_exporter"
inputs = ["internal_metrics"]
address = "0.0.0.0:9598"

[sinks.blackhole]
type = "blackhole"
inputs = ["processing"]

Relevant lading config:

```yaml
worker_threads: 1
prometheus_addr: "0.0.0.0:9001"

targets:
  vector:
    seed: [2, 3, 5, 7, 11, 13, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137]
    headers: {}
    target_uri: "http://localhost:8282/"
    bytes_per_second: "500 Mb"
    parallel_connections: 10
    method:
      post:
        maximum_prebuild_cache_size_bytes: "256 Mb"
        variant:
          static:
            static_path: "/home/blt/http_pipeline/bootstrap.log"
```
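
If you don't have lading handy, a crude single-connection stand-in is below. This is my sketch, not lading's code: it ignores the configured byte rate, the prebuild cache, and the 10 parallel connections, but it exercises the same path by POSTing the bootstrap file to the `http` source in a loop.

```rust
// Crude stand-in for http_gen: repeatedly POST the newline-delimited
// bootstrap file to Vector's http source. Once Vector soft-locks, the
// connect or read below is where back-off becomes visible.
use std::io::{Read, Write};
use std::net::TcpStream;

fn main() -> std::io::Result<()> {
    let body = std::fs::read_to_string("bootstrap.log")?;
    loop {
        let mut stream = TcpStream::connect("127.0.0.1:8282")?;
        let request = format!(
            "POST / HTTP/1.1\r\nHost: localhost:8282\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
            body.len(),
            body
        );
        stream.write_all(request.as_bytes())?;
        let mut response = String::new();
        stream.read_to_string(&mut response)?;
    }
}
```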

The referenced bootstrap log file is:

{ "source" : "nginx", "message": "127.0.0.1 - frank [13\/Jul\/2016:10:55:36 +0000] \"GET \/apache_pb.gif HTTP\/1.0\" 200 2326" }
{ "source" : "nginx", "message": "172.17.0.1 - - [06\/Jan\/2017:16:16:37 +0000] \"GET \/datadoghq\/company?test=var1%20Pl HTTP\/1.1\" 404 612 \"http:\/\/www.perdu.com\/\" \"Mozilla\/5.0 (X11; Linux x86_64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/55.0.2883.87 Safari\/537.36\" \"-\"" }
{ "source" : "nginx", "message": "172.17.0.1 - - [06\/Jan\/2017:16:16:37 +0000] \"GET \/datadoghq\/company?test=var1%20Pl HTTP\/1.1\" 200 612 \"http:\/\/www.perdu.com\/\" \"Mozilla\/5.0 (X11; Linux x86_64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/55.0.2883.87 Safari\/537.36\" \"-\" those are random characters" }
{ "source" : "nginx", "message": "2017\/09\/26 14:36:50 [error] 8409#8409: *317058 \"\/usr\/share\/nginx\/html\/sql\/sql-admin\/index.html\" is not found (2: No such file or directory), client: 217.92.148.44, server: localhost, request: \"HEAD http:\/\/174.138.82.103:80\/sql\/sql-admin\/ HTTP\/1.1\", host: \"174.138.82.103\"" }
{ "source" : "nginx", "message": "2017\/09\/26 14:36:50 [info] 14#14: *285 client 172.17.0.27 closed keepalive connection" }
{ "source" : "nginx", "message": "127.0.0.1 - - [19\/Feb\/2015:15:50:36 -0500] \"GET \/big.pdf HTTP\/1.1\" 206 33973115 0.202 \"-\" \"Mozilla\/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/40.0.2214.111 Safari\/537.36\"" }
{ "source" : "redis", "message": "12115:M 08 Jan 17:45:41.572 # WARNING: The TCP backlog setting of 511 cannot be enforced because \/proc\/sys\/net\/core\/somaxconn is set to the lower value of 128." }
{ "source" : "redis", "message": "12115:M 08 Jan 17:45:41.572 # Server started, Redis version 3.0.6" }
{ "source" : "redis", "message": "3575:M 16 Apr 15:37:05.124 - DB 0: 1 keys (0 volatile) in 4 slots HT." }
{ "source" : "redis", "message": "1:M 14 May 2019 19:11:40.164 * Background saving terminated with success" }
{ "source" : "consul", "message": "==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1" }
{ "source" : "consul", "message": "\t2015\/11\/24 11:41:05 [INFO] serf: EventMemberJoin: 10.0.255.5 10.0.255.5" }
{ "source" : "consul", "message": "Mar 23 14:39:46 precise64 consul-template[23544]: (runner) creating Watcher" }
{ "source" : "consul", "message": "2018-10-19T19:04:16.180Z [TRACE] pkcs11: key successfully found" }
{ "source" : "python", "message": "{\"threadName\": \"MainThread\",\"name\": \"root\",\"thread\": 140735202359648,\"created\": 1336281068.506248,\"process\": 41937,\"processName\": \"MainProcess\",\"relativeCreated\": 9.100914001464844,\"module\": \"tests\",\"funcName\": \"testFormatKeys\",\"levelno\": 20,\"msecs\": 506.24799728393555,\"pathname\": \"tests\/tests.py\",\"lineno\": 60,\"asctime\": [\"12-05-05 22:11:08,506248\"],\"message\": \"testing logging format\",\"filename\": \"tests.py\",\"levelname\": \"INFO\",\"special\": \"value\",\t\"run\": 12}" }
{ "source" : "python", "message": "2017-12-26T10:44:59,311 ERROR [dd.dogweb.lib.static_asset] [20081] rid=b0RV4J Error reading statics_version file.Traceback (most recent call last): File \"\/home\/vagrant\/dogweb\/dogweb\/lib\/static_asset.py\", line 16, in version    with open(\"\/etc\/datadog\/statics_version.txt\", \"r\") as statics_version_file: IOError: [Errno 2] No such file or directory: \/etc\/datadog\/statics_version.txt" }
{ "source" : "python", "message": "2017-12-26T10:44:59,311 ERROR Error reading statics_version file." }
{ "source" : "python", "message": "2019-01-07 15:20:15,972 DEBUG [flask.app] [app.py:100] [dd.trace_id=5688176451479556031 dd.span_id=4663104081780224235] - Hook: teardown_appcontext" }
{ "source" : "python", "message": "2019-01-07 15:20:15,972 DEBUG [flask.app] [app.py:100] [dd.trace_id=5688176451479556031 dd.span_id=4663104081780224235] - rid=b0RV4J Error reading statics_version file.Traceback (most recent call last): File \"\/home\/vagrant\/dogweb\/dogweb\/lib\/static_asset.py\", line 16, in version    with open(\"\/etc\/datadog\/statics_version.txt\", \"r\") as statics_version_file: IOError: [Errno 2] No such file or directory: \/etc\/datadog\/statics_version.txt" }
{ "source" : "python", "message": "2020-03-20 14:15:07,124 INFO [root] [app.py:78] [dd.service=excelsior dd.env=jesse-dev dd.version=abc123 dd.trace_id=9659687005038611690 dd.span_id=6632586210846541907] - This is a test of info logs" }
{ "source" : "rabbitmq", "message": "=INFO REPORT==== 8-Mar-2018::14:09:27 === closing AMQP connection <0.22578.1180> (62.210.247.209:43887 -> 62.210.129.60:5672" }
{ "source" : "rabbitmq", "message": "2018-11-22 10:44:33.654 [info] <0.620.0> accepting AMQP connection <0.620.0> (127.0.0.1:52771 -> 127.0.0.1:5672)" }
{ "source" : "zookeeper", "message": "2000-09-07 14:07:41,508 [main] INFO  MyApp - Entering application." }
{ "source" : "zookeeper", "message": "54 [main] INFO MyApp.foo.bar - Entering application." }
{ "source" : "zookeeper", "message": "2000-09-07 14:07:44 INFO org.foo.bar:32 - Entering application." }
{ "source" : "elasticsearch", "message": "[2018-03-15T10:49:37,857][INFO ][index.search.slowlog.query] [i-0a78cf272c227585e] [8.1520909160000.4][0] took[6.9s], took_millis[6962], types[], stats[], search_type[QUERY_THEN_FETCH], total_shards[2], source[{\"size\":0,\"timeout\":\"1m\",\"query\":\"myquery}]" }
{ "source" : "elasticsearch", "message": "[2018-03-15T11:18:32,097][TRACE][index.indexing.slowlog.index] [i-0fb0942bc5af92a50] [62.1520954640000.0\/mzEo0np9TE67-TZfLSizUQ] took[211.4ms], took_millis[211], type[event], id[AWIpYf-elx3mYSdfhef5], routing[] , source[{\"tag\":{\"role\":\"alerting-metric-evaluator\",\"name\":\"alerting-metric-evaluator\",\"source\":\"\",\"env\":\"staging\",\"availability-zone\":\"us-east-1d\"},\"timestamp\":\"2018-03-15T11:18:31.066Z\",\"tiebreaker\":-724304927,\"service\":\"alerting-metric-query\",\"status\":\"info\",\"host\":\"i-0a88da725e2bf2373\",\"tags\":[\"worker_id:2\",\"service:alerting-metric-query\",\"source:\",\"lsb_codename:trusty\",\"lsb_id:ubuntu\",\"lsb_release:14.04\",\"account:staging\",\"autoscaling.managed:true\",\"autoscaling_group:alerting-metric-evaluator-anomaly\",\"availability-zone:us-east-1d\",\"aws_account:727006795293\",\"ec2_churn:666\",\"iam_profile:dd-common-profile\",\"image:ami-09192173\",\"instance-type:r4.2xlarge\",\"kernel:none\",\"name:alerting-metric-evaluator\",\"region:us-east-1\",\"security-group-name:alerting-metric-evaluator\",\"security-group-name:common\",\"security_group_name:alerting-metric-evaluator\",\"security_group_name:common\",\"team:bourbon\",\"team:yuzu\",\"terraform.managed:true\",\"terraform.module:alerting-metric-evaluator\",\"terraform.module:vpc-comm]" }
{ "source" : "elasticsearch", "message": "[2018-03-15T10:49:37,857][INFO ][foo.bar] new node installed" }
{ "source" : "elasticsearch", "message": "[2018-06-05 10:06:31,687][INFO ][cluster.metadata         ] [elasticsnoop_node_two] [test-index] creating index, cause [api], templates [], shards [5]\/[1], mappings []" }
{ "source" : "kafka", "message": "2000-09-07 14:07:41,508 [main] INFO  MyApp - Entering application." }
{ "source" : "kafka", "message": "54 [main] INFO MyApp.foo.bar - Entering application." }
{ "source" : "kafka", "message": "2000-09-07 14:07:44 INFO org.foo.bar:32 - Entering application." }
{ "source" : "kafka", "message": "[2020-03-12 16:23:32,543] INFO [ProducerStateManager partition=my_partition] Writing producer snapshot at offset 172041391 (kafka.log.ProducerStateManager)" }
{ "source" : "couchdb", "message": "[Wed, 29 Aug 2018 11:53:11 GMT] [info] [<0.404.0>] 127.0.0.1 - - DELETE \/new_database\/random_task?rev=2-4cc3dfb6e76befd665faf124b36b7f1c 200" }
{ "source" : "couchdb", "message": "2015-12-18 14:44:52.722 [info] Undefined <0.7.0> Application runtime_tools started on node 'node3@127.0.0.1'" }
{ "source" : "couchdb", "message": "[error] 2018-08-29T00:00:14.470000Z couchdb@10.24.7.110 <0.8925.6176> -------- CRASH REPORT Process  (<0.8925.6176>) with 0 neighbors exited with reason: {normal,{gen_server,call,[<0.16537.6180>,close,infinity]}} at gen_server:terminate\/7(line:826) <= proc_lib:init_p_do_apply\/3(line:240); initial_call: {couch_index_compactor,init,['Argument__1']}, ancestors: [<0.19789.6149>,<0.6121.6149>], messages: [], links: [], dictionary: [], trap_exit: true, status: running, heap_size: 376, stack_size: 27, reductions: 1888'" }
{ "source" : "docker", "message": "172.17.0.1 - - [06\/Jan\/2017:16:16:37 +0000] \"GET \/datadoghq\/company?test=var1%20Pl HTTP\/1.1\" 404 612 \"http:\/\/www.perdu.com\/\" \"Mozilla\/5.0 (X11; Linux x86_64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/55.0.2883.87 Safari\/537.36\" \"-\"" }
{ "source" : "docker", "message": "2018-07-05 10:08:08 GMT | INFO | (runner.go:309 in work) | Done running check network" }
{ "source" : "docker", "message": "W0601 14:20:25.000572       1 manager.go:147] Failed to get all responses in time (got 3\/4)" }
{ "source" : "docker", "message": "[ AGENT ] 2018-07-11 18:27:57 UTC | INFO | (transaction.go:121 in Process) | Successfully posted payload to \"https:\/\/6-3-0-app.agent.datadoghq.com\/api\/v1\/series?api_key=*************************64a86\"" }
{ "source" : "agent", "message": "2019-04-08 13:53:48 UTC | CORE | INFO | (pkg\/collector\/python\/datadog_agent.go:120 in LogMessage) | kafka_cluster_status:8ca7b736f0aa43e5 | (kafka_cluster_status.py:213) | Checking for out of sync partition replicas" }
{ "source" : "agent", "message": "2019-04-08 13:53:48 UTC | CORE | INFO | (pkg\/collector\/python\/datadog_agent.go:121 in LogMessage) | - | (client.py:479) | Zookeeper connection established, state: CONNECTED" }
{ "source" : "agent", "message": "2019-04-08 13:53:48 UTC | CORE | INFO | (pkg\/collector\/runner\/runner.go:327 in work) | check:network,type:core | Done running check" }
{ "source" : "agent", "message": "2019-04-08 13:53:48 UTC | CORE | INFO | (pkg\/collector\/runner\/runner.go:264 in work) | Running check load" }
{ "source" : "agent", "message": "2019-04-08 13:53:48 UTC | TRACE | INFO | (pkg\/trace\/agent\/agent.go:145 in loop) | exiting" }
{ "source" : "agent", "message": "2019-02-01 16:59:41 UTC | INFO | (connection_manager.go:124 in CloseConnection) | Connection closed" }
{ "source" : "agent", "message": "2019-02-01 16:58:54 UTC | INFO | (serializer.go:263 in SendJSONToV1Intake) | Sent processes metadata payload, size: 1422 bytes." }
{ "source" : "agent", "message": "2020-01-03 15:37:00 UTC | SYS-PROBE | WARN | (pkg\/ebpf\/netlink\/conntracker.go:301 in unregister) | exceeded maximum tracked short lived connections" }
{ "source" : "agent", "message": "2020-11-18 10:31:13 UTC | JMX | INFO  | Instance | Done initializing JMX Server at localhost:7199" }
{ "source" : "agent", "message": "2020-11-17 17:00:01 UTC | JMX | DEBUG | HttpClient | attempting to connect to: https:\/\/localhost:5001\/agent\/jmx\/status" }
{ "source" : "ruby", "message": "I, [1999-03-03T02:34:24.895701 #19074]  INFO -- Main: Some error." }
{ "source" : "ruby", "message": "method=GET path=\/jobs\/833552.json format=json controller=jobs action=show status=200 duration=58.33 view=40.43 db=15.26" }
{ "source" : "ruby", "message": "I, [1999-03-03T02:34:24.895701 #19074]  INFO -- : Completed 200 OK in 41ms (Views: 0.2ms | ActiveRecord: 6.2ms)" }
{ "source" : "ruby", "message": "I, [1999-03-03T02:34:24.895701 #19074]  INFO -- : Started GET \"\/api\/v1\/example\/query\" for 123.123.123.123 at 1999-03-03 02:34:24+0000" }
{ "source" : "ruby", "message": "I, [1999-03-03T02:34:24.895701 #19074]  INFO -- : Processing by MyCheck::CheckController#index as JSON" }
{ "source" : "ruby", "message": "I, [2018-06-27T08:01:52.991728 #1]  INFO -- : Received: {\"action\":\"cancel\",\"order\":{\"id\":1234567,\"timestamp\":153012312}}" }
{ "source" : "ruby", "message": "[2019-01-16 15:38:45 +0000][my_app][WARN][dd.trace_id=7396234561907730428 dd.span_id=1152346429682372911] This is a test message!" }
{ "source" : "ruby", "message": "[dd.trace_id=1321281354128614668 dd.span_id=5986620922286636160] Started GET \"\/db\" for 172.22.0.1 at 2019-01-16 18:04:47 +0000" }
{ "source" : "ruby", "message": "[dd.trace_id=1321281354128614668 dd.span_id=5986620922286636160] Processing by DatabaseController#index as *\/*" }
{ "source" : "ruby", "message": "[dd.trace_id=1321281354128614668 dd.span_id=5986620922286636160] Completed 200 OK in 9ms (Views: 6.3ms | ActiveRecord: 1.0ms)" }
{ "source" : "ruby", "message": "I, [2019-09-24T17:24:34.414676 #1]  INFO -- : [dd.trace_id=1748205551911478249 dd.span_id=387648978129867948] Started GET \"\/notes\/1\" for 192.168.80.1 at 2019-09-24 17:24:34 +0000" }
{ "source" : "ruby", "message": "I, [2019-09-24T17:24:34.414676 #1]  INFO -- : [dd.env=dev-env dd.service=sample-app dd.version=1.2.3-beta dd.trace_id=1748205551911478249 dd.span_id=387648978129867948] Started GET \"\/notes\/1\" for 192.168.80.1 at 2019-09-24 17:24:34 +0000" }
{ "source" : "ruby", "message": "[dd.env=dev-env dd.service=sample-app dd.version=1.2.3-beta dd.trace_id=1321281354128614668 dd.span_id=5986620922286636160] Completed 200 OK in 9ms (Views: 6.3ms | ActiveRecord: 1.0ms)" }
{ "source" : "ruby", "message": "[dd.env=dev-env dd.service=sample-app dd.version=1.2.3-beta dd.trace_id=1321281354128614668 dd.span_id=5986620922286636160] Processing by DatabaseController#index as *\/*" }
{ "source" : "vault", "message": "2019-12-18T20:26:12.000Z [INFO]  core: security barrier not initialized" }
{ "source" : "vault", "message": "2019-12-06T23:59:40.398Z [DEBUG] expiration: collecting leases" }
{ "source" : "vault", "message": "2019-12-06T23:59:40.425Z [WARN]  no `api_addr` value specified in config or in VAULT_API_ADDR; falling back to detection if possible, but this value should be manually set" }
{ "source" : "vault", "message": "{ \"time\": \"2019-11-05T00:40:27.638711Z\", \"type\": \"request\", \"auth\": { \"client_token\": \"hmac-sha256:6291b17ab99eb5bf3fd44a41d3a0bf0213976f26c72d12676b33408459a89885\", \"accessor\": \"hmac-sha256:2630a7b8e996b0c451db4924f32cec8793d0eb69609f777d89a5c8188a742f52\", \"display_name\": \"root\", \"policies\": [ \"root\" ], \"token_policies\": [ \"root\" ], \"token_type\": \"service\" }, \"request\": { \"id\": \"9adb5544-637f-3d42-9459-3684f5d21996\", \"operation\": \"update\", \"client_token\": \"hmac-sha256:6291b17ab99eb5bf3fd44a41d3a0bf0213976f26c72d12676b33408459a89885\", \"client_token_accessor\": \"hmac-sha256:2630a7b8e996b0c451db4924f32cec8793d0eb69609f777d89a5c8188a742f52\", \"namespace\": { \"id\": \"root\" }, \"path\": \"sys\/policies\/acl\/admin\", \"data\": { \"policy\": \"hmac-sha256:212744709e5a643a5ff4125160c26983f8dab537f60d166c2fac5b95547abc33\" }, \"remote_address\": \"127.0.0.1\" } }" }
{ "source" : "nginx-ingress-controller", "message": "172.17.0.1 - [172.17.0.1] - - [07\/Dec\/2018:18:31:33 +0000] \"GET \/datadoghq\/company?test=var1%20Pl HTTP\/1.1\" 200 261 \"-\" \"Mozilla\/5.0 (Windows NT 5.1; rv:31.0) Gecko\/20100101 Firefox\/31.0\" 1904 0.011 [proxyname-8080] 172.17.0.1:9000 298 0.012 200 abcdefg12345abcdef" }
{ "source" : "nginx-ingress-controller", "message": "172.17.0.1 - - [07\/Dec\/2018:18:31:33 +0000] \"GET \/datadoghq\/company?test=var1%20Pl HTTP\/1.1\" 200 261 \"-\" \"Mozilla\/5.0 (Windows NT 5.1; rv:31.0) Gecko\/20100101 Firefox\/31.0\" 1904 0.011 [proxyname-8080] 172.17.0.1:9000 298 0.012 200 abcdefg12345abcdef" }
{ "source" : "nginx-ingress-controller", "message": "2017\/09\/26 14:36:50 [error] 8409#8409: *317058 \"\/usr\/share\/nginx\/html\/sql\/sql-admin\/index.html\" is not found (2: No such file or directory), client: 217.92.148.44, server: localhost, request: \"HEAD http:\/\/174.138.82.103:80\/sql\/sql-admin\/ HTTP\/1.1\", host: \"174.138.82.103\"" }
{ "source" : "nginx-ingress-controller", "message": "2017\/09\/26 14:36:50 [info] 14#14: *285 client 172.17.0.27 closed keepalive connection" }
{ "source" : "nginx-ingress-controller", "message": "I1221 13:00:58.488494       7 controller.go:202] Initial sync, sleeping for 1 second." }
{ "source" : "nginx-ingress-controller", "message": "172.17.0.1 - [172.17.0.1] - - [07\/Dec\/2018:18:31:33 +0000] \"GET \/datadoghq\/company?test=var1%20Pl HTTP\/1.1\" 200 261 \"-\" \"Mozilla\/5.0 (Windows NT 5.1; rv:31.0) Gecko\/20100101 Firefox\/31.0\" 1904 0.011 [proxyname-8080] 172.17.0.1:9000 298 0.012 200" }
{ "source" : "nginx-ingress-controller", "message": "172.17.0.1 - [172.17.0.1] - - [07\/Dec\/2018:18:31:33 +0000] \"GET \/datadoghq\/company?test=var1%20Pl HTTP\/1.1\" 200 261 \"-\" \"Mozilla\/5.0 (Windows NT 5.1; rv:31.0) Gecko\/20100101 Firefox\/31.0\" 1904 0.011 [proxyname-8080] 172.17.0.1:9000, 100.110.0.3:80 298, 0 0.012, - 200, - abcdefg12345abcdef" }
{ "source" : "nginx-ingress-controller", "message": "172.17.0.1 - [172.17.0.1] - - [07\/Dec\/2018:18:31:33 +0000] \"GET \/datadoghq\/company?test=var1%20Pl HTTP\/1.1\" 200 261 \"-\" \"Mozilla\/5.0 (Windows NT 5.1; rv:31.0) Gecko\/20100101 Firefox\/31.0\" 1904 0.011 [proxyname-8080] 172.17.0.1:9000, 100.110.0.3:80 298, 0 0.012, 0.013 200, - abcdefg12345abcdef" }
{ "source" : "nginx-ingress-controller", "message": "172.17.0.1 - [172.17.0.1] - - [07\/Dec\/2018:18:31:33 +0000] \"GET \/datadoghq\/company?test=var1%20Pl HTTP\/1.1\" 200 261 \"-\" \"Mozilla\/5.0 (Windows NT 5.1; rv:31.0) Gecko\/20100101 Firefox\/31.0\" 1904 0.011 [proxyname-8080] 172.17.0.1:9000, 100.110.0.3:80, 100.110.0.3:80 298, -, - 0.012, -, - 200, -, - abcdefg12345abcdef" }
{ "source" : "nginx-ingress-controller", "message": "172.17.0.1 - [172.17.0.1] - - [07\/Dec\/2018:18:31:33 +0000] \"GET \/datadoghq\/company?test=var1%20Pl HTTP\/1.1\" 200 261 \"-\" \"Mozilla\/5.0 (Windows NT 5.1; rv:31.0) Gecko\/20100101 Firefox\/31.0\" 1904 0.011 [proxyname-8080] [alt-proxyname-8080] 172.17.0.1:9000 298 0.012 200 abcdefg12345abcdef" }
{ "source" : "nginx-ingress-controller", "message": "172.16.99.64 - - [19\/Mar\/2020:16:02:20 +0000] \"GET \/datadoghq\/company?test=var1%20Pl HTTP\/1.1\" 503 605 \"-\" \"Mozilla\/5.0 (Windows NT 5.1; rv:31.0) Gecko\/20100101 Firefox\/31.0\" 4033 0.000 [proxyname-8080] [] - - - - abcdefg12345abcdef" }
{ "source" : "mysql", "message": "2017-12-29T12:33:33.095243Z         2 Query     SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE CREATE_OPTIONS LIKE partiion;" }
{ "source" : "mysql", "message": "2017-12-29T12:04:09.954078Z 0 [Warning] System table \"plugin\" is expected to be transactional." }
{ "source" : "mysql", "message": "171113 14:14:20  InnoDB: Shutdown completed; log sequence number 1595675" }
{ "source" : "mysql", "message": "201210 20:58:16\t424242 Query\tSELECT option_value FROM galaxy_guide WHERE option_name = 'zaphod' LIMIT 1;\n" }
{ "source" : "mysql", "message": "424242 Query\tSELECT option_value FROM galaxy_guide WHERE option_name = 'trillian' LIMIT 1;\n" }
{ "source" : "mysql", "message": "2021-07-06  7:29:32 0 [Warning] You need to use --log-bin to make --expire-logs-days or --binlog-expire-logs-seconds work.\n" }
{ "source" : "mysql", "message": "2016-06-15 16:53:33 139651251140544 [Note] InnoDB: The InnoDB memory heap is disabled\n" }
{ "source" : "mysql", "message": "2016-06-15  7:53:33 139651251140544 [Note] InnoDB: The InnoDB memory heap is disabled\n" }
{ "source" : "mysql", "message": "# Query_time: 0.214922  Lock_time: 0.000184 Rows_sent: 248832  Rows_examined: 72\nSET timestamp=1574851393;\nSELECT * FROM fruit f1, fruit f2, fruit f3, fruit f4, fruit f5;\n" }
{ "source" : "mysql", "message": "# Time: 2019-11-27T10:43:13.460744Z\n# User@Host: root[root] @ localhost [1.2.3.4]  Id:    35\n# Query_time: 0.214922  Lock_time: 0.000184 Rows_sent: 248832  Rows_examined: 72\nSET timestamp=1574851393;\nSELECT * FROM fruit f1, fruit f2, fruit f3, fruit f4, fruit f5;\n" }
{ "source" : "mysql", "message": "# Time: 2019-11-27T10:43:13.460744Z\n# User@Host: root[root] @ localhost []  Id:    35\n# Query_time: 0.214922  Lock_time: 0.000184 Rows_sent: 248832  Rows_examined: 72\n# Thread_id: 55   Killed: 0  Errno: 0\n# Bytes_sent: 123456   Bytes_received: 0\nSET timestamp=1574851393;\nSELECT * FROM fruit f1, fruit f2, fruit f3, fruit f4, fruit f5\n" }
{ "source" : "mysql", "message": "# Time: 191127 15:45:44\n# User@Host: root[root] @ localhost [1.2.3.4]\n# Thread_id: 55  Schema: app  QC_hit: No\n# Query_time: 0.214922  Lock_time: 0.000159 Rows_sent: 248832  Rows_examined: 12\n# Rows_affected: 0\nuse app;\nSET timestamp=1574869544;\nSELECT * FROM fruit f1, fruit f2, fruit f3, fruit f4, fruit f5;\n" }
{ "source" : "mysql", "message": "# Time: 191127 15:45:44\n# User@Host: root[root] @ localhost [1.2.3.4]\n# Thread_id: 55  Schema: app  Last_errno: 0  Killed: 0\n# Query_time: 0.214922  Lock_time: 0.000159 Rows_sent: 248832  Rows_examined: 12  Rows_affected: 0\n# Bytes_sent: 123456\n# Stored_routine: fruit.five_product\nuse app;\nSET timestamp=1574869544;\nSELECT * FROM fruit f1, fruit f2, fruit f3, fruit f4, fruit f5;\n" }
{ "source" : "cluster-autoscaler", "message": "I0402 08:06:51.571361       1 static_autoscaler.go:366] Calculating unneeded nodes" }
{ "source" : "cluster-autoscaler", "message": "I0402 08:06:51.571361       1 scale_down.go:414] Node ip-10-61-112-221.eu-west-2.compute.internal is not suitable for removal - utilization too big (0.908163)" }
{ "source" : "cluster-autoscaler", "message": "E0402 08:06:51.571361       1 utils.go:60] pod.Status.StartTime is nil for pod friz-ws-8929cabd74a-stfz4. Should not reach here." }
{ "source" : "aws-alb-ingress-controller", "message": "I0408 21:09:06.840010       1 store.go:226] updating annotations information for ingress ref-prod\/dre-proxy" }
{ "source" : "aws-alb-ingress-controller", "message": "I0408 21:09:06.840010       1 rules.go:98] ref-prod\/ref-prod-ingress: rule 2 modified with conditions [{    Field: \"host-header\",    Values: [\"oauth2-int.us-east-1.ref-prod.wds.io\"]  },{    Field: \"path-pattern\",    Values: [\"\/*\"]  }]" }
{ "source" : "aws-alb-ingress-controller", "message": "W0408 21:09:06.840010       1 reflector.go:270] pkg\/mod\/k8s.io\/client-go@v2.0.0-alpha.0.0.20191112141123-5c4ab214ef58+incompatible\/utils\/cache\/reflector.go:95: watch of *v1.Endpoints ended with: too old resource version: 98851296 (98851900)" }
{ "source" : "proxysql", "message": "2020-04-06 08:59:07 MySQL_Monitor.cpp:960:monitor_read_only_thread(): [ERROR] Timeout on read_only check for db:3306 after 2ms. Unable to create a connection. If the server is overload, increase mysql-monitor_connect_timeout. Error: timeout on creating new connection: Can't connect to MySQL server on 'db' (115)." }
{ "source" : "proxysql", "message": "2020-04-06 08:59:08 [INFO] New mysql_aws_aurora_hostgroups table" }
{ "source" : "proxysql", "message": "Standard MySQL Authentication rev. 0.2.0902 -- MySQL_Authentication.cpp -- Tue Oct 15 17:03:55 2019" }
{ "source" : "proxysql", "message": "{\"client_addr\":\"127.0.0.1:39954\",\"event\":\"Admin_Connect_ERR\",\"schemaname\":\"test\",\"ssl\":false,\"thread_id\":2,\"proxy_addr\":\"0.0.0.0:6033\",\"time\":\"2019-05-20 18:48:47.631\",\"timestamp\":1558342127631,\"username\":\"admin\"}" }
{ "source" : "proxysql", "message": "{\"client_addr\":\"127.0.0.1:39954\",\"creation_time\":\"2019-05-20 18:48:47.631\",\"duration\":\"125.02ms\",\"event\":\"Admin_Close\",\"extra_info\":\"MySQL_Thread.cpp:2652:~MySQL_Thread()\",\"schemaname\":\"test\",\"ssl\":false,\"thread_id\":2,\"time\":\"2019-05-20 18:48:47.631\",\"timestamp\":1558342127631,\"username\":\"admin\"}" }
{ "source" : "proxysql", "message": "{\"client\":\"127.0.0.1:39958\",\"digest\":\"0x98A2503010E9E4C8\",\"duration_us\":366,\"endtime\":\"2019-07-14 18:06:03.285029\",\"endtime_timestamp_us\":1563091563285029,\"event\":\"COM_STMT_PREPARE\",\"hostgroup_id\":0,\"query\":\"SELECT id,id2 FROM test1 WHERE id < ?\",\"rows_affected\":0,\"rows_sent\":0,\"schemaname\":\"test\",\"server\":\"127.0.0.1:3306\",\"starttime\":\"2019-07-14 18:06:03.284663\",\"starttime_timestamp_us\":1563091563284663,\"thread_id\":3,\"username\":\"sbtest\"}" }
{ "source" : "azure", "message": "{\"resourceId\": \"\/SUBSCRIPTIONS\/8C56D827-5F07-45CE-8F2B-6C5001DB5C6F\/RESOURCEGROUPS\/KITCHEN-DD-AGENT-WIN2019-AZURE-20200407T115109PL2363814-A6\/PROVIDERS\/MICROSOFT.COMPUTE\/VIRTUALMACHINES\/DDATWIN2019\",\"category\": \"Administrative\",\"callerIpAddress\": \"34.224.243.198\",\"level\": \"Warning\",\"operationName\": \"MICROSOFT.AUTHORIZATION\/POLICIES\/AUDIT\/ACTION\",\"correlationId\": \"aea8188d-9be4-48c7-9e8f-3bf10a545ff3\",\"time\": \"2020-04-07T12:01:17.8810661Z\",\"resultType\": \"Success\",\"durationMs\": 123,\"resultSignature\": \"Succeeded.\", \"identity\": {\"authorization\": {\"evidence\": {\"principalId\": \"5f6793e4783e49a2bfb8cdaasdada\"}}}}" }
{ "source" : "azure.web", "message": "{\"resourceId\": \"\/SUBSCRIPTIONS\/8C56D827-5F07-45CE-8F2B-6C5001DB5C6F\/RESOURCEGROUPS\/KITCHEN-DD-AGENT-WIN2019-AZURE-20200407T115109PL2363814-A6\/PROVIDERS\/MICROSOFT.COMPUTE\/VIRTUALMACHINES\/DDATWIN2019\",\"category\": \"Administrative\",\"callerIpAddress\": \"34.224.243.198\",\"level\": \"Warning\",\"operationName\": \"MICROSOFT.WEB\/SITES\/EXTENSIONS\/WRITE\",\"correlationId\": \"aea8188d-9be4-48c7-9e8f-3bf10a545ff3\",\"time\": \"2020-04-07T12:01:17.8810661Z\",\"resultType\": \"Success\",\"durationMs\": 123,\"resultSignature\": \"Succeeded.\", \"identity\": {\"authorization\": {\"evidence\": {\"principalId\": \"5f6793e4783e49a2bfb8cdaasdada\"}}}}" }
{ "source" : "azure.storage", "message": "{\"resourceId\": \"\/SUBSCRIPTIONS\/8C56D827-5F07-45CE-8F2B-6C5001DB5C6F\/RESOURCEGROUPS\/CONTAINERGROUP\/PROVIDERS\/MICROSOFT.STORAGE\/STORAGEACCOUNTS\/B7UI4VJGWZ6EODIAG0\",\"category\": \"Administrative\",\"callerIpAddress\": \"104.209.130.4\",\"level\": \"Error\",\"operationName\": \"MICROSOFT.STORAGE\/STORAGEACCOUNTS\/LISTKEYS\/ACTION\",\"correlationId\": \"f860428d-66c1-4639-a3da-b963663f892f\",\"time\": \"2020-04-04T02:48:41.1195332Z\",\"resultType\": \"Failure\",\"durationMs\": 28455,\"resultSignature\": \"Failed.InternalServerError\", \"identity\": {\"authorization\": {\"evidence\": {\"principalId\": \"5f6793e4783e49a2bfb8cdaasdada\"}}}}" }
{ "source" : "azure.network", "message": "{\"resourceId\": \"\/SUBSCRIPTIONS\/8C56D827-5F07-45CE-8F2B-6C5001DB5C6F\/RESOURCEGROUPS\/KITCHEN-DD-AGENT-WIN2019-AZURE-20200407T115109PL2363814-A6\/PROVIDERS\/MICROSOFT.COMPUTE\/VIRTUALMACHINES\/DDATWIN2019\",\"category\": \"Administrative\",\"callerIpAddress\": \"34.224.243.198\",\"level\": \"Warning\",\"operationName\": \"MICROSOFT.AUTHORIZATION\/POLICIES\/AUDIT\/ACTION\",\"correlationId\": \"aea8188d-9be4-48c7-9e8f-3bf10a545ff3\",\"time\": \"2020-04-07T12:01:17.8810661Z\",\"resultType\": \"Success\",\"durationMs\": 123,\"resultSignature\": \"Succeeded.\", \"identity\": {\"authorization\": {\"evidence\": {\"principalId\": \"5f6793e4783e49a2bfb8cdaasdada\"}}}}" }
{ "source" : "azure.network", "message": "{\"resourceId\": \"\/SUBSCRIPTIONS\/8C56D827-5F07-45CE-8F2B-6C5001DB5C6F\/RESOURCEGROUPS\/HETANSH-RESOURCE-GROUP-FOR-TESTING\/PROVIDERS\/MICROSOFT.NETWORK\/FRONTDOORS\/HETANSHTEST\",\"category\": \"ResourceHealth\",\"properties\": {\"eventCategory\": \"ResourceHealth\",\"eventProperties\": {\"cause\": \"PlatformInitiated\",\"details\": \"\",\"currentHealthStatus\": \"Unavailable\",\"previousHealthStatus\": \"Available\",\"title\": \"Front Door service has detected global availability drops impacting one or more regions. The issue may cause failures for client requests sent to this Front Door resource.\",\"type\": \"Downtime\"}},\"level\": \"Critical\",\"correlationId\": \"699a9793-d7d7-4613-b58a-0018da23b2cb\",\"operationName\": \"Microsoft.Resourcehealth\/healthevent\/Activated\/action\",\"time\": \"2020-04-06T10:54:44.4707124Z\",\"resultType\": \"Active\", \"identity\": {\"authorization\": {\"evidence\": {\"principalId\": \"5f6793e4783e49a2bfb8cdaasdada\"}}}}" }
{ "source" : "azure.compute", "message": "{\"resourceId\": \"\/SUBSCRIPTIONS\/8C56D827-5F07-45CE-8F2B-6C5001DB5C6F\/RESOURCEGROUPS\/KITCHEN-DD-AGENT-WIN2019-AZURE-20200407T115109PL2363814-A6\/PROVIDERS\/MICROSOFT.COMPUTE\/VIRTUALMACHINES\/DDATWIN2019\",\"category\": \"Administrative\",\"callerIpAddress\": \"34.224.243.198\",\"level\": \"Warning\",\"operationName\": \"MICROSOFT.AUTHORIZATION\/POLICIES\/AUDIT\/ACTION\",\"correlationId\": \"aea8188d-9be4-48c7-9e8f-3bf10a545ff3\",\"time\": \"2020-04-07T12:01:17.8810661Z\",\"resultType\": \"Success\",\"durationMs\": 123,\"resultSignature\": \"Succeeded.\", \"identity\": {\"authorization\": {\"evidence\": {\"principalId\": \"5f6793e4783e49a2bfb8cdaasdada\"}}}}" }
{ "source" : "azure.compute", "message": "{\"resourceId\": \"\/SUBSCRIPTIONS\/8C56D827-5F07-45CE-8F2B-6C5001DB5C6F\/RESOURCEGROUPS\/KITCHEN-DD-AGENT-SLES-12-AZURE-20200408T112721PL2367806-A6\/PROVIDERS\/MICROSOFT.COMPUTE\/VIRTUALMACHINES\/DD-AGENT-TESTING-SLES-12-AZURE\",\"category\": \"ResourceHealth\",\"properties\": {\"eventCategory\": \"ResourceHealth\",\"eventProperties\": {\"cause\": \"PlatformInitiated\",\"details\": \"Unknown\",\"currentHealthStatus\": \"Unavailable\",\"previousHealthStatus\": \"Unavailable\",\"title\": \"Down: Virtual machine has been unavailable for 15 minutes\",\"type\": \"Downtime\"}},\"level\": \"Critical\",\"correlationId\": \"dba965ff-6161-4837-a2bf-95b355ddf2d0\",\"operationName\": \"Microsoft.Resourcehealth\/healthevent\/Updated\/action\",\"time\": \"2020-04-08T11:49:12.1063734Z\",\"resultType\": \"Updated\", \"identity\": {\"authorization\": {\"evidence\": {\"principalId\": \"5f6793e4783e49a2bfb8cdaasdada\"}}}}" }
{ "source" : "etcd", "message": "2020-08-20 18:19:25.769683 I | embed: ready to serve client requests" }
{ "source" : "etcd", "message": "2020-08-20 18:23:44.714594 W | rafthttp: lost the TCP streaming connection with peer d282ac2ce600c1ce (stream Message reader)" }
{ "source" : "glog", "message": "I0402 08:06:51.571361       1 static_autoscaler.go:366] Calculating unneeded nodes" }
{ "source" : "glog", "message": "I0402 08:06:51.571361       1 k8sclient.go:251] Replicas are not as expected : updating replicas from 37 to 36" }
{ "source" : "glog", "message": "E0402 08:06:51.571361       1 authentication.go:65] Unable to authenticate the request due to an error: x509: certificate has expired or is not yet valid" }
{ "source" : "auth0", "message": "{\"log_id\":\"90020200123T15145913700000000000000000000000000000000000\",\"data\":{\"date\":\"2020-01-23T15:14:59.137Z\",\"tenant\":\"datadog\",\"log_id\":\"90020200123T15145913700000000000000000000000000000000000\",\"type\":\"feoobft\",\"connection\":\"great-pollo\",\"connection_id\":\"con_DFOp2K6Ysya1y2OP\",\"client_id\":\"61PIEqk9Hd4hXJFZhoBZ5bNmN349coz1\",\"client_name\":\"greatballena\",\"ip\":\"79.153.77.39\",\"user_agent\":\"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/71.0.3578.98 Safari\/537.36\",\"details\":\"whatever...\",\"hostname\":\"awesome.bisonte.auth0.com\",\"user_id\":\"auth0|61PIEqk9Hd4hXJFZhoBZ5bNmN349coz1\",\"user_name\":\"cleveráguila\",\"strategy\":\"auth0\",\"strategy_type\":\"database\",\"description\":\"waaaaa\"}}" }
{ "source" : "auth0", "message": "{\"log_id\":\"90020200123T15145913700000000000000000000000000000000000\",\"data\":{\"date\":\"2020-01-23T15:14:59.137Z\", \"type\":\"admin_update_launch\", \"description\":\"Auth0 has launched a new admin update\"}}" }
{ "source" : "kube_scheduler", "message": "W0424 11:47:41.605188       1 authorization.go:47] Authorization is disabled" }
{ "source" : "kube_scheduler", "message": "I0424 11:47:41.605254       1 deprecated_insecure_serving.go:49] Serving healthz insecurely on 127.0.0.1:10251" }
{ "source" : "kube_scheduler", "message": "E0424 11:47:46.923664       1 reflector.go:134] k8s.io\/client-go\/informers\/factory.go:132: Failed to list *v1.PersistentVolumeClaim: persistentvolumeclaims is forbidden: User \"system:kube-scheduler\" cannot list resource \"persistentvolumeclaims\" in API group \"\" at the cluster scope" }
{ "source" : "amazon-ecs-agent", "message": "2019-08-01T15:45:13Z [INFO] Managed task [arn:aws:ecs:us-east-1:123456789012:task\/12345c0c-123b-1234-bea2-fc1123d1234a]: waiting for event for task" }
{ "source" : "amazon-ecs-agent", "message": "2019-08-01T15:28:07Z [INFO] Managed task [arn:aws:ecs:us-east-1:123456789012:task\/test-dev-test-ecs-general\/123456e12a2c1234bf311e12a1d123a1]: got resource [cgroup] event: [CREATED]" }
{ "source" : "amazon-ecs-agent", "message": "2019-08-01T15:18:23Z 200 123.11.0.1:12345 \"\/v2\/credentials\" \"Java\/1.8.0_111\" arn:aws:ecs:eu-west-1:123456789012:task\/12f3b123-d12e-4c12-9ed0-a89daaf12345 GetCredentials 2 test-dev-test-ecs-general arn:aws:ecs:eu-west-1:123456789012:container-instance\/test-dev-test-ecs-general\/2bcef0cd123456a234f7cffb234bb234" }
{ "source" : "amazon-ecs-agent", "message": "2019-08-01T14:59:17Z [INFO] Managed task [arn:aws:ecs:us-east-1:123456789012:task\/123ea123-1234-4c4e-a123-8cf4daw4d6sa]: task not steady state or terminal; progressing it" }
{ "source" : "amazon-ecs-agent", "message": "2019-08-01T14:47:05Z [INFO] Saving state! module=\u201Cstatemanager\"" }
{ "source" : "amazon-ecs-agent", "message": "2019-08-01T14:54:04Z [INFO] Managed task [arn:aws:ecs:us-west-2:123456789012:task\/test-dev-test-ecs-general\/c23f3b3f9be12345a123f8a8bf0e1234]: no longer waiting" }
{ "source" : "amazon-ecs-agent", "message": "level=info time=2019-12-12T23:43:29Z msg=\"Loading configuration\" module=agent.go" }
{ "source" : "nodejs", "message": "{\"name\":\"myapp\",\"hostname\":\"COMP11257.local\",\"pid\":10428,\"level\":10,\"msg\":\"trace log\",\"time\":\"2019-10-21T19:13:23.185Z\",\"v\":0}" }
{ "source" : "nodejs", "message": "{\"level\":50,\"time\":1571685203246,\"pid\":10429,\"hostname\":\"COMP11257.local\",\"msg\":\"error log\",\"v\":1}" }
{ "source" : "nodejs", "message": "[2019-10-21T15:13:23.419] [INFO] default - An info message" }
{ "source" : "nodejs", "message": "[dd.trace_id=123abc dd.span_id=1234abc] [2019-10-21T15:13:23.419] [INFO] default - An info message" }
{ "source" : "nodejs", "message": "[2019-10-21T15:13:23.419] [INFO] default - [dd.trace_id=123abc dd.span_id=1234abc] An info message" }
{ "source" : "nodejs", "message": "GET www.google.com 304 5000 - 2.306 ms" }
{ "source" : "nodejs", "message": "::1 - - [21\/Oct\/2019:19:16:34 +0000] \"GET \/ HTTP\/1.1\" 504 - \"-\" \"Mozilla\/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/77.0.3865.90 Safari\/537.36\"" }
{ "source" : "nodejs", "message": "[dd.trace_id=123abc dd.span_id=1234abc] { \"level\": \"info\", \"message\": \"This is a json message\" }" }
{ "source" : "nodejs", "message": "[dd.service=excelsior dd.env=jesse-dev dd.version=abc123 dd.trace_id=123abc dd.span_id=1234abc] { \"level\": \"info\", \"message\": \"This is a json message\" }" }
{ "source" : "nodejs", "message": "[dd.service=excelsior dd.env=jesse-dev dd.version=abc123 dd.trace_id=123abc dd.span_id=1234abc] [2019-10-21T15:13:23.419] [INFO] default - An info message" }
{ "source" : "nodejs", "message": "[2019-10-21T15:13:23.419] [INFO] default - [dd.service=excelsior dd.env=jesse-dev dd.version=abc123 dd.trace_id=123abc dd.span_id=1234abc] An info message" }
{ "source" : "postgresql", "message": "2017-11-07 17:30:39 UTC LOG:  incomplete startup packet" }
{ "source" : "postgresql", "message": "2017-11-08 18:11:35.727 UTC [5237] postgres psql postgres [local] 5a0348cd.1475 LOG:  statement: SELECT * FROM playground;" }
{ "source" : "postgresql", "message": "2019-12-10 18:27:45.389 UTC [114] datadog_test datadog-agent datadog 172.28.0.1 5defc7c5.72 LOG:  duration: 0.140 ms  statement: select checkpoints_timed, checkpoints_req, buffers_checkpoint, buffers_clean, maxwritten_clean, buffers_backend, buffers_alloc, buffers_backend_fsync, checkpoint_write_time, checkpoint_sync_time FROM pg_stat_bgwriter" }
{ "source" : "cassandra", "message": "WARN  [MemtableFlushWriter:20342] 2016-06-29 08:35:35,936  SSTableWriter.java:240 - Compacting large partition limo\/read_audit_history:2016-6-29 (115624250 bytes)" }
{ "source" : "cassandra", "message": "INFO  [SlabPoolCleaner] 2016-06-28 16:13:14,966  ColumnFamilyStore.java:1211 - Flushing largest CFS(Keyspace='order_store', ColumnFamily='orderdata') to free up room. Used total: 0.11\/0.00, live: 0.11\/0.00, flushing: 0.00\/0.00, this: 0.10\/0.10" }
{ "source" : "cassandra", "message": "INFO  [SlabPoolCleaner] 2016-06-28 16:13:14,967  ColumnFamilyStore.java:905 - Enqueuing flush of orderdata: 216588472 (10%) on-heap, 0 (0%) off-heap" }
{ "source" : "cassandra", "message": "INFO  [MemtableFlushWriter:320] 2016-06-28 16:13:14,969  Memtable.java:347 - Writing Memtable-orderdata@849528272(129.804MiB serialized bytes, 297813 ops, 10%\/0% of on\/off-heap limit)" }
{ "source" : "cassandra", "message": "INFO  [MemtableFlushWriter:1] 2016-06-28 16:19:48,627  Memtable.java:382 - Completed flushing \/app\/cassandra\/datastax\/dse-data01\/system\/local-7ad54392bcdd35a684174e047860b377\/system-local-tmp-ka-3981-Data.db (0.000KiB) for commitlog position ReplayPosition(segmentId=1467130786324, position=567)" }
{ "source" : "cassandra", "message": "INFO  [CompactionExecutor:26] 2016-06-28 14:38:40,545  CompactionTask.java:274 - Compacted 2 sstables to [\/app\/cassandra\/datastax\/dse-data03\/order_store\/orderhistory-65765df1fc1f11e5974cd5203249f77a\/order_store-orderhistory-ka-139517,].  32,011,749 bytes to 29,390,144 (~91% of original) in 1,771ms = 15.826440MB\/s.  39,261 total partitions merged to 36,402.  Partition merge counts were {1:33543, 2:2859, }" }
{ "source" : "cassandra", "message": "WARN  [ServiceThread] 2016-06-30 14:42:41,538  GCInspector.java:256 - G1 Young Generation GC in 202ms.  G1 Eden Space: 1157627904 -> 0; G1 Old Gen: 6819034128 -> 4151661080; G1 Survivor Space: 553648128 -> 218103808;" }
{ "source" : "cassandra", "message": "INFO  [ServiceThread] 2016-06-30 14:42:41,542  StatusLogger.java:99 - KeyCache                  365405196               1073741824                      all" }
{ "source" : "cassandra", "message": "WARN  [ServiceThread] 2016-06-30 14:42:41,538  GCInspector.java:256 - whatever" }
{ "source" : "cassandra", "message": "2016-06-30 14:42:41 [ServiceThread] WARN GCInspector.java:256 - whatever" }
{ "source" : "cassandra", "message": "DEBUG [ValidationExecutor:108] 2016-06-30 14:42:41,538 EstimatedHistogram.java:304 - [0..0]: 1" }
{ "source" : "cassandra", "message": "DEBUG [PerDiskMemtableFlushWriter_0:6272] 2019-02-11 16:22:18,428 Memtable.java:485 - Completed flushing \/cassdata\/system\/batches-845e4cb463211234e12b64ff2c24a132\/321-big-Data.db (17.861KiB) for commitlog position CommitLogPosition(segmentId=1539908174810, position=5726318)" }
{ "source" : "cassandra", "message": "DEBUG [ScheduledTasks:1] 2018-11-22 12:50:31,441  MonitoringTask.java:173 - 3 operations were slow in the last 5000 msecs:\n<SELECT * FROM offernet.work_e WHERE community_id = 1496222336 AND member_id = 49 AND ~~edge_label_id = 65550 LIMIT 50000>, time 505 msec - slow timeout 500 msec\n<SELECT * FROM offernet.item_e WHERE community_id = 2052914560 AND member_id = 189 AND ~~edge_label_id = 65553 LIMIT 50000>, time 505 msec - slow timeout 500 msec\n<SELECT * FROM offernet.item_e WHERE community_id = 2140990336 AND member_id = 212 AND ~~edge_label_id = 65553 LIMIT 50000>, time 504 msec - slow timeout 500 msec\n" }
{ "source" : "cassandra", "message": "DEBUG [ScheduledTasks:1] 2019-12-30 15:24:19,781 MonitoringTask.java:173 - 1 operations were slow in the last 5000 msecs:\n<SELECT * FROM system_schema.tables LIMIT 100>, was slow 2 times: avg\/min\/max 2\/2\/3 msec - slow timeout 2 msec\n<SELECT * FROM system_schema.columns LIMIT 200>, was slow 3 times: avg\/min\/max 4\/5\/6 msec - slow timeout 3 msec\n" }
{ "source" : "cassandra", "message": "TRACE [GossipStage:1] 2018-07-04 17:07:47,879 Gossiper.java:1234 - Updating heartbeat state version to 2344 from 2343 for 127.0.0.2:7000" }
{ "source" : "cassandra", "message": "TRACE [SharedPool-Worker-1] 2016-02-05 19:44:00,154 Message.java:506 - Received: OPTIONS, v=4" }
{ "source" : "cassandra", "message": "TRACE [SharedPool-Worker-1] 2016-02-05 19:44:09,346 Message.java:525 - Responding: ROWS [user_id(tlp_example, user_data), org.apache.cassandra.db.marshal.UUIDType][email(tlp_example, user_data), org.apache.cassandra.db.marshal.UTF8Type][tax_id(tlp_example, user_data), org.apache.cassandra.db.marshal.UTF8Type] | db8f7a00-cc3e-11e5-b248-091830ac5256 | nate+logex@thelastpickle.com | abc123 | 19be0e40-cc3f-11e5-b248-091830ac5256 | nate+logex@thelastpickle.com | abc123 ---, v=4" }
{ "source" : "httpd", "message": "127.0.0.1 - frank [13\/Jul\/2016:10:55:36 +0000] \"GET \/apache_pb.gif HTTP\/1.0\" 200 2326" }
{ "source" : "httpd", "message": "192.0.2.1 - Ultan [07\/Mar\/2004:16:43:54 -0800] \"GET \/unencrypted_password_list?foo=bar HTTP\/1.1\" 404 9001 \"http:\/\/passwords.hackz0r\" \"Mozilla\/4.08 [en] (Win95)\"" }
{ "source" : "azure.recoveryservices", "message": "{\"Authorization\":\"null\",\"Claims\":\"{\\\"http:\/\/schemas.xmlsoap.org\/ws\/2005\/05\/identityclaims\/emailaddress\\\":\\\"Microsoft.RecoveryServices\\\"}\",\"correlationId\":\"1054d95a-7364-420e-a016-aceb8a4bb7dc\",\"DeploymentUnit\":\"cus-pod01-manag1-az01-testr\",\"durationMs\":0,\"EventId\":197,\"eventName\":\"Backup\",\"EventName\":\"AzureBackupActivityLog\",\"evt\":{\"category\":\"Administration\",\"name\":\"Microsoft.RecoveryServices\\\/values\\\/backupFabrics\\\/protectionContainers\\\/protectedItems\\\/backup\\\/action\",\"outcome\":\"Succeeded\"},\"identity\":\"{\\\"claims\\\":{\\\"http:\/\/schemas.xmlsoap.org\/ws\/2005\/05\/identityclaims\/emailaddress\\\":\\\"Microsoft.RecoveryServices\\\"}}\",\"level\":\"Informational\",\"location\":\"centralus\",\"operationId\":\"6caea86d-3f4c-4c5f-b2ba-b53e502adb84\",\"operationVersion\":\"null\",\"properties\":{\"Entity Name\":\"test-6\",\"Job Id\":\"aa52fedc-8783-41a4-b285-39dd2f41cd2d\",\"Start Time\":\"2021-07-19 16:32:41Z\"},\"resourceId\":\"\\\/SUBSCRIPTIONS\\\/A4F42A47-2B5B-4A4C-B6CF-F00D6C189365\\\/RESOURCEGROUPS\\\/TESTRG\\\/PROVIDERS\\\/MICROSOFT.RECOVERYSERVICES\\\/VAULTS\\\/VAULT428\",\"Result Description\":\"Backup Succeeded\",\"service\":\"azure\",\"time\":\"2021-07-19T16:53:55.5361483Z\"}" }
{ "source" : "csharp", "message": "{\"date\":\"2017-03-10T15:13:53.1302217+01:00\",\"level\":\"INFO\",\"appname\":\"ConsoleApplication1.vshost.exe\",\"logger\":\"ConsoleApplication1.Program\",\"thread\":\"10\",\"message\":\"A calculation is about to start: 10x5\"}" }
{ "source" : "csharp", "message": "2016-09-02T15:02:29.648Z INFO this is my message" }
{ "source" : "csharp", "message": "2016-05-24 15:53:35.7175 INFO this my my log example" }
{ "source" : "csharp", "message": "2017-03-10 15:13:53.130 ERROR [10] ConsoleApplication1.Program mymethod:32 - an error occured while processing" }
{ "source" : "csharp", "message": "2019-10-24 19:58:29,582 [1] INFO  Log4NetExample.Program [(null)] {order-number=1024, log4net:UserName=NOT AVAILABLE, dd.span_id=\"1413846021656668177\", log4net:HostName=LAPTOP-KUAFQR15, log4net:Identity=NOT AVAILABLE, dd.trace_id=\"4603319493700240667\"} - Message example" }
{ "source" : "csharp", "message": "{\"@t\":\"2019-10-24T21:27:13.8349907Z\",\"@mt\":\"Message inside a trace.\",\"dd.span_id\":\"5142026215797304283\",\"dd.trace_id\":\"4878079399117499473\",\"order-number\":1024}" }
{ "source" : "csharp", "message": "{\"MessageTemplate\":\"Processed {@Position} in {Elapsed:000} ms.\",\"Level\":\"Information\",\"Timestamp\":\"2016-09-02T15:02:29.648Z\",\"Renderings\":{\"Elapsed\":[{\"Format\":\"000\",\"Rendering\":\"034\"}]},\"RenderedMessage\":\"Processed { Latitude: 25, Longitude: 134 } in 034 ms.\",\"Properties\":{\"Position\":{\"Latitude\":25,\"Longitude\":134},\"Elapsed\":34}}" }
{ "source" : "csharp", "message": "{\"@t\":\"2020-05-20T04:15:28.6898801Z\",\"@m\":\"Processed { Latitude: 25, Longitude: 134 } in 034 ms.\",\"@i\":\"d1eb2146\",\"Position\":{\"Latitude\":25,\"Longitude\":134},\"Elapsed\":34}" }
{ "source" : "csharp", "message": "{\"@t\":\"2020-10-08T22:57:43.6528621Z\",\"@m\":\"Entered Home\/Index action\",\"SourceContext\":\"AspNetCoreWithSerilog.Controllers.HomeController\",\"RequestPath\":\"\/\",\"dd_span_id\":\"1716695354597411619\",\"dd_trace_id\":\"6500362470500556863\",\"dd_version\":\"1.0.0\",\"dd_service\":\"AspNetCoreWithSerilog\",\"dd_env\":\"Development\"}" }
{ "source" : "csharp", "message": "2019-11-25 11:21:32.6282|INFO|NLogExample.Program|{dd.trace_id=\"4424326124026709723\", dd.span_id=\"249969541933646857\"} Message during a trace." }
{ "source" : "csharp", "message": "2019-12-02 11:47:20.467 -08:00 [INF] { dd.span_id: \"4014120331748607290\", dd.trace_id: \"2762343115747197096\", order-number: 1024 } Message during a trace." }
{ "source" : "csharp", "message": "2019-12-02 11:47:20.467 -08:00 [INF] { dd.service: excelsior dd.env: prod dd.version=abc123 dd.span_id: \"4014120331748607290\", dd.trace_id: \"2762343115747197096\", order-number: 1024 } Message during a trace." }
{ "source" : "csharp", "message": "2019-11-25 11:21:32.6282|INFO|NLogExample.Program|{dd.service=excelsior dd.env=prod dd.version=abc123 dd.trace_id=\"4424326124026709723\", dd.span_id=\"249969541933646857\"} Message during a trace." }
{ "source" : "csharp", "message": "2019-10-24 19:58:29,582 [1] INFO  Log4NetExample.Program [(null)] {order-number=1024, log4net:UserName=NOT AVAILABLE, dd.span_id=\"1413846021656668177\", dd.service=excelsior, dd.env=prod, dd.version=abc123, log4net:HostName=LAPTOP-KUAFQR15, log4net:Identity=NOT AVAILABLE, dd.trace_id=\"4603319493700240667\"} - Message example" }
{ "source" : "csharp", "message": "{\"@t\":\"2019-10-24T21:27:13.8349907Z\",\"@m\":\"Message inside a trace.\",\"@mt\":\"this is the message template\",\"dd.span_id\":\"5142026215797304283\",\"dd.trace_id\":\"4878079399117499473\",\"order-number\":1024,\"@l\":\"Error\"}" }
{ "source" : "browser", "message": "{\"http\":{\"url\":\"\/datadoghq\/company?test=var1%20Pl\",\"useragent\":\"Mozilla\/5.0 (X11; Linux x86_64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/55.0.2883.87 Safari\/537.36\"},\"message\":\"Button Clicked\",\"name\":\"registerButton\",\"network\":{\"client\":{\"ip\":\"172.17.0.1\"}}}" }
{ "source" : "browser", "message": "{\t\"message\": \"Browser Log Collector started and logging debug logs to http\",\t\"agent\": \"browserLogCollector\",\t\"datadog_version\": \"35.1859776\",\t\"session_id\": \"20cdb46b-eb38-48c8-8037-baa19b188f1e\",\t\"http\": {\t\t\"referer\": \"https:\/\/app.datadoghq.com\/account\/team\",\t\t\"useragent\": \"Mozilla\/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko\/20100101 Firefox\/70.0\"\t},\t\"datadog_version_number\": 1859776,\t\"network\": {\t\t\"client\": {\t\t\t\"ip\": \"81.175.170.234\"\t\t}\t},\t\"status\": \"info\",\t\"date\": 1573031676010,\t\"view\": {\t\t\"referrer\": \"https:\/\/app.datadoghq.com\/account\/profile\",\t\t\"id\": \"9252b1eb-5910-4f6f-9b09-a6cb76609be3\",\t\t\"url\": \"https:\/\/app.datadoghq.com\/account\/team\"\t},\t\"sessionId\": \"20cdb46b-eb38-48c8-8037-baa19b188f1e\",\t\"application_id\": \"ac8218cf-498b-4d33-bd44-151095959547\",\t\"browser_test\": false}" }


### Version

635ddcc2258ad591005c0f9056858f4b87c2a8b4
blt commented 2 years ago

To add on to this, what we're seeing here may not be isolated to this configuration but simply easy to reproduce with it. We have started to see soak runs fail because of NaN exceptions. If you pull the underlying capture data, what's evident is that Vector accepts load briefly and then soft-locks. For instance, this soak run fails, but pulling the underlying capture files for fluent-elasticsearch you see:

[image: samples]

So, Vector ran load the whole time. On the other hand you can find samples that look like this:

[image: samples]

This is http-to-http-noacks from the same linked run.

All of this is to say: there seem to be related problems with other configs, but it's not clear whether the problem identified above is unique to this config or a symptom of something more general.

jszwedko commented 2 years ago

As part of this issue, we should also re-enable the http-to-http-noacks soak disabled by https://github.com/vectordotdev/vector/pull/11712. I'm hopeful that the underlying issue is the same. If it isn't, we can split off that soak into its own issue.

blt commented 2 years ago

> As part of this issue, we should also re-enable the http-to-http-noacks soak disabled by #11712. I'm hopeful that the underlying issue is the same. If it isn't, we can split off that soak into its own issue.

Ah, yes, thank you for calling that out. Totally spaced on doing so.

001wwang commented 2 years ago

To share more details on this ongoing investigation, we've arrived at the following reproduction setup.

This was tested on 9aabd3632, a recent commit on master. The only modification, beyond debug statements, is arbitrarily changing CHUNK_SIZE to 777 to confirm a suspicion. By default, CHUNK_SIZE = 1000, and the lockup should still occur with the default.
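For concreteness, the tweak is just the constant; where CHUNK_SIZE actually lives in the tree isn't shown here, so treat this as a sketch of the shape of the change:

```rust
// Illustrative only: the sole modification besides debug statements.
// The constant's actual location in the tree isn't given in this thread.
const CHUNK_SIZE: usize = 777; // default is 1000; the lockup reproduces either way
```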

Run the following commands for vector and lading:

VECTOR_THREADS=8 firejail --noprofile --cpu=2,3,4,5,6,7,8,9 target/release/vector -c .tmp/pipeline-lock-up/http-to-blackhole.toml
firejail --noprofile --cpu=1 target/release/http_gen --config-path .tmp/pipelines-lock-up/http-to-http-noack.yaml

With lading generating very large payloads in this setup, Vector locks up almost immediately (evident in the internal metrics, or in the info logs that report events received by the blackhole).

Analysis

We've likely ruled out a few potential issues:

So far, the lockup looks like a race condition or deadlock. The current candidate for investigation is LimitedSender/LimitedReceiver, which leverage semaphores to coordinate work, though it's still unclear why permits might not be freed.
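For anyone following along who hasn't read that code, the coordination pattern roughly looks like the sketch below. This is a simplified model built on tokio's Semaphore/Notify and crossbeam's ArrayQueue; the names and details are illustrative, not the actual vector-buffers implementation.

```rust
// A simplified model of the sender/receiver pair under suspicion.
// Assumes tokio and crossbeam-queue as dependencies; illustrative only.
use std::sync::Arc;

use crossbeam_queue::ArrayQueue;
use tokio::sync::{Notify, Semaphore};

pub struct LimitedQueue<T> {
    queue: ArrayQueue<T>,
    capacity: Semaphore, // one permit per free slot
    readable: Notify,    // wakes the reader after a push
}

impl<T> LimitedQueue<T> {
    pub fn new(cap: usize) -> Arc<Self> {
        Arc::new(Self {
            queue: ArrayQueue::new(cap),
            capacity: Semaphore::new(cap),
            readable: Notify::new(),
        })
    }

    pub async fn send(&self, item: T) {
        // Blocks when the queue is full. If the reader ever stops
        // returning permits, every writer parks here forever: the
        // back-off/lockup symptom described in this issue.
        let permit = self.capacity.acquire().await.expect("semaphore closed");
        // The reader returns the permit after popping, so detach it
        // from this guard rather than releasing it on drop.
        permit.forget();
        // Cannot fail: holding a permit guarantees a free slot.
        let _ = self.queue.push(item);
        // Wake the reader; a missed or mis-ordered wakeup here is the
        // kind of bug being hunted.
        self.readable.notify_one();
    }

    pub async fn recv(&self) -> T {
        loop {
            if let Some(item) = self.queue.pop() {
                // Hand the slot back to the writers.
                self.capacity.add_permits(1);
                return item;
            }
            // Queue looked empty; park until a writer notifies.
            self.readable.notified().await;
        }
    }
}
```

If the reader side ever stops running add_permits -- say, because it's parked on a notification that never arrives -- every writer eventually parks in acquire() and the whole topology stalls, which matches the back-off behavior described at the top of this issue.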

If you compile the project with the tokio-console bits included:

RUSTFLAGS="-Cforce-frame-pointers=yes --cfg tokio_unstable" CARGO_PROFILE_RELEASE_DEBUG=2 cargo build --release --features "tokio-console"

and run it until a lockup occurs, you'll see lines in the console like the one below, describing semaphore resources that appear to idle forever on acquiring permits:

tokio::sync::semaphore       Semaphore ✅  /home/ubuntu/dd/vector/lib/vector-buffers/src/topology/channel/limited_queue.rs:293:46  permits=15545
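(If you haven't driven tokio-console before: with that build running, you can attach the console CLI to the process -- assuming console-subscriber is listening on its default address of 127.0.0.1:6669:)

tokio-console http://127.0.0.1:6669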

One hypothesis from @blt is that the underlying ArrayQueue structure's push/pop functions might not be appropriately synchronized when reading from the internal head of the queue, leading to a race condition. Or, in general, there might be an issue in the mechanism behind the writer acquiring permits, pushing data into the internal queue, and awakening the reader to pop data from the queue, freeing the permits.
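In terms of the earlier sketch, the fatal interleaving would have to look something like the following. It's purely hypothetical: whether the actual waker protocol admits such an ordering is exactly the open question.

```rust
// Hypothetical lost-wakeup interleaving, phrased against the sketch
// above (not the actual vector-buffers code):
//
//   reader: queue.pop() -> None            // queue looks empty
//   writer: acquires permit, queue.push(x) // item is now in the queue
//   writer: notify_one()                   // wakeup dropped, or consumed
//                                          // without a queue re-check?
//   reader: registers waiter and awaits    // never woken again
//
// From here, writers drain the remaining permits and park in
// acquire(); the reader never pops, so no permits are ever returned:
// a full, silent lockup.
```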


blt commented 2 years ago

> One hypothesis from @blt is that the underlying ArrayQueue structure's push/pop functions might not be appropriately synchronized when reading from the internal head of the queue, leading to a race condition. Or, in general, there might be an issue in the mechanism behind the writer acquiring permits, pushing data into the internal queue, and awakening the reader to pop data from the queue, freeing the permits.

I went for a walk and thought about this some more. I think it's possible that the waker notification will arrive before data is available in the underlying queue, but I'm not sure that this would cost anything more than additional polls. So: more latency, but not a full lockup.
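Spelled out against the earlier sketch (again illustrative, not the actual code):

```rust
// Why a too-early wakeup should only cost extra polls, given a recv
// loop that re-checks the queue after every wakeup:
//
//   reader: woken                          // notification arrives early
//   reader: queue.pop() -> None            // push not yet visible
//   reader: re-registers and awaits        // one wasted poll, nothing lost
//   writer: push becomes visible, notifies // reader woken again and pops
//
// This only degrades to deadlock if the *last* notification for a push
// can be consumed without the push ever becoming visible to pop().
```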

blt commented 2 years ago

Oh, it might also be interesting to see if you can replicate the problem with #11732.