grafana / k6-operator

An operator for running distributed k6 tests.

Sending k6 results to datadog? #63

Open nwithers-ecr opened 3 years ago

nwithers-ecr commented 3 years ago

When I'm testing k6 locally through docker-compose, I can see the results populated in the Datadog web dashboard. However, I'm struggling to replicate this behavior with the k8s operator. Below is the configuration I've got so far, with Datadog deployed as a Helm chart in a namespace called monitors and the k6 operator deployed at version 0.0.6.

docker-compose.yml

  api-smoke-test:
    image: loadimpact/k6
    entrypoint: k6 run --out statsd index.js
    depends_on:
      - core
      - api-gateway
    links:
      - datadog
    working_dir: /test/
    environment:
      - K6_STATSD_ADDR=datadog:8125
    volumes:
      - $PWD:/test/

  datadog:
    image: datadog/agent:latest
    ports:
      - 8125
    environment:
      - DD_API_KEY=${DD_API_KEY}
      - DD_SITE=datadoghq.com
      - DD_DOGSTATSD_NON_LOCAL_TRAFFIC=1
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /proc/:/host/proc/:ro
      - /sys/fs/cgroup:/host/sys/fs/cgroup:ro
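
For context, k6 reaches the agent here via the Compose service name datadog (the links entry), which is what K6_STATSD_ADDR points at. The stack can be exercised with something like:

docker-compose up --exit-code-from api-smoke-test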

datadog-values.yaml

datadog:
  dogstatsd:
    port: 8125
    useHostPort: true
    nonLocalTraffic: true
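
For reference, a values file like this is typically applied with the official Datadog chart; the release name and namespace below are assumptions matching the setup described above:

helm repo add datadog https://helm.datadoghq.com
helm repo update
helm install datadog-agent datadog/datadog \
  --namespace monitors --create-namespace \
  --set datadog.apiKey=$DD_API_KEY \
  -f datadog-values.yaml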

resource.yml

---
apiVersion: k6.io/v1alpha1
kind: K6
metadata:
  name: k6-sample
spec:
  parallelism: 4
  arguments: --out statsd
  script: 
    configMap: 
      name: k6-test
      file: test.js
  ports:
    - containerPort: 8125

configmap.yml

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: k6-test
data:
  test.js: |
    import http from "k6/http";
    import { check, group, fail } from "k6";

    // add new endpoint url suffixes here to expand smoke test
    // these endpoints should match exactly the keys in the setup
    let ERR_MSG = "API smoke test failed for endpoint:"
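
As an aside, a ConfigMap like this can also be generated straight from the local script file instead of being inlined by hand:

kubectl create configmap k6-test --from-file=test.js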

Current output after kubectl apply -f resource.yml:

❯ kubectl logs k6-sample-1-v2tdg

          /\      |‾‾| /‾‾/   /‾‾/
     /\  /  \     |  |/  /   /  /
    /  \/    \    |     (   /   ‾‾\
   /          \   |  |\  \ |  (‾)  |
  / __________ \  |__| \__\ \_____/ .io

time="2021-08-12T22:20:02Z" level=warning msg="Executor 'default' is disabled for segment 0:1/4 due to lack of work!"
  execution: local
     script: /test/test.js
     output: statsd (localhost:8125)

  scenarios: (25.00%) 1 scenario, 0 max VUs, 0s max duration (incl. graceful stop):
           * default: 1 iterations for each of 0 VUs (maxDuration: 10m0s, gracefulStop: 30s)

time="2021-08-12T22:20:05Z" level=warning msg="No script iterations finished, consider making the test duration longer"

     vus.......: 0 min=0 max=0
     vus_max...: 0 min=0 max=0

time="2021-08-12T22:20:05Z" level=error msg="Couldn't flush a batch" error="write udp 127.0.0.1:52620->127.0.0.1:8125: write: connection refused" output=statsd

I believe what I'm missing is the K6_STATSD_ADDR or the DD_AGENT_HOST environment variable (or both), which can be set with the snippet below. However, I'm not certain how to add these env vars to the k6-sample pods.

env:
- name: DD_AGENT_HOST
  valueFrom:
    fieldRef:
      fieldPath: status.hostIP

Any ideas or helpful advice on how I can accomplish this?

dgzlopes commented 3 years ago

Hello!

We've added support for environment variables. So, once you've exposed the Datadog Agent as a service, you should be able to do something like this:

---
apiVersion: k6.io/v1alpha1
kind: K6
metadata:
  name: k6-sample
spec:
  parallelism: 4
  arguments: --out statsd
  script: 
    configMap: 
      name: k6-test
      file: test.js
  env:
    - name: K6_STATSD_ADDR
      value: <servicename>.<namespace>.svc.cluster.local
  ports:
    - containerPort: 8125
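
For the "exposed the Datadog Agent as a service" part, note that the Service has to target UDP port 8125 on the agent pods. A minimal sketch, assuming the agent pods carry a label like app: datadog-agent (adjust the selector to whatever labels your agent deployment actually uses):

apiVersion: v1
kind: Service
metadata:
  name: datadog-agent
spec:
  selector:
    app: datadog-agent
  ports:
    - name: dogstatsd
      port: 8125
      targetPort: 8125
      protocol: UDP
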
mycargus commented 3 years ago

I can confirm dgzlopes's suggestion works. I exposed the datadog agent as a service named datadog-agent in namespace my-namespace and this config did the trick:

---
apiVersion: k6.io/v1alpha1
kind: K6
metadata:
  name: k6-sample
spec:
  parallelism: 4
  arguments: --out statsd
  script: 
    configMap: 
      name: k6-test
      file: test.js
  env:
    - name: K6_STATSD_ENABLE_TAGS
      value: "true"
    - name: K6_STATSD_ADDR
      value: datadog-agent.my-namespace.svc.cluster.local:8125

I had to include the port :8125.

I also had to add K6_STATSD_ENABLE_TAGS=true to the K6 spec, as shown in the YAML above.

nwithers-ecr commented 3 years ago

I must be doing something wrong on the Datadog side. When I run kubectl get svc -n monitors, I do not see a service listening on 8125.

$ kubectl get svc -n monitors
NAME                                               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
datadog-agent-cluster-agent                        ClusterIP   10.100.0.72      <none>        5005/TCP   7m17s
datadog-agent-cluster-agent-admission-controller   ClusterIP   10.100.111.229   <none>        443/TCP    7m17s
datadog-agent-kube-state-metrics                   ClusterIP   10.100.195.155   <none>        8080/TCP   7m17s

If I create a ClusterIP explicitly, it still fails to connect with the same error:

kubectl expose pod --type="ClusterIP" --port 8125 --namespace monitors datadog-agent-hkwf9

NAME                               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
datadog-agent-cluster-agent        ClusterIP   10.100.0.72      <none>        5005/TCP   77m
datadog-agent-hkwf9                ClusterIP   10.100.203.231   <none>        8125/UDP   15m

- name: K6_STATSD_ADDR
  value: 10.100.203.231:8125

OR

- name: K6_STATSD_ADDR
  value: datadog-agent-hkwf9.monitors.svc.cluster.local:8125

Both fail with

time="2021-09-04T00:58:49Z" level=error msg="Couldn't flush a batch" error="write udp 127.0.0.1:41460->127.0.0.1:8125: write: connection refused" output=statsd

My full datadog-values.yaml shows that dogstatsd should definitely be listening:

---
registry: public.ecr.aws/datadog
datadog:
  apiKeyExistingSecret: datadog-secret
  clusterName: <omitted>
  logs:
    containerCollectAll: true
  dogstatsd:
    port: 8125
    useHostPort: true
    nonLocalTraffic: true
  apm:
    portEnabled: true
  processAgent:
    processCollection: true
  networkMonitoring:
    enabled: true
clusterAgent:
  admissionController:
    enabled: true
  tokenExistingSecret: "datadog-auth-token"

What's more, if I kubectl exec into the datadog agent pod, connecting via localhost on 8125 fails, but port 8126 succeeds, which is expected since I have APM enabled.

root@datadog-agent-9699q:/# curl -I --connect-timeout 1 127.0.0.1:8125
curl: (7) Failed to connect to 127.0.0.1 port 8125: Connection refused

root@datadog-agent-9699q:/# curl -I --connect-timeout 1 127.0.0.1:8126
HTTP/1.1 404 Not Found
Content-Type: text/plain; charset=utf-8
X-Content-Type-Options: nosniff
Date: Sat, 04 Sep 2021 01:12:08 GMT
Content-Length: 19
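
Worth noting about that diagnostic: curl speaks TCP, while DogStatsD listens on UDP, so "connection refused" on 8125 via curl is expected even when dogstatsd is healthy (8126 answers because the APM trace agent is an HTTP listener). A quick UDP smoke test is possible with bash's /dev/udp device, assuming bash is available in the pod:

# send a throwaway counter metric to the local DogStatsD port
# (UDP is fire-and-forget, so success here only means the datagram was sent)
echo -n "k6.smoke_test:1|c" > /dev/udp/127.0.0.1/8125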

I'm pinned to version 0.6.0 of the operator. Should I be running this on the main branch, or did I miss something obvious in the Kubernetes networking? If not, I can close this issue since it's confirmed working for others, and open a ticket with Datadog support.

mycargus commented 3 years ago

@nwithers-ecr I had it working with the v0.6.0 operator. My guess is the problem is hiding in the datadog agent config, or perhaps in your kubernetes networking or RBAC. I would talk with Datadog, they've been helpful to me in the past. Good luck!

mpanchuk commented 3 years ago

@nwithers-ecr I had the same issue with sending results to Datadog. I spent some time investigating the reasons.

I had the same error: "Couldn't flush a batch" error="write udp 127.0.0.1:41460->127.0.0.1:8125: write: connection refused", which means that k6 tries to send data to localhost (the default address). This also means that the environment variables are not taking effect:

---
apiVersion: k6.io/v1alpha1
kind: K6
metadata:
  name: k6-sample
  ....
  env:
    - name: K6_STATSD_ADDR
      value: datadog-agent.my-namespace.svc.cluster.local:8125 # <---- THIS VAR IS NOT DELIVERED TO CONTAINER

I double-checked this by entering the k6-sample container and printing all variables (printenv). The reason is that the environment variables pull request was closed. Instead, the latest version of k6-operator makes it possible to override the runner.
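
That check looks roughly like this (the pod name suffix is hypothetical; use whatever kubectl get pods reports for your runner):

kubectl exec -it k6-sample-1-xxxxx -- printenv | grep K6_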

So here is a working solution. Datadog agent deployment:

# datadog-agent-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: datadog-agent-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      component: datadog-agent
  template:
    metadata:
      labels:
        component: datadog-agent
    spec:
      containers:
        - name: datadog-agent
          image: datadog/agent:latest
          ports:
            - containerPort: 8125
          env:
            - name: DD_SITE
              value: datadoghq.eu
            - name: DD_API_KEY
              value: <YOUR_DATADOG_API_KEY> # NOTE: a better way is to create a k8s secret with the key and reference it, as sketched below
            - name: DD_DOGSTATSD_NON_LOCAL_TRAFFIC
              value: "1"
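
The secret-based approach mentioned in the comment above would look roughly like this (secret and key names are assumptions):

kubectl create secret generic datadog-secret --from-literal=api-key=<YOUR_DATADOG_API_KEY>

and then in the container spec, instead of the plain value:

            - name: DD_API_KEY
              valueFrom:
                secretKeyRef:
                  name: datadog-secret
                  key: api-key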

Datadog agent ClusterIP service:

# datadog-agent-cluster-ip-service.yml
apiVersion: v1
kind: Service
metadata:
  name: datadog-agent-cluster-ip-service
spec:
  type: ClusterIP
  selector:
    component: datadog-agent
  ports:
    - targetPort: 8125
      protocol: UDP
      port: 8125

K6 resource:

apiVersion: k6.io/v1alpha1
kind: K6
metadata:
  name: k6-sample
spec:
  parallelism: 4
  arguments: --out statsd --verbose
  script:
    configMap: 
      name: crocodile-stress-test
      file: performance.js
  scuttle:
    enabled: "false"
  runner:                                              # <=== HERE
    image: loadimpact/k6:latest
    env:                                               # <=== env is part of runner spec
      - name: K6_STATSD_ENABLE_TAGS
        value: "true"
      - name: K6_STATSD_ADDR
        value: datadog-agent-cluster-ip-service:8125
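
Putting it together, the pieces are applied in order, agent first (file names for the first two as given above; the K6 resource file name is assumed):

kubectl apply -f datadog-agent-deployment.yml
kubectl apply -f datadog-agent-cluster-ip-service.yml
kubectl apply -f k6-resource.yml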

One more important note, about a problem I faced: the Datadog agent doesn't aggregate metrics from several jobs, so in the Datadog dashboard I'm getting metrics from only one of the k6-sample runners (maybe I'm doing something wrong). My script has 20 VUs and parallelism is 4, yet in Datadog I see a max of 5 VUs (20 / 4 = 5, i.e. a single runner's share), which means the agent is not combining data from all k6 runners:

[screenshot: Datadog dashboard showing a max of 5 VUs]

knechtionscoding commented 3 years ago

@mpanchuk Glad you have a solution! Thanks for sharing it. Would you be willing to submit a PR with a new README section on the integration?

And the aggregation part is definitely something being thought about in a broader context, because it's a problem everywhere.

mpanchuk commented 3 years ago

@KnechtionsCoding Yes, I'll create a PR with an updated README.

na-- commented 3 years ago

I'll close this now, since, from what I understand, this would be better resolved with a PR to https://github.com/grafana/k6-docs?

knechtionscoding commented 3 years ago

@na-- unless all the operator documentation has been moved over there, no. This needs to live with the k6-operator documentation, because it is specific to the k6-operator/K8s.

na-- commented 3 years ago

My mistake, you are completely right, sorry! :man_facepalming:

nwithers-ecr commented 3 years ago

@mpanchuk Thank you for this. I applied your changes and it's working correctly.

yorugac commented 2 years ago

I think this case can be documented in two ways: 1) an update to k6-operator's README on how to pass an environment variable to any pod in K6, starter and runner both (this is currently absent), and 2) possibly a guide on how to set up Datadog with k6-operator.

Also, right now passing env outside of the runner or starter spec results in a validation error, so similar cases should be easier to set up.

cko-siavash-delkhosh commented 1 year ago

Here is the way we figured out how to do it: k6 is running in Docker, and the Datadog agent is running in Docker as well.

We found that we needed to add a step that gets the IP address of the Datadog container and adds it to K6_STATSD_ADDR:

- name: Docker Agent
  env:
    DD_API_KEY: ${{ secrets.DATADOG_API_KEY }}
  run: |
    DOCKER_CONTENT_TRUST=1 \
    docker run -d --network bridge \
      --name datadog \
      -v /var/run/docker.sock:/var/run/docker.sock:ro \
      -v /proc/:/host/proc/:ro \
      -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
      -e DD_SITE="datadoghq.com" \
      -e DD_API_KEY=$DD_API_KEY \
      -e DD_TAGS="YOURTAGS" \
      -e DD_DOGSTATSD_NON_LOCAL_TRAFFIC=1 \
      -p 8125:8125/udp \
      datadog/agent:latest

- name: Wait for Agent to run...
  run: |
    sleep 10
  shell: bash

- name: Get Datadog IP
  id: getDDIp
  run: |
    echo "::set-output name=ddIP::$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' datadog)"

- name: Run k6 test
  uses: grafana/k6-action@v0.2.0
  env:
    K6_STATSD_ADDR: ${{ steps.getDDIp.outputs.ddIP }}:8125
    K6_STATSD_ENABLE_TAGS: "true"
  with:
    filename: packages/tests-performance/dist/alias/create/test.spec.js
    flags: --out statsd
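
Note that GitHub has since deprecated the ::set-output workflow command; on current runners the equivalent step writes to the step's output file instead:

- name: Get Datadog IP
  id: getDDIp
  run: |
    echo "ddIP=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' datadog)" >> "$GITHUB_OUTPUT"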

Hope it helps you.

jerroldgao commented 1 year ago

Datadog agent doesn't aggregate metrics from few jobs, so in Datadog dashboard I'm getting metrics from one of k6-sample runner (maybe I'm doing something wrong).

I faced the same issue. Any chance you figured this one out?

Igor992 commented 3 months ago

Did you manage to use @http.url on a custom dashboard, by any chance? I know the k6 integration only has the test_run_id tag, which I can pass via the CR's arguments for the runner, like --tag test_run_id=<value>. But I want to build a custom dashboard, e.g. metrics by endpoint, with a dropdown search over @http.url to show these tests dynamically. Is this somehow possible?