otterize / network-mapper

Map Kubernetes traffic: in-cluster, to the Internet, and to AWS IAM and export as text, intents, or an image
Apache License 2.0
612 stars 23 forks source link

Emit map as an OpenTelemetry metric #141

Closed smithclay closed 1 year ago

smithclay commented 1 year ago

Description

This emits map data as an OpenTelemetry metric. Purpose is to (optionally) allow exporting edge data in OpenTelemetry format to third party tools (including Prometheus) or solutions.

Open questions

References

Testing

I've pushed a build of this PR to my public Dockerhub. It can be installed using the following helm command and data can be sent to a Lightstep project by injecting the environment variable below. The standard OTEL SDK env vars are used.

helm upgrade network-mapper otterize/network-mapper -n otterize-system --set debug=true --set mapper.repository=smithclay --set mapper.tag=latest --set mapper.pullPolicy=Always --install

kubectl set env deployments otterize-network-mapper -n otterize-system \
  OTTERIZE_ENABLE_OTEL_EXPORT=true \
  OTEL_EXPORTER_OTLP_METRICS_ENDPOINT='https://ingest.lightstep.com:443' \
  OTEL_EXPORTER_OTLP_ENDPOINT='https://ingest.lightstep.com:443' \
  OTEL_EXPORTER_OTLP_INSECURE=false \
  OTEL_METRIC_EXPORT_INTERVAL=1s \
  OTEL_EXPORTER_OTLP_PROTOCOL=grpc \
  OTEL_EXPORTER_OTLP_METRICS_HEADERS='lightstep-access-token=your-token'

Checklist

github-actions[bot] commented 1 year ago

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

orishoshan commented 1 year ago

Thank you @smithclay! Looks overall great, will take a closer look and comment more thoroughly after the weekend :) Have a great one!

smithclay commented 1 year ago

I have read and understood the CLA and hereby agree to its terms by making this Pull Request Comment.

djspoons commented 1 year ago

This is awesome @smithclay !

I hear you on reusing the names in the service graph connector component – and I'm fine with that for now! – but it certainly seems like we'll want to rename them in some way (since these are not part of that component... and it doesn't seem to follow many current conventions!).

One sort of related question I had: can we do better than a "counter"? Maybe it's worth discussing this in real time, but there are some benefits if we can use an updown counter or (maybe?) a gauge – though it we might not have exactly the right data to make those happen. Anyway, maybe chalk this question up as "spoons wants to better understand how we can aggregate these data points."

orishoshan commented 1 year ago

@smithclay I commented on the PR :) Regarding the semantic conventions you suggested: I see that you picked something for the network mapper, are there more semantic conventions beyond that that you suggest we add? A bit of an OpenTelemetry n00b here :)

smithclay commented 1 year ago

One sort of related question I had: can we do better than a "counter"? Maybe it's worth discussing this in real time, but there are some benefits if we can use an updown counter or (maybe?) a gauge – though it we might not have exactly the right data to make those happen. Anyway, maybe chalk this question up as "spoons wants to better understand how we can aggregate these data points."

Just to recap an offline conversation: for now, going with Counter since that's at least consistent with how the servicegraphprocessor does it. Also like the idea of making the attribute/metric names overridable via config (which could expand the number of tools this could work with in the future that expect different inputs).

cartersocha commented 1 year ago

🚀

smithclay commented 1 year ago

@smithclay I commented on the PR :) Regarding the semantic conventions you suggested: I see that you picked something for the network mapper, are there more semantic conventions beyond that that you suggest we add? A bit of an OpenTelemetry n00b here :)

I think we're ok for now using the included conventions... medium-term, would like to open an issue up in the OTel Semantic conventions repository to see if we can get this "standardized" somewhat. For now, however, it should be compatible with Grafana Tempo/Lightstep.

orishoshan commented 1 year ago

Approved! Thank you @smithclay and @MadVikingGod :)

I'm still struggling with getting the new end-to-end tests running on a fork. I was hoping to use GitHub Container Registry to hold the Docker images that would run in the Minikube running in the end-to-end tests, but looks like the GITHUB_TOKEN created by GitHub Actions for GHCR in forks can only receive "read" permissions (even to the author's own GHCR), so this isn't gonna work - the CI won't be able to push the images to GHCR.

I'll come up with another solution soon to let the CI run the end-to-end test.