grafana / docker-otel-lgtm

OpenTelemetry backend in a Docker image
Apache License 2.0

Document settings required to populate Dashboards. #28

Closed yeroc closed 2 months ago

yeroc commented 8 months ago

I've been trying out this Docker image as someone new to both the Grafana products and OpenTelemetry in general, so I believe I'm your target audience. That said, I'm struggling to get any of the three sample dashboards to populate with metrics using the OpenTelemetry Java Agent with my own application. I can confirm instrumentation is working because I can see some metrics using Explore, and Traces and Logs are populated as well, but all the metrics dashboards remain obstinately blank.

Here are my settings for Java Agent 2.2.0:

# Settings for the opentelemetry java agent

# in ms...
otel.bsp.schedule.delay=5000
otel.metric.export.interval=5000

otel.exporter.otlp.metrics.default.histogram.aggregation=base2_exponential_bucket_histogram

# capture enduser info...
otel.instrumentation.common.enduser.enabled=true
otel.instrumentation.common.enduser.id.enabled=true

otel.instrumentation.common.peer-service-mapping=foo-host:8082=foo-service

otel.semconv-stability.opt-in=http

otel.resource.attributes=service.version=HEAD-SNAPSHOT
otel.service.name=my-service

The above settings are meant to get the JVM and RED (native histograms) dashboards to populate.

General feedback:

yeroc commented 8 months ago

Hope this doesn't come across as overly negative. It's pretty awesome to be able to spin up a product suite supporting metrics, traces and logs with a single command!

grcevski commented 8 months ago

I think this might have to do with the collection names. Can you please double-check that the generated collection names match? There's a temporary duality between adding and not adding the unit when the OTel metrics are converted to Prometheus.

yeroc commented 8 months ago

@grcevski Thanks for responding. Is "collection name" a Grafana or Prometheus term? I tried Googling but I'm failing to turn up exactly what you're referring to here. Is that equivalent to the metric name? Or something else?

grcevski commented 8 months ago

Sorry I mean the Prometheus series name. I apologize for the confusion.
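To illustrate the unit "duality" mentioned above, here is a minimal Python sketch of how an OpenTelemetry metric name can end up as a Prometheus series name with or without a unit suffix. The function name, the `add_suffixes` flag, and the small unit table are hypothetical and simplified; the real translation in the Collector and Prometheus handles more units and edge cases.

```python
import re

# Partial unit table (assumption: simplified; the real mapping covers more units).
UNIT_SUFFIXES = {"s": "seconds", "ms": "milliseconds", "By": "bytes", "1": "ratio"}


def prometheus_series_name(otel_name, unit="", monotonic_counter=False, add_suffixes=True):
    """Approximate the OTel -> Prometheus metric name translation."""
    # Characters that are not valid in a Prometheus name (such as '.') become '_'.
    name = re.sub(r"[^a-zA-Z0-9_:]", "_", otel_name)
    if not add_suffixes:
        # The "not adding the unit" side of the duality.
        return name
    suffix = UNIT_SUFFIXES.get(unit)
    if unit == "1" and monotonic_counter:
        # Simplification: only non-counter metrics get the "ratio" suffix here.
        suffix = None
    if suffix and not name.endswith("_" + suffix):
        name += "_" + suffix
    if monotonic_counter and not name.endswith("_total"):
        name += "_total"
    return name


print(prometheus_series_name("http.server.request.duration", unit="s"))
print(prometheus_series_name("http.server.request.duration", unit="s", add_suffixes=False))
```

A dashboard query written against one of these two spellings will not match data stored under the other, which is why comparing series names is a reasonable first check.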

yeroc commented 8 months ago

@grcevski If I'm understanding correctly, here's the list of metric series names populated:

"http_client_request_duration_seconds",
"http_server_request_duration_seconds",
"jvm_class_count",
"jvm_class_loaded_total",
"jvm_class_unloaded_total",
"jvm_cpu_count",
"jvm_cpu_recent_utilization_ratio",
"jvm_cpu_time_seconds_total",
"jvm_gc_duration_seconds",
"jvm_memory_committed_bytes",
"jvm_memory_limit_bytes",
"jvm_memory_used_after_last_gc_bytes",
"jvm_memory_used_bytes",
"jvm_thread_count",
"otelcol_exporter_queue_capacity",
"otelcol_exporter_queue_size",
"otelcol_exporter_send_failed_log_records_total",
"otelcol_exporter_send_failed_metric_points_total",
"otelcol_exporter_send_failed_spans_total",
"otelcol_exporter_sent_log_records_total",
"otelcol_exporter_sent_metric_points_total",
"otelcol_exporter_sent_spans_total",
"otelcol_http_server_duration_bucket",
"otelcol_http_server_duration_count",
"otelcol_http_server_duration_sum",
"otelcol_http_server_request_content_length_total",
"otelcol_http_server_response_content_length_total",
"otelcol_process_cpu_seconds_total",
"otelcol_process_memory_rss",
"otelcol_process_runtime_alloc_bytes_total",
"otelcol_process_runtime_heap_alloc_bytes",
"otelcol_process_runtime_total_sys_memory_bytes",
"otelcol_process_uptime_total",
"otelcol_processor_batch_batch_send_size_bucket",
"otelcol_processor_batch_batch_send_size_count",
"otelcol_processor_batch_batch_send_size_sum",
"otelcol_processor_batch_metadata_cardinality",
"otelcol_processor_batch_timeout_trigger_send_total",
"otelcol_receiver_accepted_log_records_total",
"otelcol_receiver_accepted_metric_points_total",
"otelcol_receiver_accepted_spans_total",
"otelcol_receiver_refused_log_records_total",
"otelcol_receiver_refused_metric_points_total",
"otelcol_receiver_refused_spans_total",
"otlp_exporter_exported_total",
"otlp_exporter_seen_total",
"processedLogs_total",
"processedSpans_total",
"queueSize_ratio",
"scrape_duration_seconds",
"scrape_samples_post_metric_relabeling",
"scrape_samples_scraped",
"scrape_series_added",
"target_info",
"traces_service_graph_request_client_seconds_bucket",
"traces_service_graph_request_client_seconds_count",
"traces_service_graph_request_client_seconds_sum",
"traces_service_graph_request_server_seconds_bucket",
"traces_service_graph_request_server_seconds_count",
"traces_service_graph_request_server_seconds_sum",
"traces_service_graph_request_total",
"up"

Not sure what I should be matching these up against? Is this related to this Grafana blog post and this OpenTelemetry Collector document that mentions Prometheus Normalization?
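A list like the one above can be retrieved from the Prometheus HTTP API, which exposes all metric names as the values of the `__name__` label. The sketch below is one way to do that in Python. It assumes Prometheus inside the container is reachable at `http://localhost:9090`, which is an assumption: the `docker run` command used in this thread does not publish that port, so an extra port mapping would be needed. The function names are hypothetical.

```python
import json
import urllib.request


def parse_series_names(payload: str) -> list[str]:
    """Extract metric names from a /api/v1/label/__name__/values response body."""
    body = json.loads(payload)
    if body.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {body.get('error', 'unknown error')}")
    return sorted(body["data"])


def fetch_series_names(base_url: str = "http://localhost:9090") -> list[str]:
    """Fetch all metric names; base_url is an assumption, adjust to your setup."""
    url = f"{base_url}/api/v1/label/__name__/values"
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_series_names(response.read().decode("utf-8"))
```

Calling `fetch_series_names()` against a running instance returns the names as a sorted list, which can then be compared with the metric names used in a dashboard's queries.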

grcevski commented 8 months ago

Hm, interesting, you don't see http_server_request_duration_seconds_bucket and http_server_request_duration_seconds_count?

Did you install the docker/grafana-dashboard-red-metrics-classic.json or docker/grafana-dashboard-red-metrics-native.json? Based on the metric series names I think you need to use docker/grafana-dashboard-red-metrics-native.json.
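The reasoning in this comment can be written down as a small check: classic histograms show up in Prometheus as `_bucket`, `_count`, and `_sum` series, while a native histogram appears under the bare metric name. The following Python sketch applies that rule to a list of series names. The function name is hypothetical, and the rule is a simplification of what the dashboards actually query.

```python
CLASSIC_DASHBOARD = "docker/grafana-dashboard-red-metrics-classic.json"
NATIVE_DASHBOARD = "docker/grafana-dashboard-red-metrics-native.json"


def red_dashboard_for(series_names, base="http_server_request_duration_seconds"):
    """Guess which RED dashboard matches the series present in Prometheus."""
    names = set(series_names)
    if base + "_bucket" in names:
        # Explicit bucket series exist, so the data is a classic histogram.
        return CLASSIC_DASHBOARD
    if base in names:
        # Only the bare name exists, which points to a native histogram.
        return NATIVE_DASHBOARD
    # No HTTP server duration data at all.
    return None


observed = ["http_client_request_duration_seconds", "http_server_request_duration_seconds"]
print(red_dashboard_for(observed))
```

With the series names reported earlier in this thread, the check selects the native dashboard, matching the conclusion in the comment above.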

yeroc commented 8 months ago

@grcevski I'm using the docker container as published to Docker Hub via docker run -p 3000:3000 -p 4317:4317 -p 4318:4318 --rm -ti grafana/otel-lgtm as per the Grafana Labs blog post. I haven't cloned this repo and customized anything, thus my comment above about the confusion between the two different predefined RED dashboards that are visible in the Grafana UI. All three dashboards show No Data.

grcevski commented 8 months ago

Ah I see, we should possibly expand the documentation to mention the other dashboards. Are there any other dashboards available? We need to use the 'Native Prometheus Dashboard' for the metric series names you have.

yeroc commented 8 months ago

@grcevski Which other dashboards are you referring to? I see three dashboards by default: [screenshot]

It looks like these correspond to the dashboard definitions in the docker/grafana-dashboard-*.json files in this repo. Are there others?

grcevski commented 8 months ago

OK, great, so the "RED Metrics (native histogram)" should work if you have data in "http_server_request_duration_seconds". Is it also empty?

yeroc commented 8 months ago

@grcevski I seem to have data: [screenshot]

but nothing shows up on the dashboard: [screenshot]

zeitlinger commented 7 months ago

I'm trying to understand where the docs can be improved.

I've just followed the proposed steps for native histograms - please let me know where it didn't work:

  1. start LGTM
  2. go to native dashboard
  3. see instructions (screenshot)
  4. adjust run.sh, uncommenting export OTEL_EXPORTER_OTLP_METRICS_DEFAULT_HISTOGRAM_AGGREGATION=base2_exponential_bucket_histogram
  5. run run-example.sh and generate-traffic.sh
  6. native dashboard shows data (second screenshot)

[screenshot: instructions shown on the native dashboard]

[screenshot: native dashboard showing data]

yeroc commented 7 months ago

@zeitlinger Sorry, if the intent is for this container to only work with the sample you provided you can go ahead and close this ticket. I'm feeding data in from my own application via the OpenTelemetry Java Agent. I still think it's confusing to have two RED dashboards but maybe that makes sense to OTel experts?

zeitlinger commented 7 months ago

> I still think it's confusing to have two RED dashboards

Only one of the RED dashboards can work, depending on how you send the data (controlled by OTEL_EXPORTER_OTLP_METRICS_DEFAULT_HISTOGRAM_AGGREGATION).

I'm happy to improve the docs if you have a suggestion :smile:
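The environment variable named in this comment and the `otel.exporter.otlp.metrics.default.histogram.aggregation` property in the original agent settings refer to the same option. The OpenTelemetry Java agent derives the environment variable name from the system property by upper-casing it and replacing `.` and `-` with `_`. A minimal Python sketch of that mapping (the helper name is hypothetical):

```python
def to_env_var(system_property: str) -> str:
    """Convert an OTel Java system property name to its environment variable form."""
    return system_property.upper().replace(".", "_").replace("-", "_")


print(to_env_var("otel.exporter.otlp.metrics.default.histogram.aggregation"))
print(to_env_var("otel.instrumentation.common.peer-service-mapping"))
```

So a properties file that already sets the histogram aggregation to `base2_exponential_bucket_histogram` is requesting the same behavior as the `export` line in `run.sh`.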

yeroc commented 7 months ago

@zeitlinger For me none of the three dashboards are working with Java Agent 2.2.0 per notes above. Not sure what I'm doing wrong. I'd suggest adding docs for whatever is required for the JVM Dashboard to show information.

zeitlinger commented 7 months ago

> @zeitlinger For me none of the three dashboards are working with Java Agent 2.2.0 per notes above.

Are you using the included example app or your own? If the latter, can you point to a repo - or steps how to reproduce?

yeroc commented 7 months ago

@zeitlinger My own application. The application isn't public, so I can't point you to a repo. I included the Java Agent config in the ticket summary. Let me know what additional details you'd need. Like I said, even the JVM dashboard doesn't display anything, but when I explore Metrics, Logs and Traces I do see information, so I know the agent is properly activated and feeding data over.

zeitlinger commented 7 months ago

> @zeitlinger My own application.

Can you try to reproduce the issue with the java app?

zeitlinger commented 2 months ago

closing as stale