micrometer-metrics / micrometer

An application observability facade for the most popular observability tools. Think SLF4J, but for observability.
https://micrometer.io
Apache License 2.0

BigQuery implementation #2796

Closed lbaeumer closed 10 months ago

lbaeumer commented 3 years ago

Please describe the feature request. It would be great if Micrometer supported an implementation for Google BigQuery.

Rationale: BigQuery is the data analytics solution on the Google Cloud Platform and is a good place to collect Micrometer metrics from GCP cloud-native applications. In addition, several GCP tools let the user create reports based on BigQuery tables, e.g. https://datastudio.google.com/. More information about BigQuery: https://cloud.google.com/bigquery

Additional context: The Stackdriver implementation is already available, but Stackdriver has major issues if data is sent from multiple parallel instances.

lbaeumer commented 3 years ago

If others are interested in BigQuery support, I would provide a PR in the next few days.

shakuzen commented 3 years ago

The Stackdriver implementation is already available, but Stackdriver has major issues if data is sent from multiple parallel instances.

What issues are those? I'd rather we try to fix those (if on the Micrometer side) or make sure the Stackdriver team is aware of them, since Stackdriver is the official metrics solution offering for GCP. Surely it's not the intention that Stackdriver cannot handle receiving data from multiple parallel instances.

While I don't doubt that BigQuery could be used, it's not really a metrics solution and I'm not sure it is something we should support in the Micrometer org. It sounds more like something to be community-maintained.

lbaeumer commented 3 years ago

BigQuery is Google's solution for analytics on huge amounts of data. I think Stackdriver is meant more for visualizing infrastructure metrics (e.g. heap, CPU). For monitoring business KPIs, BigQuery is a great fit because its data can be consumed in several ways, including Data Studio, Grafana, or programmatic SQL. You can run quite complex queries on large datasets, which Stackdriver is not really built for. BigQuery is therefore a good solution for more business-related use cases.

With Stackdriver I faced the following problem: I used micrometer stackdriver on Google App Engine to publish metrics, which worked fine as long as only one instance was running. After a second GAE instance was started, specific metrics were sent independently by both instances. The problem is that Stackdriver enforces a constraint that does not allow a metric to be published at an interval shorter than 10s, and with multiple instances two writes can arrive less than 10s apart. I could add an instance name to the measurement name to avoid this problem, but that would not be a good solution. In addition, I had problems with timestamps, because Stackdriver expects data points in increasing time order. So this is not a Micrometer issue, but a Stackdriver issue. If it would be helpful, I can try to recreate the stack trace.

Regardless of these technical problems, I think BigQuery is a better solution than Stackdriver for analytics and business KPIs, so I would appreciate the chance to contribute this to Micrometer.
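To make the constraint described in this comment concrete, here is a minimal sketch in plain Java. It is a toy model, not the Stackdriver API or Micrometer code: the class `ToyBackend`, the helper `seriesKey`, the metric name `orders.count`, and the label name `instance` are all hypothetical, and the 10-second limit is taken from the comment rather than from any official documentation. It only illustrates why two instances writing the same series collide, and why a per-instance label avoids the collision in this simplified model.

```java
import java.util.HashMap;
import java.util.Map;

public class SamplingPeriodSketch {
    // Assumed minimum gap between two points of the same time series, in seconds
    // (the 10 s figure comes from the comment above, not from an official source).
    static final long MIN_PERIOD_SECONDS = 10;

    // Hypothetical stand-in for a metrics backend: remembers the last accepted
    // timestamp per time series and rejects points that arrive too soon.
    static class ToyBackend {
        private final Map<String, Long> lastWrite = new HashMap<>();

        // Returns true if the point is accepted, false if it is rejected.
        // A point older than the last accepted one is rejected as well,
        // because the difference is then negative and below the minimum.
        boolean write(String seriesKey, long epochSeconds) {
            Long previous = lastWrite.get(seriesKey);
            if (previous != null && epochSeconds - previous < MIN_PERIOD_SECONDS) {
                return false;
            }
            lastWrite.put(seriesKey, epochSeconds);
            return true;
        }
    }

    // In this model a series is identified by its metric name plus an optional
    // per-instance label (hypothetical label name "instance").
    static String seriesKey(String metric, String instanceLabel) {
        return instanceLabel == null ? metric : metric + "{instance=" + instanceLabel + "}";
    }

    public static void main(String[] args) {
        // Two instances publish the same series 3 s apart: the second point is rejected.
        ToyBackend shared = new ToyBackend();
        System.out.println(shared.write(seriesKey("orders.count", null), 100)); // true
        System.out.println(shared.write(seriesKey("orders.count", null), 103)); // false

        // With a distinct label per instance, each instance owns its own series.
        ToyBackend labelled = new ToyBackend();
        System.out.println(labelled.write(seriesKey("orders.count", "a"), 100)); // true
        System.out.println(labelled.write(seriesKey("orders.count", "b"), 103)); // true
    }
}
```

Whether a unique label resolves the error in a real deployment depends on the actual setup; the sketch only models the rule as it is described here.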

lbaeumer commented 3 years ago

Some additional information about the problem I had with stackdriver

com.google.api.gax.rpc.InvalidArgumentException: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: One or more TimeSeries could not be written: One or more points were written more frequently than the maximum sampling period configured for the metric.: timeSeries[2-6,8,10,12-14,17,22,23,26-32,34-36,39,42,43,45,48,51,53,59,61,63,66-69]
    at com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:49)
    at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:72)
    at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:60)
    at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:97)
    at com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:68)
    at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1074)
    at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
    at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
    at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
    at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:771)
    at io.grpc.stub.ClientCalls$GrpcFuture.setException(ClientCalls.java:563)
    at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:533)
    at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:553)
    at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:68)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:739)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:718)
    at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
    at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
    Suppressed: com.google.api.gax.rpc.AsyncTaskException: Asynchronous task failed
        at com.google.api.gax.rpc.ApiExceptions.callAndTranslateApiException(ApiExceptions.java:57)
        at com.google.api.gax.rpc.UnaryCallable.call(UnaryCallable.java:112)
        at com.google.cloud.monitoring.v3.MetricServiceClient.createTimeSeries(MetricServiceClient.java:1526)
        at io.micrometer.stackdriver.StackdriverMeterRegistry.publish(StackdriverMeterRegistry.java:167)

Environment: micrometer stackdriver 1.7.4, Quarkus 2.2.3.Final, App Engine, Java 11

Compare https://stackoverflow.com/questions/58153208/one-or-more-points-were-written-more-frequently-than-the-maximum-sampling-period

shakuzen commented 3 years ago

That exception sounds like #1335. See the recently updated docs, which mention how to avoid it.

lbaeumer commented 3 years ago

I tried the Stackdriver implementation again. It still shows the error, even after adding a unique tag (key and value) as suggested in the comments on #1335. Regardless of my issues, I think that BigQuery is a good solution for analytics tasks.

marcingrzejszczak commented 10 months ago

I'd suggest creating an issue in BigQuery to add support for Micrometer. If you create such an issue, please mention us there.

lbaeumer commented 2 weeks ago

I provided a BigQuery implementation here. https://github.com/lbaeumer/micrometer-bigquery