Open colincadams opened 3 weeks ago
The fix in #3429 might be the culprit. IIRC the previous behavior (see https://github.com/open-telemetry/opentelemetry-python/issues/3407) was that histograms would not be sent from the SDK to the exporter if there had been no observations since the last export.
Does your app have low QPS or low QPS for certain routes?
@aabmass Yes, this is a quite low-traffic application, so that does seem likely to be the root cause.
@colincadams what is your export interval? You may be able to achieve similar cost savings by exporting less often
Our export interval is 60s. We could certainly export less often, and that would help with cost savings. Did anything about bucket creation change? It seems like a pretty large increase just for reporting frequency, especially given that the cardinality of these metrics should be relatively low, but it's possible that's it.
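To get a rough feel for how much reporting frequency alone could matter, here is a minimal back-of-the-envelope sketch. It is not SDK code: `exported_points`, the `skip_idle` flag, and the traffic pattern (one request every 20 minutes) are all hypothetical, chosen only to model "skip the export when nothing was observed" versus "export on every interval" for a single time series.

```python
from typing import Sequence


def exported_points(
    obs_per_minute: Sequence[int], interval_minutes: int, skip_idle: bool
) -> int:
    """Count the data points one histogram time series would export.

    Toy model only: one point per export interval, optionally skipped
    when nothing was observed during that interval.
    """
    points = 0
    for start in range(0, len(obs_per_minute), interval_minutes):
        window = obs_per_minute[start:start + interval_minutes]
        if skip_idle and sum(window) == 0:
            continue  # no observations since the last export: nothing sent
        points += 1
    return points


# Hypothetical low-traffic route: one request every 20 minutes for a day.
day = [1 if minute % 20 == 0 else 0 for minute in range(24 * 60)]

skip_idle_60s = exported_points(day, interval_minutes=1, skip_idle=True)
always_60s = exported_points(day, interval_minutes=1, skip_idle=False)
always_300s = exported_points(day, interval_minutes=5, skip_idle=False)

print(skip_idle_60s, always_60s, always_300s)  # 72 1440 288
```

Under these assumed numbers, exporting on every 60s interval sends 20 times as many points as skipping idle intervals, and a 300s interval only brings that down to 4 times as many. So for sparse traffic, an order of magnitude increase from reporting frequency alone looks plausible, even with no change to bucket creation.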
Describe your environment
We noticed a very large increase in our GCM cost due to an increase in metrics bytes ingested for our base histogram metrics (e.g. http.client.duration). This coincided with an upgrade to v1.23.0. A subsequent downgrade to v1.22.0 brought bytes ingested and cost back down to their prior levels. This commit is the revert: https://github.com/Recidiviz/pulse-data/commit/d321a4e30f612e9964f18106ded28d6a0fce250e
Steps to reproduce
Upgrade to v1.23.0 or later (only tested up to v1.24.0, so it is possible it has been fixed).
What is the expected behavior?
No increase in bytes ingested by GCM for histogram metrics.
What is the actual behavior?
Order of magnitude increase in cost.
Additional context
I haven't taken the time to fully understand the changes here, but if this PR led to all of the buckets always being created, and before that was not the case, this could be the culprit: https://github.com/open-telemetry/opentelemetry-python/pull/3429