_Closed: @victlu closed this issue 3 years ago._
This would require at least one data point to arrive in order for exporters to create metric descriptors in the backend. What are the use cases where the current data model is not working for you?
I've proposed similar, #172, but at this point I'm happy with what we have.
@victlu The problem with this approach is that systems that consider the data point kind part of a metric's identity would no longer be able to express that. For example, Stackdriver in GCP does this: it considers an int sum different from a double sum, and it needs to know up front which of the two a metric will contain.
@bogdandrutu In your case, I would think it serves them better to name their metrics explicitly (since they want two time series) rather than rely on the data type.
I think this question is related to #1366: how do we consider the "name" to be unique in identifying a time series?
> This would require at least one data point to arrive in order for exporters to create metric descriptors in the backend. What are the use cases where the current data model is not working for you?
@rakyll Why can't metric descriptors be created (without data type)? Or are you saying a descriptor MUST include the data type? In which case, that's the argument I'm making here...
IMHO, metric backends ultimately show/graph/alert on/store numeric time series. For performance and efficiency, they want to use the most efficient representation of the "number", so promotion of data types is common, especially for storage. The important thing is that it's just a number, regardless of whether it's transmitted as a byte, short, ushort, int, uint, long, ulong, float, double, etc.
> Or are you saying a descriptor MUST include the data type? In which case, that's the argument I'm making here...
If the type is included in the metric descriptor/name, what is the benefit of doing what you suggest? Your suggestion would also add an extra check: you would need to verify that the type in the descriptor matches the type of the points.
On top of everything, this also adds extra allocations when serializing/deserializing from proto bytes.
@victlu Some backends like Stackdriver see the type as an integral part of the metric. See the value_type at https://cloud.google.com/monitoring/api/ref_v3/rpc/google.api#google.api.MetricDescriptor. This model gives us an easy/cost-effective way of identifying the data type at metric creation. These backends will reject the write if you send a floating-point number to an integer metric, and that would require validation of the data points in the exporter, as @bogdandrutu suggests.
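For illustration, here is roughly the kind of per-point validation an exporter would be forced to perform under a type-less descriptor. This is a hedged sketch with hypothetical, simplified types; a real exporter would operate on the generated `google.api.MetricDescriptor` types instead.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the descriptor and point types.
type ValueType int

const (
	ValueInt64 ValueType = iota
	ValueDouble
)

type Descriptor struct {
	Name      string
	ValueType ValueType
}

// Point carries exactly one of the two numeric payloads.
type Point struct {
	IntValue    *int64
	DoubleValue *float64
}

// validatePoints rejects a batch when any point's payload does not match
// the value type declared in the descriptor, mimicking the backend check.
func validatePoints(d Descriptor, points []Point) error {
	for i, p := range points {
		switch d.ValueType {
		case ValueInt64:
			if p.IntValue == nil {
				return fmt.Errorf("%s: point %d is not int64", d.Name, i)
			}
		case ValueDouble:
			if p.DoubleValue == nil {
				return fmt.Errorf("%s: point %d is not double", d.Name, i)
			}
		}
	}
	return nil
}

func main() {
	v := int64(7)
	f := 1.5
	d := Descriptor{Name: "requests", ValueType: ValueInt64}
	fmt.Println(validatePoints(d, []Point{{IntValue: &v}}))    // nil: types match
	fmt.Println(validatePoints(d, []Point{{DoubleValue: &f}})) // error: type mismatch
}
```

Note that this check runs per point rather than once per metric descriptor, which is the extra cost being debated.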
I am trying to understand, but I have a few questions...

1. From the last few responses, it seems we are specializing the OTLP protocol around a specific vendor's backend implementation? I assume we would do this for performance. We should call out which backend systems we are optimizing for in this case.
2. What happens if a vendor does not need to separate ints/doubles? It would then need to merge ints and doubles together, which does not seem performant either.
3. I would think the spec is "higher-level", conveying the meaning and semantics of metrics. The data type of data points seems more like an implementation detail.
4. Looking at a few existing client APIs (e.g. Prometheus, Micrometer), I don't see Counter/Gauge/Histogram/etc. decorated with a data type (i.e. IntCounter/DoubleCounter/etc.), or they just take a double to cover both int and double. What should we do here?
5. How will this work from a dynamically typed language (e.g. JavaScript)? I assume I would need to separate ints/doubles/etc. into different metrics?
I'm also not suggesting every data point needs to be type-less. I just don't see the data type belonging at the topmost level alongside semantics like Sum/Gauge/Histogram. Maybe we have a oneof of IntTimeSeries/DoubleTimeSeries under each of the Sum/Gauge/Histogram variants.
I propose we change to the following...
```proto
message Metric {
  // ...
  oneof data {
    Gauge gauge = 4;
    Sum sum = 5;
    Histogram histogram = 6;
    Summary summary = 7;
  }
}

message Gauge {
  repeated IntDataPoint int_data_points = 1;
  repeated DoubleDataPoint double_data_points = 2;
}

message Sum {
  // aggregation_temporality describes if the aggregator reports delta changes
  // since last report time, or cumulative changes since a fixed start time.
  AggregationTemporality aggregation_temporality = 2;

  // If "true", the sum is monotonic.
  bool is_monotonic = 3;

  repeated IntDataPoint int_data_points = 4;
  repeated DoubleDataPoint double_data_points = 5;
}

message Histogram {
  // aggregation_temporality describes if the aggregator reports delta changes
  // since last report time, or cumulative changes since a fixed start time.
  AggregationTemporality aggregation_temporality = 2;

  repeated IntDataPoint int_data_points = 3;
  repeated DoubleDataPoint double_data_points = 4;
}

message Summary {
  repeated DoubleDataPoint double_data_points = 1;
}
```
For Summary, see https://github.com/open-telemetry/opentelemetry-proto/pull/269, and for Histogram, see https://github.com/open-telemetry/opentelemetry-proto/pull/270.
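To make the decoupling concrete: under this shape a consumer branches on the instrument semantic (Gauge/Sum/Histogram/Summary) exactly once, and the numeric type becomes a detail inside each message. A minimal Go sketch with hypothetical structs mirroring the proposed messages (not the actual generated code):

```go
package main

import "fmt"

// Hypothetical structs mirroring the proposed proto shape.
type IntDataPoint struct {
	TimeUnixNano uint64
	Value        int64
}

type DoubleDataPoint struct {
	TimeUnixNano uint64
	Value        float64
}

// Gauge carries both numeric kinds; the instrument semantic is the
// top-level distinction, the number type is not.
type Gauge struct {
	IntDataPoints    []IntDataPoint
	DoubleDataPoints []DoubleDataPoint
}

type Metric struct {
	Name  string
	Gauge *Gauge // stands in for the `oneof data` gauge variant
}

// numPoints handles one Gauge case regardless of which numeric type
// the points carry, instead of separate IntGauge/DoubleGauge branches.
func numPoints(m Metric) int {
	if m.Gauge == nil {
		return 0
	}
	return len(m.Gauge.IntDataPoints) + len(m.Gauge.DoubleDataPoints)
}

func main() {
	m := Metric{
		Name: "cpu.temperature",
		Gauge: &Gauge{
			IntDataPoints:    []IntDataPoint{{TimeUnixNano: 1, Value: 42}},
			DoubleDataPoints: []DoubleDataPoint{{TimeUnixNano: 2, Value: 42.5}},
		},
	}
	fmt.Println(numPoints(m)) // prints 2
}
```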
@victlu I kind of like your proposal, but I would like to understand how you see this as fixing an issue. I think this is a nice cosmetic improvement, but I am not clear whether it solves a real problem or is just a cosmetic improvement :)
Update: @victlu how do we deal with a case where both repeated fields contain values?
@jmacd what do you think about this improvement in readability https://github.com/open-telemetry/opentelemetry-proto/issues/264#issuecomment-789241623 ?
The change I'm trying to effect is to decouple the semantics of an instrument (i.e. Counter vs. Gauge vs. Histogram) from the details of the data type (byte/int/long/double/float/etc.). I think it may have these beneficial consequences.
Separately, I think we should group by Labels first before we have DataPoints to minimize on buffer size. I'll file a separate issue for this.
> @jmacd what do you think about this improvement in readability?
I like it, but it raises the question you asked.
> @victlu how do we deal with a case where both repeated fields contain values?
That's a tough one. We discussed in yesterday's data model meeting that SDKs should not generate conflicting types (loosely defined), and we discussed that we would like the collector to pass-through conflicting types (again loosely defined) as if they were separate metrics. This appearance of both double and integer repeated fields makes it possible to have a single Metric that looks "not separate", in the sense that it contains multiple number types. I think if we allow what @victlu proposes, we should firmly state that OTel welcomes you to mix integer and double metrics--that this is not a semantic conflict--just that we would like you not to do so in the same SDK.
Naturally we could use a `oneof` to prevent mixing integer and double values. We did a lot of benchmarking last summer to talk ourselves out of `oneof` messages as much as possible.
Even when we allow mixing repeated integer- with repeated double-values, it will be difficult to work with this mixture of values without combining and interleaving them (sort by time) into an intermediate representation, which probably brings up the problem discussed next.
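A sketch of that interleaving step, with hypothetical point structs standing in for the proto-generated ones, assuming the intermediate representation is simply a time-ordered slice that remembers which kind each value was:

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical point types standing in for the proto-generated ones.
type IntDataPoint struct {
	TimeUnixNano uint64
	Value        int64
}

type DoubleDataPoint struct {
	TimeUnixNano uint64
	Value        float64
}

// numberPoint is an intermediate representation holding either kind
// without promoting int64 to float64 (which could lose precision).
type numberPoint struct {
	timeUnixNano uint64
	intValue     int64
	doubleValue  float64
	isInt        bool
}

// interleave merges both repeated fields into one slice sorted by time,
// the kind of combined view a consumer would need to build.
func interleave(ints []IntDataPoint, doubles []DoubleDataPoint) []numberPoint {
	out := make([]numberPoint, 0, len(ints)+len(doubles))
	for _, p := range ints {
		out = append(out, numberPoint{timeUnixNano: p.TimeUnixNano, intValue: p.Value, isInt: true})
	}
	for _, p := range doubles {
		out = append(out, numberPoint{timeUnixNano: p.TimeUnixNano, doubleValue: p.Value})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].timeUnixNano < out[j].timeUnixNano })
	return out
}

func main() {
	merged := interleave(
		[]IntDataPoint{{TimeUnixNano: 30, Value: 3}, {TimeUnixNano: 10, Value: 1}},
		[]DoubleDataPoint{{TimeUnixNano: 20, Value: 2.5}},
	)
	for _, p := range merged {
		fmt.Println(p.timeUnixNano, p.isInt)
	}
}
```

The cost being pointed out is exactly this: every consumer of a mixed metric has to allocate and sort an intermediate slice like this before it can process the points in order.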
> Separately, I think we should group by Labels first before we have DataPoints to minimize on buffer size. I'll file a separate issue for this.
Yes, let's keep that separate. I recall a historical conversation, but would prefer not to dig up old threads. @bogdandrutu do you have thoughts? I believe we were avoiding additional message layers where possible. The best argument in favor of the current design that I remember is that an SDK should never output more than one point per metric and label set per collection period. So, under a simple collection model, the extra layer for label set would always have exactly one entry in the encoding @victlu proposes. I haven't thought through the implications of this in the collector.
I think the greater win would be to move labels and label sets into separate regions in the Export request, with a more-compact encoding.
@victlu When you first mentioned this I was skeptical, but I do like the simplicity proposed here. My only concern right now is the amount of churn this could cause right now. Ideally this should be done via a "deprecation" and "transition" period, as we DO have implementations that folks are trying to use with metrics.
Also, to @jmacd and @bogdandrutu: IIUC `oneof` is slow in the Go implementation, right? That's not a fundamental limitation of protocol buffers AFAIK; it's basically just the same as `optional` but with some runtime checks.
Like @jsuereth, I'm worried about churn. I'd like to connect the question posed here, about reducing the number of types inside the Metric `oneof`, with the question @bogdandrutu poses about histograms in #259. I propose we adopt the same approach for both of these, whichever it is.
On the one hand, we can have just the four semantically distinct types (Counter, Gauge, Histogram, Summary) and push all their variation into the types themselves. This is Victor's proposal for Gauge/Sum, and this is my proposal for Histogram in #272.
On the other hand, we can go with a distinct type for each variation in the Metric `oneof`. Right now, we have a distinct type for `IntGauge` and `DoubleGauge`, for example, and the question is whether we should keep that approach or not. If we keep that approach, then @bogdandrutu's point in #259 should be addressed and each distinct form of Histogram buckets gets its own type.
Agree w/ @jmacd on consistency between histogram and metric types. If we go with #272, then we should also take this proposal/direction (albeit I'd like to see a smooth transition accompany the proposal).
@victlu If you want to take a crack at what migration from existing OTLP to this proposal looks like, that'd be good. This will disrupt a lot of the ecosystem in OTel and I know there's already some users of metrics where such a large change could break them if it's not done smoothly. I realize metrics isn't marked "stable" but users are users. If you want help thinking through a smooth-transition plan, I'm happy to take time to brainstorm with you.
I'll take a crack at it. @jsuereth I'll reach out on slack as I am sure I'm not up-to-speed on all the intricacies of this task.
I'd agree with an approach similar to #272: add the new form with the simpler names. I assume there would need to be changes in the OTLP exporter. Should there be a related ticket for each of the "supported" languages? Which of those are there? Maybe a config option added to the exporter in order to switch between them in the short term, dual write, etc. How long a transition period would be needed? If there's supposed to be a stable model (https://github.com/open-telemetry/opentelemetry-specification/milestone/16), 1 month seems a little tight given the number of open areas.
Edit: I have created issue #275 to discuss this further, since it seems to span a few issues.
Proposal for smooth transition for this (and other) protocol changes.
To deliver OTLP protocol changes, the following steps are proposed to achieve a smooth transition/migration for all affected components:

1. Apply changes to the OTLP protocol
2. Update and release all dependent components
3. Wait until after the transition date
4. Repeat as necessary
Mark old messages as "deprecated" and add new messages...
```proto
oneof data {
  // DEPRECATED BY 4/15/2021
  IntGauge int_gauge = 4;
  DoubleGauge double_gauge = 5;
  IntSum int_sum = 6;
  DoubleSum double_sum = 7;
  IntHistogram int_histogram = 8;
  DoubleHistogram double_histogram = 9;
  DoubleSummary double_summary = 11;

  // New messages
  Gauge gauge = 12;
  Sum sum = 13;
  Summary summary = 14;
  Histogram histogram = 15;
}
```
Add new messages...
```proto
message Gauge {
  repeated IntDataPoint int_data_points = 1;
  repeated DoubleDataPoint double_data_points = 2;
}

message Sum {
  // aggregation_temporality describes if the aggregator reports delta changes
  // since last report time, or cumulative changes since a fixed start time.
  AggregationTemporality aggregation_temporality = 2;

  // If "true", the sum is monotonic.
  bool is_monotonic = 3;

  repeated IntDataPoint int_data_points = 4;
  repeated DoubleDataPoint double_data_points = 5;
}

message Histogram {
  // aggregation_temporality describes if the aggregator reports delta changes
  // since last report time, or cumulative changes since a fixed start time.
  AggregationTemporality aggregation_temporality = 2;

  repeated IntDataPoint int_data_points = 3;
  repeated DoubleDataPoint double_data_points = 4;
}

message Summary {
  repeated DoubleDataPoint double_data_points = 1;
}
```
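During the transition window, a pass-through component could map the deprecated messages onto the new ones. A sketch in Go with hypothetical structs (the real translation would operate on the generated proto types, which differ in detail):

```go
package main

import "fmt"

// Hypothetical structs standing in for the proto-generated messages.
type IntDataPoint struct {
	TimeUnixNano uint64
	Value        int64
}

type DoubleDataPoint struct {
	TimeUnixNano uint64
	Value        float64
}

// IntGauge is the deprecated, type-specialized message.
type IntGauge struct {
	DataPoints []IntDataPoint
}

// Gauge is the proposed replacement carrying both numeric kinds.
type Gauge struct {
	IntDataPoints    []IntDataPoint
	DoubleDataPoints []DoubleDataPoint
}

// upgradeIntGauge maps the deprecated message onto the new shape; a
// receiver supporting both forms could apply this before processing,
// so the rest of its pipeline only ever sees the new messages.
func upgradeIntGauge(old IntGauge) Gauge {
	return Gauge{IntDataPoints: old.DataPoints}
}

func main() {
	g := upgradeIntGauge(IntGauge{DataPoints: []IntDataPoint{{TimeUnixNano: 1, Value: 5}}})
	fmt.Println(len(g.IntDataPoints), len(g.DoubleDataPoints)) // prints: 1 0
}
```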
@victlu I think the transition is a bit more dramatic here. Remember that OTLP is an in-flight protocol and deprecations have a slow-rollout between producers/consumers, so you need to call that out in the transition. I.e. what you define may be a little too aggressive/churny for end users.
Here's an abridged update to what you had. I think maybe this deserves a full OTEP, but for the purposes of your change we just need to agree to the general outline.
To deliver OTLP protocol changes, the following proposed steps should be followed to achieve a smooth transition/migration for all affected components.
deprecated_
Please review #278.
With #278, my concern has been resolved! Thank you @jmacd .
Let's leave this open until the PR merges 😀
The OTLP message Metric has the following definition for instrument types...
spec
I propose we only define instrument types (i.e. Gauge, Sum, etc.) at the `message Metric` level.
The data type (i.e. int, double, etc.) should live at a lower level, maybe at the `message DataPoint` level.
i.e.