open-telemetry / opentelemetry-js

OpenTelemetry JavaScript Client
https://opentelemetry.io
Apache License 2.0
2.72k stars 792 forks source link

ValueType.INT metric instrument integer overflow #3014

Open legendecas opened 2 years ago

legendecas commented 2 years ago

JavaScript integer numbers are coerced to floating-point values implicitly if the value goes over the Number.MAX_SAFE_INTEGER. A metric instrument with ValueType.INT should stick to integer values if possible.

Spec: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/supplementary-guidelines.md#integer

dyladan commented 2 years ago

JS can only safely represent numbers between -(2^(53) - 1) and 2^(53) - 1. Outside this range, adding and subtracting integers will have an imprecise effect on the value. Small additions and subtractions may even have no effect at all.

Options:

  1. Overflow at 2^53 to -2^53 - 1 (stay within JS safe range)
  2. Reset at 2^53 to zero
  3. Overflow at 2^31 to -(2^31 - 1) (emulate 32-bit signed int)

The proto defines numeric integer metric values as signed 64-bit ints (sfixed64 in proto3). I think option 1 is most likely to be incorrectly handled by a backend (may be assumed to be a 64-bit int once it goes above 2^53 - 1). Option 2 is likely to be detected by most backends as a reset and I think should be pretty safe. Option 3 may be correctly detected as an overflow by some backends if they have overflow detection support and may be detected as a reset by others. I'm not sure which backends would properly handle a reset to a negative number.

Personally I think option (2) is the safest, but (3) is also reasonable.

dyladan commented 2 years ago

We can also use BigInt which is supported in major browsers and node since 10.5

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt https://node.green/#ES2020-features-BigInt https://caniuse.com/bigint

When converting to proto we would need to be sure to properly overflow numbers larger than 2^64 - 1 using BigInt.asIntN(64, value).

vmarchaud commented 2 years ago

We can also use BigInt which is supported in major browsers and node since 10.5

I'm not sure if its related but we had an issue to support int64 here: https://github.com/open-telemetry/opentelemetry-js/issues/104, maybe can we tackle both a the same time ? Just serializing it in proto differently depending on the value ?

dyladan commented 2 years ago

We can also use BigInt which is supported in major browsers and node since 10.5

I'm not sure if its related but we had an issue to support int64 here: #104, maybe can we tackle both a the same time ? Just serializing it in proto differently depending on the value ?

Now that we don't have 8,10,12 we can use BigInt there too

dyladan commented 2 years ago

How do we want to handle this? I don't want to just use BigInt for everything since numbers larger than 2^53 aren't the most common and there is a significant performance hit vs using number because BigInt can't take advantage of built in hardware arithmetic.

Here is my proposal:

/cc @open-telemetry/javascript-maintainers especially @legendecas

Int Gauge

Accept number or BigInt in the API. Do not do any conversions, and exporters should handle both cases. No arithmetic is needed in gauges and the user should expect to get out the precision they put in.

Int Sum

Accept number or BigInt in the API. If the user provides BigInt store internal representation as BigInt. If the user provides number then continue to use number until reaching 2^53 and convert to BigInt. Once the internal representation has been converted to BigInt never convert back to number. Above 2^63 (maximum signed 64-bit int) wrap to -2^63.

Histograms and float gauge/sum

No need for special handling as the API for histograms specifies float input. Bucket counts should never reach above MAX_SAFE_INTEGER.

dyladan commented 2 years ago

Once we agree on a strategy i'm happy to work on this one

legendecas commented 2 years ago

I would be inclined not to introduce implicit conversion between numbers and bigints. Conversion from numbers to bigints is easy, but the interoperability between numbers and bigints is not trivial as all the operands must be a bigint. Lots of places must be aware of this implication and it can complicate the common path:

I'd suggest introducing a series of APIs explicitly involved with the bigints, like meter.createBigIntCounter.

legendecas commented 2 years ago

The proto defines numeric integer metric values as signed 64-bit ints (sfixed64 in proto3). I think option 1 is most likely to be incorrectly handled by a backend (may be assumed to be a 64-bit int once it goes above 2^53 - 1).

@dyladan Would you mind clarifying why backends that support signed 64-bit ints can not handle resets from 2^53 - 1 to -(2^53 - 1)?

Either way, as the overflow has to be manually reset, resetting 2^53 - 1 to -(2^53 - 1) or 0 both sound good to me. Overflowing to the lower bound of the safe integer range might be a common choice for the other language implementations.

vmarchaud commented 2 years ago

I'd suggest introducing a series of APIs explicitly involved with the bigints, like meter.createBigIntCounter.

Generally other languages have dedicated method for each type so i would be in favor to have different methods too

dyladan commented 2 years ago

The proto defines numeric integer metric values as signed 64-bit ints (sfixed64 in proto3). I think option 1 is most likely to be incorrectly handled by a backend (may be assumed to be a 64-bit int once it goes above 2^53 - 1).

@dyladan Would you mind clarifying why backends that support signed 64-bit ints can not handle resets from 2^53 - 1 to -(2^53 - 1)?

It may be interpreted as a 64 bit overflow not a 53 bit overflow as 53 bits is a weird time for integers to overflow.

I'd suggest introducing a series of APIs explicitly involved with the bigints, like meter.createBigIntCounter.

I like this idea. We actually don't need to overflow the regular counter at all if we do this because we can simply document that any int instrument that goes larger than Number.MAX_SAFE_INTEGER should use the bigint version

dyladan commented 2 years ago

Would it be better to have a separate method or to have a ValueType.BIGINT?

Option 1

meter.createBigIntCounter("my_big_int_counter", { description, unit });

Option 2

meter.createCounter("my_big_int_counter", { description, unit, valueType: ValueType.BIG_INT });
vmarchaud commented 2 years ago

I wonder which one would be preferable for for platform that doesn't support bigint ? specially old browser version or edge platform that use old engines ?

I personally prefer option 1 though

dyladan commented 2 years ago

I wonder which one would be preferable for for platform that doesn't support bigint

I guess it depends what you want your fallback to look like. Would option 1 fail or return a regular int counter on platforms with no BigInt? If it returns a regular int counter then we are back to the complexity of an API that has to support multiple input types in order to support the fallback behavior.

For both options, we need to decide if we're going to allow BigInt as an input, or just use it as the internal representation. For me, option 1 means we have to allow it as an input. Option 2 keeps the same number-based API, but just changes the internal representation which makes the fallback easy (just use number). Using the number-based API, you would not be able to safely do a single add of more than 2^53, but that seems like a minor issue.

vmarchaud commented 2 years ago

I guess it depends what you want your fallback to look like. Would option 1 fail or return a regular int counter on platforms with no BigInt? If it returns a regular int counter then we are back to the complexity of an API that has to support multiple input types in order to support the fallback behavior.

I would have suggested to make new functions for bigint that are optional in the API but its not great as UX

legendecas commented 2 years ago

Would option 1 fail or return a regular int counter on platforms with no BigInt?

I would prefer a more explicit and fail-early approach for feature absence.

For both options, we need to decide if we're going to allow BigInt as an input, or just use it as the internal representation. For me, option 1 means we have to allow it as an input.

For both options, my idea is that their internal representation is what their name/flag suggests. I don't have a strong opinion on allowing numbers on bigint counters yet, but I'd find a naive start with no implicit conversion would be great.

dyladan commented 2 years ago

Would option 1 fail or return a regular int counter on platforms with no BigInt?

I would prefer a more explicit and fail-early approach for feature absence.

So just use BigInt and it might throw. User is responsible for doing any feature detection?

legendecas commented 2 years ago

Would option 1 fail or return a regular int counter on platforms with no BigInt?

I would prefer a more explicit and fail-early approach for feature absence.

So just use BigInt and it might throw. User is responsible for doing any feature detection?

Yeah, as we don't provide any polyfills, I don't think we can do many things without BigInt support.

aabmass commented 2 years ago

I did a little digging because I was curious how you would pass BigInt to the protocol buffers implementation. It looks like we are using protobuf.js implementation and the typescript stubs it generates look like this:

/** Properties of a NumberDataPoint. */
interface INumberDataPoint {
    // ....
    /** NumberDataPoint asDouble */
    asDouble?: (number|null);

    /** NumberDataPoint asInt */
    asInt?: (number|Long|null);
    // ....
}

the asInt variant accepts Long, which after doing a little digging seems to be from long.js, which is being pulled in as one of protobuf.js dependencies anyway.

  1. Am I missing some way that protobuf.js already accepts BigInts? I couldn't verify if you can just pass a string. For OTLP JSON, I know you should use a decimal string.
  2. I'm wondering if we've considered using Long instead of BigInt? Since we are limited to 64 bits in OTLP, we don't really need BigInt beyond that. Not sure how these two compare performance wise, but having BigInt as a builtin is certainly a pro. OTOH, long.js will be pulled in for anyone using the OTLP protobuf exporter anyway.
dyladan commented 2 years ago

which after doing a little digging seems to be from long.js

Yes that is what that is from.

, which is being pulled in as one of protobuf.js dependencies anyway.

They actually aren't bundling long.js. You can look at the bundler and see that it is external which means it isn't included. I was avoiding using long.js because I didn't want to include another dependency (and not a particularly small one).

  1. Am I missing some way that protobuf.js already accepts BigInts? I couldn't verify if you can just pass a string. For OTLP JSON, I know you should use a decimal string.

You can pass a string but it is converted to a number (or Long if you have included that dependency) for internal representation. There is also a PR open to add BigInt support to protobuf.js https://github.com/protobufjs/protobuf.js/pull/1557

  1. I'm wondering if we've considered using Long instead of BigInt? Since we are limited to 64 bits in OTLP, we don't really need BigInt beyond that. Not sure how these two compare performance wise, but having BigInt as a builtin is certainly a pro.

I did consider it, but decided that I would rather use a real builtin and wait for proper support. Using an external module for 64-bit support means we have to support it for a very long time when the JS ecosystem is clearly moving in another direction.

OTOH, long.js will be pulled in for anyone using the OTLP protobuf exporter anyway.

See above comments about bundler but this actually isn't true.

I was going to convert BigInts to string before providing them to the protobufjs and let protobufjs interpret them in whatever way it knows how. This would allow the end-user to decide if they wanted to use the Long.js package. For any user that didn't use Long.js, they would still get accurate internal representation within OTel with proper overflow, and just have a lossy output until protobufjs implements BigInt support. When protobufjs does update to support BigInts we would remove our string conversion.

dyladan commented 2 years ago

Would option 1 fail or return a regular int counter on platforms with no BigInt?

I would prefer a more explicit and fail-early approach for feature absence.

So just use BigInt and it might throw. User is responsible for doing any feature detection?

Yeah, as we don't provide any polyfills, I don't think we can do many things without BigInt support.

We wouldn't really need to provide polyfills. We could just accept number | BigInt for the counter APIs and do a simple typeof inp === 'number' check. If that check fails we assume we have a BigInt. For our own internal representation we would do a feature detection and fall back on number if BigInt was not available.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.

ryanleecode commented 2 months ago

Whats the solution?