open-telemetry / opentelemetry-specification

Specifications for OpenTelemetry
https://opentelemetry.io
Apache License 2.0
3.76k stars 890 forks source link

Clarify that exporter timeout settings must be positive #4283

Open jack-berg opened 2 weeks ago

jack-berg commented 2 weeks ago

Related to: https://github.com/open-telemetry/opentelemetry-java/pull/6850

The env var configuration interface defines a variety of options related to exporter timeout settings:

While the spec states that:

Any value that represents a duration, for example a timeout, MUST be an integer representing a number of milliseconds. The value is non-negative - if a negative value is provided, the implementation MUST generate a warning, gracefully ignore the setting and use the default value if it is defined.

Implying that zero is a valid duration in addition to positive values. However, there is no explicit mention of zero, and how to interpret it.

Does zero mean indefinite? If so, that's a really important piece of information for implementers. Does zero actually mean zero, and represent a degenerate case which is valid but always ends up with timeout? This seems useless.

I think the more likely case is that the spec overlooked zero for exporter timeouts.

I propose we clarify that zero is an invalid duration for all exporter timeout settings, and characterize this as a bugfix. Any language implementation which doesn't validate exporter timeout at all, or accepts zero as valid can fix their implementation as a bugfix. If a language implementation is insistent that switching from no validation to validating that timeout is positive is a breaking change (I disagree but this is a hypothetical), then they can continue without adding the new validation. This would be similar to how we changed the default protocol from grpc to http/protobuf, carving out an exception for backwards compatibility.

trask commented 2 weeks ago

I'm kind of used to timeout 0 meaning "never timeout", e.g.

https://linux.die.net/man/7/socket

If the timeout is set to zero (the default) then the operation will never timeout.

That said, I'm not sure there's a practical use case for supporting "never timeout" in OpenTelemetry pipelines(?)

I could see "retry forever" when retrying from disk, but I think that would be a "retry setting" as opposed to a "timeout setting".

svrnm commented 18 hours ago

@open-telemetry/technical-committee PTAL

MrAlias commented 9 hours ago

I'm kind of used to timeout 0 meaning "never timeout"

This is how Go interprets a value of zero for many (maybe all?) of these timeouts.

pellared commented 9 hours ago

In .NET -1 is often used to define "never timeout" (sic!). References:

tigrannajaryan commented 7 hours ago

If we can't agree on semantics of 0 vs -1 for "never timeout", here is an alternate: don't support it.

The point of "never timeout" logic is that typically you handle the timeouts yourself and have some other means to interrupt whatever operation is waiting on that "forever timeout". "Never timeout" does not mean wait until the heat death of the universe.

Given the above, is there a practical application of "never timeout" values for any of the Otel config settings? AFAIK, we don't provide any other means to interrupt it, so what's the point of waiting forever (until end of the process, I assume)?

For the hypothetical unknown use case where an actual forever timeout is needed I am guessing the largest acceptable number in milliseconds is enough (assuming 32bits signed, you get 50 years. I want to see a process that runs longer than that).

pellared commented 6 hours ago

I do not say that using -1 for "never timeout" is good. Maybe OTel .NET would agree to handle 0 as "never timeout". However, it may be considered as a breaking change. CC @open-telemetry/dotnet-maintainers

However, I do agree with @tigrannajaryan that we should have a real use case when such setting is needed. Supporting unbounded values can be considered unsafe.

yurishkuro commented 6 hours ago

+1 to change the spec to require positive values only