Open jack-berg opened 2 weeks ago
I'm kind of used to timeout 0 meaning "never timeout", e.g.
https://linux.die.net/man/7/socket
If the timeout is set to zero (the default) then the operation will never timeout.
That said, I'm not sure there's a practical use case for supporting "never timeout" in OpenTelemetry pipelines(?)
I could see "retry forever" when retrying from disk, but I think that would be a "retry setting" as opposed to a "timeout setting".
@open-telemetry/technical-committee PTAL
I'm kind of used to timeout 0 meaning "never timeout"
This is how Go interprets a value of zero for many (maybe all?) of these timeouts.
In .NET -1
is often used to define "never timeout" (sic!).
References:
If we can't agree on semantics of 0 vs -1 for "never timeout", here is an alternate: don't support it.
The point of "never timeout" logic is that typically you handle the timeouts yourself and have some other means to interrupt whatever operation is waiting on that "forever timeout". "Never timeout" does not mean wait until the heat death of the universe.
Given the above, is there a practical application of "never timeout" values for any of the Otel config settings? AFAIK, we don't provide any other means to interrupt it, so what's the point of waiting forever (until end of the process, I assume)?
For the hypothetical unknown use case where an actual forever timeout is needed I am guessing the largest acceptable number in milliseconds is enough (assuming 32bits signed, you get 50 years. I want to see a process that runs longer than that).
I do not say that using -1
for "never timeout" is good.
Maybe OTel .NET would agree to handle 0
as "never timeout". However, it may be considered as a breaking change.
CC @open-telemetry/dotnet-maintainers
However, I do agree with @tigrannajaryan that we should have a real use case when such setting is needed. Supporting unbounded values can be considered unsafe.
+1 to change the spec to require positive values only
Related to: https://github.com/open-telemetry/opentelemetry-java/pull/6850
The env var configuration interface defines a variety of options related to exporter timeout settings:
OTEL_EXPORTER_OTLP_TIMEOUT
OTEL_EXPORTER_ZIPKIN_TIMEOUT
OTEL_BSP_EXPORT_TIMEOUT
OTEL_BLRP_EXPORT_TIMEOUT
OTEL_METRIC_EXPORT_TIMEOUT
While the spec states that:
Implying that zero is a valid duration in addition to positive values. However, there is no explicit mention of zero, and how to interpret it.
Does zero mean indefinite? If so, that's a really important piece of information for implementers. Does zero actually mean zero, and represent a degenerate case which is valid but always ends up with timeout? This seems useless.
I think the more likely case is that the spec overlooked zero for exporter timeouts.
I propose we clarify that zero is an invalid duration for all exporter timeout settings, and characterize this as a bugfix. Any language implementation which doesn't validate exporter timeout at all, or accepts zero as valid can fix their implementation as a bugfix. If a language implementation is insistent that switching from no validation to validating that timeout is positive is a breaking change (I disagree but this is a hypothetical), then they can continue without adding the new validation. This would be similar to how we changed the default protocol from
grpc
tohttp/protobuf
, carving out an exception for backwards compatibility.