open-telemetry / opamp-go

OpAMP protocol implementation in Go
Apache License 2.0
140 stars 69 forks source link

Amend Retry-After parsing logic for negative durations #283

Closed tpaschalis closed 2 months ago

tpaschalis commented 3 months ago

The initial implementation of this PR attempted to fix the TestExtractRetryAfterHeaderHttpDate test by making sure the test duration was in the future. After the comment by Tigran I've pivoted to fix the parsing logic instead.

This PR amends the parseHTTPDate and parseDelaySeconds helpers to not return an error if the parsed duration is in the past; instead they now return a zero duration and nil error to trigger immediate polling.

In the current logic and the two places (1, 2) it's being used, it looks that in the case that a negative seconds or a date in the past was set in the header, both the HTTP and WebSocket clients would reuse the last valid calculated interval, while now it would correctly poll right away.

Old PR description

This PR fixes a timing issue with TestExtractRetryAfterHeaderHttpDate that makes it flaky.

To see the failure, run the following command a couple of times

$ go test -run TestExtractRetryAfterHeaderHttpDate -count=20000 -shuffle=on ./...

The failure itself has to do with the following snippet; technically we're only adding >=0 seconds of delay. If you're lucky enough to add 0s as the duration, then the time.Until condition hits and the header cannot be parsed properly

https://github.com/open-telemetry/opamp-go/blob/1a9603e3d2ded1c7a63917e6d13c4d6e22a7af6d/internal/retryafter_test.go#L61-L62

https://github.com/open-telemetry/opamp-go/blob/ba3b5ff1db417943eed2b004aad6301494a0a4ea/internal/retryafter.go#L35-L40

Fixes #181

tigrannajaryan commented 3 months ago

Hmm, I am not sure the test is wrong. I wonder if the code is wrong. Why do we need the date returned in Retry-After to be in the future? I would argue that is not necessary. If the duration is exactly 0 or even negative that's fine, we should return 0 duration from parseHTTPDate and nil error. This will be overridden by callers since we have a floor of the duration, but that's not this functions concern.

I think we should fix parseHTTPDate to return 0 duration if the calculated duration is <=0.

@open-telemetry/opamp-go-approvers what do you think?

tpaschalis commented 3 months ago

Thanks for the comment; I think you're right. At first I thought there might be some design decision behind this, so I chose to not challenge it.

After looking at the two places (1, 2) where the result of parseHTTPDate might be used, a duration of zero would just immediately trigger the timer, so I don't see a drawback in this. (Edit: In the current logic, it looks like returning a header in the past will reuse the last calculated interval in both the HTTP and WS client implementations). The same holds for the parseDelaySeconds function.

I can amend the PR and relevant tests to fix the parsing logic instead to get a feel for this.

codecov[bot] commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 75.82%. Comparing base (1a9603e) to head (dd2a1d6). Report is 4 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #283 +/- ## ========================================== - Coverage 75.96% 75.82% -0.14% ========================================== Files 25 25 Lines 1656 1659 +3 ========================================== Hits 1258 1258 - Misses 291 293 +2 - Partials 107 108 +1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

tpaschalis commented 3 months ago

cc @tigrannajaryan feel free to take another look!