prometheus / pushgateway

Push acceptor for ephemeral and batch jobs.
Apache License 2.0
2.99k stars 466 forks source link

Flag --push.disable-consistency-check can lose unrelated errors #336

Closed pdxkzeerkti closed 4 years ago

pdxkzeerkti commented 4 years ago

Bug Report

What did you do?

I am working on code in a handler/push.go in an experimental dead-end fork. In the experiment, I forced the check flag to false, e.g.

. . . yada yada . . . 
) func(http.ResponseWriter, *http.Request, httprouter.Params) {
    var ps httprouter.Params
    var mtx sync.Mutex // Protects ps.
    check = false // ADDED LINE

This is equivalent to passing --push.disable-consistency-check on the command line.

I found that the unit tests failed. Some of the failures are understandable since the unit tests are not designed to run with the check flag set, but one failure points to a deeper issue.

What did you expect to see?

I expected to see any unit test that expected a 200 status fail because when the check flag is set, 202 is always returned instead of 200 as described in the documentation.

What did you see instead? Under which circumstances?

In addition to the failures mentioned above, I see one case where the unit tests expect a 400 status code but receive a 202. The failing test is at handler/handler_test.go:218.

I think what is happening is that the code which tries to override the 200 response with a 202 inside the !check condition at line 118 in my version is clobbering a non-success (400 BAD REQUEST) response that was set earlier and for reasons completely unrelated to type checking.

Environment

Vanilla Go build environment on a Mac

$ uname -srm
Darwin 17.7.0 x86_64

Unit test

$ GO111MODULE=on go test -race  -mod=vendor ./...
?       github.com/prometheus/pushgateway   [no test files]
?       github.com/prometheus/pushgateway/asset [no test files]
--- FAIL: TestPush (0.00s)
    handler_test.go:218: Wanted status code 400, got 202.
2020/03/12 17:31:29 requesting/missing.js
2020/03/12 17:31:29 requesting/index.js
2020/03/12 17:31:29 requesting/index.js
2020/03/12 17:31:29 requesting/missing.js
FAIL
FAIL    github.com/prometheus/pushgateway/handler   0.066s
ok      github.com/prometheus/pushgateway/storage   1.118s
FAIL
pdxkzeerkti commented 4 years ago

Correction on that log at the end - I mistakenly copied that from a later iteration when I had made some additional changes. If all you do is disable check and then build with the Makefile, here's what you see:

>> running all tests
GO111MODULE=on go test -race  -mod=vendor ./...
?       github.com/prometheus/pushgateway   [no test files]
?       github.com/prometheus/pushgateway/asset [no test files]
--- FAIL: TestPush (0.00s)
    handler_test.go:128: Wanted status code 200, got 202.
    handler_test.go:182: Wanted status code 200, got 202.
    handler_test.go:218: Wanted status code 400, got 202.
    handler_test.go:255: Wanted status code 200, got 202.
    handler_test.go:292: Wanted status code 200, got 202.
    handler_test.go:332: Wanted status code 200, got 202.
    handler_test.go:390: Wanted status code 200, got 202.
2020/03/12 17:24:48 requesting/missing.js
2020/03/12 17:24:48 requesting/index.js
2020/03/12 17:24:48 requesting/index.js
2020/03/12 17:24:48 requesting/missing.js
FAIL
FAIL    github.com/prometheus/pushgateway/handler   0.073s
ok      github.com/prometheus/pushgateway/storage   (cached)
FAIL
make: *** [common-test] Error 1

The failures that say Wanted status code 200, got 202 are expected as explained in the text above. The one that says handler_test.go:218: Wanted status code 400, got 202 is the case that is concerning.

beorn7 commented 4 years ago

handler_test.go:218: Wanted status code 400, got 202.

That's expected. This test has a mock storage that always returns errors when you try to store anything in it, but with the disabled check, that's not noted during push (which is the point of disabling the check).

pdxkzeerkti commented 4 years ago

OK. So just to push back one time, you're confident there's no chance of a non-200 result being set on the response before that point in the code is reached?

beorn7 commented 4 years ago

There are ways of triggering a 404 (using a path that doesn't exist). There are ways of triggering a 400 (if the payload cannot be decoded). But once the decoded payload is asynchronously submitted to the storage with SubmitWriteRequest, a 202 is served, and no later error will change that.

pdxkzeerkti commented 4 years ago

OK, understand. Thanks for your quick response.