nats-io / nats-kafka

NATS to Kafka Bridging
Apache License 2.0
131 stars 32 forks source link

Fix false positive tests #23

Closed variadico closed 3 years ago

variadico commented 3 years ago

Currently, the makefile ignores Go test errors because it wants to always run clean up. This change refactors the makefile to allow Go test errors to be noticed and ensure that clean up runs.

In addition, it enables GODEBUG=x509ignoreCN=0 during tests to give us some time to figure out how to generate certs with SANs. This is temporary and should be removed soon.

variadico commented 3 years ago

Also, @ColinSullivan1 brought up about users not using SANs. https://github.com/nats-io/nats-kafka/pull/22#discussion_r557040619

This might be problematic with certs (as you saw with the SAN) - we know some of our users don't use SANs in their certs.

Should our CI only test for Go 1.14? We can temporarily revert the deprecated behavior in Go 1.15 withGODEBUG (which is what I'm doing in this PR). If we release nats-kafka under Go 1.15, users will have to set GODEBUG in their environment if they don't want to use SANs.

wallyqs commented 3 years ago

GODEBUG for testing is fine, though I think that maybe for the next release we still use Go 1.14 (the impact of Go 1.15 is that when nats-kafka tries to connect to a server then it will be more strict with the certs that are used). You can find how these were generated for the nats-server here: https://github.com/nats-io/nats-server/blob/a9a6bdc04fc3283b2aeef37e643ba3db52069e89/test/tls_test.go#L1368-L1375

ColinSullivan1 commented 3 years ago

1.15 for testing is OK imo, but release should be 1.14 imo. I think release binaries built from 1.15 would create issues for too many users due to the SAN requirement.

variadico commented 3 years ago

Okay, so we have some flaky tests. CI is really good at exposing them. When I run them enough times, they fail locally on my Linux laptop as well.

For now, I've disabled the flaky tests in make test, which is what CI uses. However, make test-failfast will run everything, including flakes, and will stop running after the first failure. Overall, we'll actually be alerted of failures now, so that's still good.

I'm gonna handle the config issue in a different PR.

Also, as for fixing the flakes, I'm also working on a PR to swap out segmentio/kafka-go for Shopify/sarama because the former has some bugs that cause unwanted batching. I'm hoping that will stabilize the flakes, otherwise I'll take another look.