gas1121 closed 3 years ago
I have run into this issue a number of times and tried a lot of different ways to fix it, with only partial success. I think the root cause is that Apache Kafka does not always register the first message produced to the topic, so the consumer never receives that message either.
The whole point of the online test is to make sure that the kafka-monitor is actually hooked into a Kafka topic, and to push simulated data through it instead of mocking it. The existing build job has not failed in a while on either the master or dev branch (it runs on a cron regardless of pushing code up), so perhaps an improved test is in order, for example one that makes sure 1+ messages are sent.
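One way to read the "1+ message" idea is to keep sending until the consumer side reports something, rather than relying on a single send. The sketch below is only an illustration under that assumption: `send_until_received`, `send`, and `received` are hypothetical names, not part of kafka-monitor, and the callables would have to be wired to the real producer and the real Redis check by the test.

```python
import time


def send_until_received(send, received, attempts=5, wait=0.5, poll=0.01):
    """Call send() up to `attempts` times and stop once received() is true.

    `send` and `received` are hypothetical callables supplied by the test:
    `send` pushes one message, `received` reports whether a result showed up.
    Returns the number of sends that were needed, or None if nothing arrived.
    """
    for attempt in range(1, attempts + 1):
        send()
        deadline = time.monotonic() + wait
        while True:
            if received():
                return attempt
            if time.monotonic() >= deadline:
                break  # this send was not seen in time, try another one
            time.sleep(poll)
    return None
```

Returning the attempt count rather than a plain boolean lets the test log how often the first message was lost, which may help tell a Kafka-side failure from a Redis-side one.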
Yes, it's a weird problem. After some tests, I found that the problem can be reproduced by adding time.sleep(10) at the beginning of the function TestKafkaMonitor.test_run; the two docker jobs then always fail. So I think maybe it's because the key is deleted while this test is running? But I still cannot figure out what exactly the problem is.
I have tried a number of scenarios but basically the test comes down to the following:
1) Kafka consumer is created and hooks into the topic, asserting its offset (either at 0 for a new topic, or at X for an existing topic with messages)
2) The feed unit test runs and ensures the producer works as expected, and pushes a new message into the topic.
3) The run unit test then ensures the result of the feed is consumed and put into Redis
Note that we can't always restart at 0 and consume 1 message, because when the test is run multiple times that doesn't show it got the newest message we pushed out. So in theory, there are two points of failure, 1 for Kafka and 1 for Redis. Given that this is an integration test I would like to ensure both are working, but perhaps an expansion of the online test suite is needed so we can better diagnose the failures.
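The three steps and the offset concern above can be sketched with in-memory stand-ins. Everything here is hypothetical and for illustration only: `FakeTopic` stands in for a Kafka topic, a plain dict stands in for Redis, and `run_online_test` and the `msg:<offset>` key format are not the real test code.

```python
class FakeTopic:
    """In-memory stand-in for a Kafka topic: an append-only list of messages."""

    def __init__(self, existing=None):
        self.log = list(existing or [])

    def end_offset(self):
        # Offset at which the next produced message will land.
        return len(self.log)

    def produce(self, message):
        self.log.append(message)

    def read_from(self, offset):
        return self.log[offset:]


def run_online_test(topic, store, message):
    # 1) Hook into the topic and record the current offset
    #    (0 for a new topic, X for a topic that already has messages).
    start = topic.end_offset()
    # 2) "feed": push one new message into the topic.
    topic.produce(message)
    # 3) "run": consume only what arrived after `start` and store it
    #    (the dict plays the role of Redis here).
    consumed = topic.read_from(start)
    for i, msg in enumerate(consumed):
        store[f"msg:{start + i}"] = msg
    return consumed


# A topic left over from earlier runs still yields only the new message.
topic = FakeTopic(existing=["old-1", "old-2"])
store = {}
consumed = run_online_test(topic, store, "new")
```

Because the check starts from the recorded offset instead of 0, a rerun against a topic that already holds messages still has to see the message it just produced.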
In most cases, rerunning the test (or restarting the Travis job) solves the issue, and the inconsistency makes it difficult to debug.
Forgot about this ticket, fix is in referenced git commit.
Hello, when I run Travis on my own fork, the job sometimes fails on kafka monitor with
I ran it on the dev branch and didn't change any code; I only removed the Slack notification and the docker image push job from the Travis build.