istresearch / scrapy-cluster

This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
http://scrapy-cluster.readthedocs.io/
MIT License

Travis build sometimes fails #127

Closed gas1121 closed 3 years ago

gas1121 commented 7 years ago

Hello, when I run Travis on my own fork, the job sometimes fails on the kafka-monitor tests with:

Traceback (most recent call last):
  File "tests/online.py", line 74, in test_run
    self.assertTrue(self.redis_conn.exists("cluster:test"))
AssertionError: False is not true

I ran it on the dev branch without changing any code; I only removed the Slack notification and the Docker image push job from the Travis build.

madisonb commented 7 years ago

I have run into this issue a number of times and tried many different fixes, but without lasting success. I think the root cause is that Apache Kafka does not always register the first message produced to a topic, so the consumer never receives that message either.

The whole point of the online test is to make sure the kafka-monitor is actually hooked into a Kafka topic, pushing simulated data through it instead of mocking it. Lately the existing build job has not failed in a while on either the master or dev branch (it runs on a cron regardless of code pushes), so perhaps an improved test is in order to make sure at least one message is sent, or something similar.
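One way to harden the failing assertion would be to poll Redis for a bounded time instead of checking the key exactly once. A minimal sketch of the idea; `wait_for` is a hypothetical helper, not part of scrapy-cluster:

```python
import time

def wait_for(condition, timeout=10.0, interval=0.5):
    """Poll `condition` (a zero-argument callable) until it returns a
    truthy value or `timeout` seconds elapse. Returns True if the
    condition held at any point, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return bool(condition())

# In the online test, the single-shot assertion
#     self.assertTrue(self.redis_conn.exists("cluster:test"))
# could then become a bounded poll:
#     self.assertTrue(wait_for(lambda: self.redis_conn.exists("cluster:test")))
```

This would tolerate the delay between producing the message and the monitor writing the key, while still failing deterministically if the key never appears.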

gas1121 commented 7 years ago

Yes, it's a weird problem. After some tests, I found it can be reproduced by adding time.sleep(10) at the beginning of TestKafkaMonitor.test_run; with that change the two Docker jobs always fail. So maybe the key is deleted while this test is running? But I still cannot figure out what exactly the problem is.

madisonb commented 7 years ago

I have tried a number of scenarios but basically the test comes down to the following:

1) The Kafka consumer is created and hooks into the topic, asserting its offset (either 0 for a new topic, or X for an existing topic with messages).
2) The feed unit test runs, ensures the producer works as expected, and pushes a new message into the topic.
3) The run unit test then ensures the result of the feed is consumed and put into Redis.
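The offset bookkeeping in the steps above could be sketched with a toy in-memory stand-in for the topic. `FakeTopic` here is purely hypothetical, just to illustrate why the test records the starting offset rather than always reading from 0:

```python
class FakeTopic:
    """Toy stand-in for a Kafka topic: an append-only message log."""

    def __init__(self, existing_messages=None):
        self.messages = list(existing_messages or [])

    def produce(self, msg):
        self.messages.append(msg)

    def consume_from(self, offset):
        # Return only messages at or after the given offset.
        return self.messages[offset:]

# 1) Record the current offset (0 for a new topic, X for an existing one).
topic = FakeTopic(existing_messages=["old-1", "old-2"])
start_offset = len(topic.messages)

# 2) The feed step pushes a new message into the topic.
topic.produce("feed-result")

# 3) The run step consumes only messages after the recorded offset, so
#    reruns never mistake a stale message for the one just produced.
new_messages = topic.consume_from(start_offset)
assert new_messages == ["feed-result"]
```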

Note that we can't always restart at offset 0 and consume one message, because running the test multiple times wouldn't tell us whether we got the newest message we pushed. So in theory there are two points of failure: one for Kafka and one for Redis. Given that this is an integration test I would like to ensure both are working, but perhaps an expansion of the online test suite is needed so we can better diagnose the failures.

In most cases, rerunning the test (or restarting the Travis job) solves the issue, and it is so inconsistent that it is difficult to debug.
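Since a rerun usually passes, one stopgap would be to retry the flaky test automatically a few times before declaring failure. A rough sketch; this `retry` decorator is hypothetical and not part of scrapy-cluster:

```python
import functools
import time

def retry(times=3, delay=1.0, exceptions=(AssertionError,)):
    """Rerun a flaky test function up to `times` attempts, sleeping
    `delay` seconds between attempts; re-raise on the final failure."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, times + 1):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if attempt == times:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator
```

This masks the race rather than fixing it, so it would only be appropriate for the online integration tests, not the unit tests.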

madisonb commented 3 years ago

Forgot about this ticket; the fix is in the referenced git commit.