istresearch / scrapy-cluster

This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
http://scrapy-cluster.readthedocs.io/
MIT License

ImportError: No module named online #158

Closed mohit0749 closed 6 years ago

mohit0749 commented 6 years ago

test_feed (__main__.TestKafkaMonitor) ... ERROR
test_run (__main__.TestKafkaMonitor) ... ERROR

======================================================================
ERROR: test_feed (__main__.TestKafkaMonitor)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/online.py", line 56, in setUp
    self.kafka_monitor._load_plugins()
  File "/root/scrapy-cluster/kafka-monitor/kafka_monitor.py", line 75, in _load_plugins
    the_class = self._import_class(key)
  File "/root/scrapy-cluster/kafka-monitor/kafka_monitor.py", line 59, in _import_class
    m = __import__(cl[0:d], globals(), locals(), [classname])
ImportError: No module named online

======================================================================
ERROR: test_run (__main__.TestKafkaMonitor)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/online.py", line 56, in setUp
    self.kafka_monitor._load_plugins()
  File "/root/scrapy-cluster/kafka-monitor/kafka_monitor.py", line 75, in _load_plugins
    the_class = self._import_class(key)
  File "/root/scrapy-cluster/kafka-monitor/kafka_monitor.py", line 59, in _import_class
    m = __import__(cl[0:d], globals(), locals(), [classname])
ImportError: No module named online

----------------------------------------------------------------------
Ran 2 tests in 0.600s

madisonb commented 6 years ago

Can you please provide steps to reproduce this, and describe your environment?

mohit0749 commented 6 years ago

I am using Ubuntu 14.04 and Python 2.7 (Anaconda) with all requirements installed. Kafka and ZooKeeper are installed on the same server. I just cloned the repo from GitHub, added localsettings.py to all three components, and ran `python tests/online.py -v` in the kafka-monitor directory, and it throws the same error.

madisonb commented 6 years ago

Can you reproduce this for me in the scrapy-cluster Vagrant machine scdev that is included in this repo? Otherwise I would suggest checking whether Anaconda is the problem, and using something like virtualenv to install your packages. This project moved away from Anaconda a long time ago because of issues like that.
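One way to check whether the interpreter is the problem is to print which Python is running and try the plugin import by hand. The sketch below is only an illustration, not code from scrapy-cluster: `describe_environment` and `import_class` are hypothetical helper names, and the dotted-path convention is assumed from the `_import_class` frame in the traceback.

```python
import importlib
import sys


def describe_environment():
    """Return the interpreter details that matter when imports fail."""
    return {
        "executable": sys.executable,       # which python binary is running
        "version": sys.version.split()[0],  # interpreter version string
        "path_head": sys.path[:3],          # first locations searched for modules
    }


def import_class(dotted_path):
    """Import 'package.module.ClassName' and return the class object.

    Hypothetical helper: raises ImportError when the module part of the
    path cannot be found by the current interpreter.
    """
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

Calling `describe_environment()` and then `import_class(...)` with the plugin path from your settings, from inside the kafka-monitor directory, under both the Anaconda interpreter and a virtualenv one, should show whether the failure follows the interpreter.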

madisonb commented 6 years ago

Better yet, can you tell me at what step you get stuck under this link: https://scrapy-cluster.readthedocs.io/en/latest/topics/introduction/quickstart.html

mohit0749 commented 6 years ago

Anaconda is the main problem.

mohit0749 commented 6 years ago

https://scrapy-cluster.readthedocs.io/en/latest/topics/introduction/quickstart.html#cluster-quickstart @madisonb I get stuck at `python tests/online.py -v` in kafka-monitor.

mohit0749 commented 6 years ago

@madisonb now I am getting this error:

test_feed (__main__.TestKafkaMonitor) ... 2018-01-05 14:20:39,088 [kafka-monitor] DEBUG: Logging to stdout
2018-01-05 14:20:39,090 [kafka-monitor] DEBUG: Creating new kafka consumer using brokers: 0.0.0.0:9092 and topic demo.incoming_test
2018-01-05 14:20:39,225 [kafka-monitor] DEBUG: Successfully connected to Kafka
2018-01-05 14:20:39,226 [kafka-monitor] DEBUG: Trying to load plugin tests.online.CustomHandler
2018-01-05 14:20:39,230 [kafka-monitor] DEBUG: Connected to Redis in ActionHandler
2018-01-05 14:20:39,231 [kafka-monitor] DEBUG: Successfully loaded plugin tests.online.CustomHandler
2018-01-05 14:20:39,233 [kafka-monitor] DEBUG: Connected to Redis in StatsCollector Setup
2018-01-05 14:20:44,234 [kafka-monitor] DEBUG: Creating new kafka producer using brokers: 0.0.0.0:9092
2018-01-05 14:20:44,350 [kafka-monitor] INFO: Feeding JSON into demo.incoming_test
{
    "action": "info", 
    "spiderid": "link", 
    "uuid": "mytestid", 
    "appid": "testapp"
}
2018-01-05 14:20:54,236 [kafka-monitor] INFO: Successfully fed item to Kafka
ok
test_run (__main__.TestKafkaMonitor) ... 2018-01-05 14:20:54,252 [kafka-monitor] DEBUG: Creating new kafka consumer using brokers: 0.0.0.0:9092 and topic demo.incoming_test
2018-01-05 14:20:54,392 [kafka-monitor] DEBUG: Successfully connected to Kafka
2018-01-05 14:20:54,393 [kafka-monitor] DEBUG: Trying to load plugin tests.online.CustomHandler
2018-01-05 14:20:54,395 [kafka-monitor] DEBUG: Connected to Redis in ActionHandler
2018-01-05 14:20:54,395 [kafka-monitor] DEBUG: Successfully loaded plugin tests.online.CustomHandler
2018-01-05 14:20:54,397 [kafka-monitor] DEBUG: Connected to Redis in StatsCollector Setup
FAIL

======================================================================
FAIL: test_run (__main__.TestKafkaMonitor)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/online.py", line 74, in test_run
    self.assertTrue(self.redis_conn.exists("cluster:test"))
AssertionError: False is not true

----------------------------------------------------------------------
Ran 2 tests in 20.320s

FAILED (failures=1)

madisonb commented 6 years ago

So that failure means your Kafka monitor may not be configured correctly; triple-check that.

Otherwise, this is a really hard bug to reproduce (see https://github.com/istresearch/scrapy-cluster/issues/127). You should be able to rerun your tests and get them to pass. It appears to be an issue with Kafka accepting the very first message sent to the topic, or something else that is difficult to debug or reproduce consistently.

You can test this out by simply running the cluster and using it like normal. You can also see it in the Travis build logs for this project: sometimes it just doesn't read the first message.
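If the message is merely slow to arrive rather than dropped, polling for the Redis key instead of asserting once can make such a test less flaky. This is a generic sketch and not how the project's tests are written; `wait_for` and its parameters are hypothetical names chosen for illustration.

```python
import time


def wait_for(predicate, timeout=10.0, interval=0.5):
    """Poll predicate() until it is truthy or the timeout elapses.

    Returns True if the condition was met in time, False otherwise.
    """
    deadline = time.time() + timeout
    while True:
        if predicate():
            return True
        if time.time() >= deadline:
            return False
        time.sleep(interval)  # back off briefly before checking again
```

In a test this could wrap the existing check, for example `self.assertTrue(wait_for(lambda: self.redis_conn.exists("cluster:test")))`. It will not help if Kafka never delivers the first message at all.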

mohit0749 commented 6 years ago

Thanks for the help @madisonb, the problem is solved now. It was a Kafka and ZooKeeper problem: I had installed the ZooKeeper package from Ubuntu's default repositories, which is why it was not working. I removed that and used the ZooKeeper server in the Kafka bin directory, and it worked.
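For anyone debugging a similar setup, a quick first step is to confirm that something is actually listening on the Kafka and ZooKeeper ports before running the tests. The sketch below is an assumption about how one might check this; `port_open` is a hypothetical helper. Port 9092 is the broker port shown in the logs above, and 2181 is ZooKeeper's default client port.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        return False  # refused, unreachable, or timed out
    sock.close()
    return True
```

For example, `port_open("localhost", 2181)` and `port_open("localhost", 9092)` should both be true on a working single-server install. An open port only shows that a process is listening, not that it is a compatible ZooKeeper or Kafka version.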