scrapinghub / frontera

A scalable frontier for web crawlers
BSD 3-Clause "New" or "Revised" License

Got error "ERROR:messagebus.stats:TClient instance has no attribute 'get_stats'" #359

kevin-ZZZ closed this issue 5 years ago

kevin-ZZZ commented 5 years ago

I'm new to Frontera. When connecting to HBase, this error occurred.

I followed the Frontera docs and ran the command "python -m frontera.worker.db --config config.dbw --no-incoming --partitions 0 1", which produced:

ERROR:messagebus.stats:TClient instance has no attribute 'get_stats'
NoneType: None
CRITICAL:messagebus.stats:
  File "/home/kevin/.virtualenvs/frontrea/lib/python3.6/site-packages/twisted/internet/defer.py", line 151, in maybeDeferred
    result = f(*args, **kw)
  File "/home/kevin/.virtualenvs/frontrea/lib/python3.6/site-packages/frontera/worker/stats.py", line 75, in export_stats
    stats = self.get_stats()
  File "/home/kevin/.virtualenvs/frontrea/lib/python3.6/site-packages/frontera/worker/stats.py", line 95, in get_stats
    stats.update(self.backend.get_stats() or {})
  File "/home/kevin/.virtualenvs/frontrea/lib/python3.6/site-packages/frontera/contrib/backends/hbase/__init__.py", line 608, in get_stats
    stats.update(self.connection.client.get_stats())
  File "/home/kevin/.virtualenvs/frontrea/lib/python3.6/site-packages/thriftpy/thrift.py", line 184, in __getattr__
    self.__class__.__name__, _api))

I'm not sure what causes this issue. I got many errors like this one, and eventually the program exited because of recursion.

Version of HBase: 2.1.0
Version of HappyBase: 1.1.0
Version of thriftpy: 0.3.9

Thanks for help!

sibiryakov commented 5 years ago

Hi @kevin-ZZZ, thanks for reporting! This code uses our internal version of Happybase, and the get_stats() method isn't defined in the public Happybase. You can fix it by removing this call. I'd appreciate it if you could submit a patch for this.
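If removing the call outright feels too abrupt, one option is to guard it instead. Below is a minimal sketch, assuming only what the traceback shows: the backend reads stats through the client's get_stats(), and the public client raises AttributeError for methods it does not know. PublicClient, InternalClient, and safe_client_stats are hypothetical names used for illustration, not Frontera or Happybase APIs.

```python
class PublicClient:
    """Hypothetical stand-in for a Thrift client without get_stats()."""

    def __getattr__(self, name):
        # Mimics the error message seen in the traceback.
        raise AttributeError(
            "TClient instance has no attribute '%s'" % name)


class InternalClient:
    """Hypothetical stand-in for a client that does provide get_stats()."""

    def get_stats(self):
        return {"requests": 3}


def safe_client_stats(client):
    """Return client stats, or an empty dict if get_stats() is unavailable."""
    # The default argument makes getattr swallow the AttributeError.
    getter = getattr(client, "get_stats", None)
    if not callable(getter):
        return {}
    return getter() or {}


print(safe_client_stats(PublicClient()))    # {}
print(safe_client_stats(InternalClient()))  # {'requests': 3}
```

With a guard like this, the same backend code could run against either client, at the cost of silently reporting no client stats on the public one.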

kevin-ZZZ commented 5 years ago

Hi @sibiryakov, I'm not quite sure what you mean. As I understand it, get_stats is for redefining or transforming stats data in a child class. If I simply remove it, will that cause any loss of functionality or errors? (At the time of asking, I haven't finished reading all the code.) Alternatively, could you share the definition of get_stats() from your Happybase? Thanks for the help.

sibiryakov commented 5 years ago

It shouldn't cause any errors, because the data returned by get_stats() is sent directly to the metrics stream/topic, and when it is absent, there shouldn't be any failures.
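To illustrate why a missing value is harmless: the traceback above shows the worker merging backend stats with stats.update(self.backend.get_stats() or {}), so a backend that returns None or an empty dict simply contributes nothing. Here is a small sketch of that merge pattern; merge_stats and the sample keys are hypothetical.

```python
def merge_stats(worker_stats, backend_stats):
    """Merge optional backend stats into a copy of the worker stats."""
    merged = dict(worker_stats)
    merged.update(backend_stats or {})  # None or {} adds nothing
    return merged


base = {"consumed": 10}
print(merge_stats(base, None))              # {'consumed': 10}
print(merge_stats(base, {"hbase_ops": 2}))  # {'consumed': 10, 'hbase_ops': 2}
```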

sibiryakov commented 5 years ago

https://github.com/scrapinghub/frontera/commit/dfdc396fbd3c73a0bbf7ebc169bf21ab3ce3e3c2

sibiryakov commented 5 years ago

Let me know if there is anything else.

kevin-ZZZ commented 5 years ago

@sibiryakov, thanks for the help.

As a newcomer to this framework, I keep running into problems.

command: (frontrea) kevin@ubuntu:~/project/freeProxy$ scrapy crawl kuaidaili -L INFO -s SPIDER_PARTITION_ID=1

2019-01-16 20:21:23 [twisted] CRITICAL:
Traceback (most recent call last):
  File "/home/kevin/.virtualenvs/frontrea/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/home/kevin/.virtualenvs/frontrea/lib/python3.6/site-packages/scrapy/crawler.py", line 80, in crawl
    self.engine = self._create_engine()
  File "/home/kevin/.virtualenvs/frontrea/lib/python3.6/site-packages/scrapy/crawler.py", line 105, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/home/kevin/.virtualenvs/frontrea/lib/python3.6/site-packages/scrapy/core/engine.py", line 70, in __init__
    self.scraper = Scraper(crawler)
  File "/home/kevin/.virtualenvs/frontrea/lib/python3.6/site-packages/scrapy/core/scraper.py", line 69, in __init__
    self.spidermw = SpiderMiddlewareManager.from_crawler(crawler)
  File "/home/kevin/.virtualenvs/frontrea/lib/python3.6/site-packages/scrapy/middleware.py", line 58, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/home/kevin/.virtualenvs/frontrea/lib/python3.6/site-packages/scrapy/middleware.py", line 34, in from_settings
    mwcls = load_object(clspath)
  File "/home/kevin/.virtualenvs/frontrea/lib/python3.6/site-packages/scrapy/utils/misc.py", line 44, in load_object
    mod = import_module(module)
  File "/home/kevin/.virtualenvs/frontrea/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 941, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'frontera.contrib.scrapy.middlewares.seeds'

A new problem has occurred. I believe I've already injected the seeds through the command line: python -m frontera.utils.add_seeds --config [your_frontera_config] --seeds-file [path to your seeds file]

So I'm confused about the error here. Does the seeds module really exist?

sibiryakov commented 5 years ago

No, this module is not needed. As of the latest version, seeds have to be injected using the strategy worker.
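One way to confirm whether an installed package still ships a given module is to ask the import system directly. The following is a minimal sketch using the standard library's importlib.util.find_spec; module_exists is a hypothetical helper name, not part of Frontera or Scrapy.

```python
import importlib.util


def module_exists(dotted_name):
    """Return True if the module can be located, False otherwise."""
    try:
        # find_spec imports parent packages; a missing parent raises
        # ModuleNotFoundError, a missing leaf module returns None.
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        return False


print(module_exists("json.decoder"))            # True
print(module_exists("json.no_such_submodule"))  # False
```

Calling module_exists("frontera.contrib.scrapy.middlewares.seeds") in the same virtualenv would show whether that module is present. The traceback shows the import being triggered while Scrapy loads spider middlewares from settings, so a leftover entry in the project's settings is a likely source of the reference.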

ghost commented 5 years ago

@sibiryakov, how do I inject seeds using the strategy worker?

Screenshot from 2019-06-08 14-40-23

I found that you said the seeds have to be injected in the crawling strategy. I'm totally confused. If possible, could you provide a simple example?