scrapinghub / hcf-backend

Crawl Frontier HCF backend

AttributeError: 'FrontierManager' object has no attribute 'extra' #1

Closed: stav closed this issue 9 years ago

stav commented 9 years ago

I have not investigated this yet:

(diffeo)stav@platu:~/Workspace/sh/Diffeo/diffeo-netsec$ scrapy crawl blackhat
/home/stav/.virtualenvs/diffeo/src/scrapy/scrapy/contrib/linkextractors/sgml.py:107: ScrapyDeprecationWarning: SgmlLinkExtractor is deprecated and will be removed in future releases. Please use scrapy.contrib.linkextractors.LinkExtractor
  ScrapyDeprecationWarning
2015-03-12 12:21:05-0600 [scrapy] INFO: Scrapy 0.25.1 started (bot: netsec)
2015-03-12 12:21:05-0600 [scrapy] INFO: Optional features available: ssl, http11, boto
2015-03-12 12:21:05-0600 [scrapy] INFO: Overridden settings: {'COOKIES_ENABLED': False, 'USER_AGENT': 'Mozilla/5.0 (X11; Linux i686; U;) Gecko/20070322 Kazehakase/0.4.5', 'MEMUSAGE_REPORT': True, 'DOWNLOAD_DELAY': 4, 'REDIRECT_MAX_TIMES': 3, 'MEMDEBUG_ENABLED': True, 'RETRY_ENABLED': False, 'HTTPCACHE_ENABLED': True, 'CONCURRENT_REQUESTS_PER_IP': 1, 'MEMUSAGE_LIMIT_MB': 512, 'DEPTH_PRIORITY': 10, 'CONCURRENT_REQUESTS': 1, 'DOWNLOAD_WARNSIZE': 5242880, 'SPIDER_MODULES': ['netsec.spiders'], 'BOT_NAME': 'netsec', 'CONCURRENT_ITEMS': 10, 'NEWSPIDER_MODULE': 'netsec.spiders', 'ROBOTSTXT_OBEY': True, 'CONCURRENT_REQUESTS_PER_DOMAIN': 1, 'DOWNLOAD_MAXSIZE': 10485760, 'MEMUSAGE_ENABLED': True, 'SCHEDULER': 'crawlfrontier.contrib.scrapy.schedulers.frontier.CrawlFrontierScheduler', 'MEMUSAGE_WARNING_MB': 400, 'MEMUSAGE_NOTIFY_MAIL': 'steven@scrapinghub.com'}
2015-03-12 12:21:05-0600 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CloseSpider, MemoryUsage, CoreStats, MemoryDebugger, SpiderState
2015-03-12 12:21:05-0600 [scrapy] INFO: Enabled downloader middlewares: NoExternalReferersMiddleware, RobotsTxtCustomMiddleware, RobotsTxtMiddleware, HttpAuthMiddleware, DownloadTimeoutMiddleware, RotateUserAgentMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, DenyDomainsMiddleware, CrawleraMiddleware, BanMiddleware, ChunkedTransferMiddleware, DownloaderStats, HttpCacheMiddleware, SchedulerDownloaderMiddleware
No handlers could be found for logger "streamcorpus"
2015-03-12 12:21:05-0600 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware, ExporterMiddleware, SchedulerSpiderMiddleware
2015-03-12 12:21:05-0600 [scrapy] INFO: Enabled item pipelines:
2015-03-12 12:21:05-0600 [blackhat] DEBUG: DOWNLOADER_MIDDLEWARES_BASE: {'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': 400, 'scrapy.contrib.downloadermiddleware.httpauth.HttpAuthMiddleware': 300, 'scrapy.contrib.downloadermiddleware.cookies.CookiesMiddleware': 700, 'scrapy.contrib.downloadermiddleware.retry.RetryMiddleware': 500, 'scrapy.contrib.downloadermiddleware.chunked.ChunkedTransferMiddleware': 830, 'scrapy.contrib.downloadermiddleware.httpcache.HttpCacheMiddleware': 900, 'scrapy.contrib.downloadermiddleware.stats.DownloaderStats': 850, 'scrapy.contrib.downloadermiddleware.httpcompression.HttpCompressionMiddleware': 590, 'scrapy.contrib.downloadermiddleware.redirect.MetaRefreshMiddleware': 580, 'scrapy.contrib.downloadermiddleware.redirect.RedirectMiddleware': 600, 'scrapy.contrib.downloadermiddleware.defaultheaders.DefaultHeadersMiddleware': 550, 'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware': 750, 'scrapy.contrib.downloadermiddleware.robotstxt.RobotsTxtMiddleware': 100, 'scrapy.contrib.downloadermiddleware.ajaxcrawl.AjaxCrawlMiddleware': 560, 'scrapy.contrib.downloadermiddleware.downloadtimeout.DownloadTimeoutMiddleware': 350}
2015-03-12 12:21:05-0600 [blackhat] DEBUG: DOWNLOADER_MIDDLEWARES: {'netsec.middlewares.BanMiddleware': 800, 'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None, 'netsec.middlewares.RotateUserAgentMiddleware': 400, 'netsec.middlewares.RobotsTxtCustomMiddleware': 90, 'netsec.middlewares.DenyDomainsMiddleware': 620, 'crawlfrontier.contrib.scrapy.middlewares.schedulers.SchedulerDownloaderMiddleware': 999, 'netsec.middlewares.NoExternalReferersMiddleware': 50, 'scrapylib.crawlera.CrawleraMiddleware': 650}
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Downloader Middleware: ('scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware', None)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Downloader Middleware: ('netsec.middlewares.NoExternalReferersMiddleware', 50)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Downloader Middleware: ('netsec.middlewares.RobotsTxtCustomMiddleware', 90)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Downloader Middleware: ('scrapy.contrib.downloadermiddleware.robotstxt.RobotsTxtMiddleware', 100)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Downloader Middleware: ('scrapy.contrib.downloadermiddleware.httpauth.HttpAuthMiddleware', 300)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Downloader Middleware: ('scrapy.contrib.downloadermiddleware.downloadtimeout.DownloadTimeoutMiddleware', 350)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Downloader Middleware: ('netsec.middlewares.RotateUserAgentMiddleware', 400)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Downloader Middleware: ('scrapy.contrib.downloadermiddleware.retry.RetryMiddleware', 500)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Downloader Middleware: ('scrapy.contrib.downloadermiddleware.defaultheaders.DefaultHeadersMiddleware', 550)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Downloader Middleware: ('scrapy.contrib.downloadermiddleware.ajaxcrawl.AjaxCrawlMiddleware', 560)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Downloader Middleware: ('scrapy.contrib.downloadermiddleware.redirect.MetaRefreshMiddleware', 580)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Downloader Middleware: ('scrapy.contrib.downloadermiddleware.httpcompression.HttpCompressionMiddleware', 590)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Downloader Middleware: ('scrapy.contrib.downloadermiddleware.redirect.RedirectMiddleware', 600)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Downloader Middleware: ('netsec.middlewares.DenyDomainsMiddleware', 620)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Downloader Middleware: ('scrapylib.crawlera.CrawleraMiddleware', 650)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Downloader Middleware: ('scrapy.contrib.downloadermiddleware.cookies.CookiesMiddleware', 700)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Downloader Middleware: ('scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware', 750)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Downloader Middleware: ('netsec.middlewares.BanMiddleware', 800)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Downloader Middleware: ('scrapy.contrib.downloadermiddleware.chunked.ChunkedTransferMiddleware', 830)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Downloader Middleware: ('scrapy.contrib.downloadermiddleware.stats.DownloaderStats', 850)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Downloader Middleware: ('scrapy.contrib.downloadermiddleware.httpcache.HttpCacheMiddleware', 900)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Downloader Middleware: ('crawlfrontier.contrib.scrapy.middlewares.schedulers.SchedulerDownloaderMiddleware', 999)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: SPIDER_MIDDLEWARES_BASE: {'scrapy.contrib.spidermiddleware.httperror.HttpErrorMiddleware': 50, 'scrapy.contrib.spidermiddleware.referer.RefererMiddleware': 700, 'scrapy.contrib.spidermiddleware.depth.DepthMiddleware': 900, 'scrapy.contrib.spidermiddleware.offsite.OffsiteMiddleware': 500, 'scrapy.contrib.spidermiddleware.urllength.UrlLengthMiddleware': 800}
2015-03-12 12:21:05-0600 [blackhat] DEBUG: SPIDER_MIDDLEWARES: {'streamitem.middlewares.ExporterMiddleware': 950, 'crawlfrontier.contrib.scrapy.middlewares.schedulers.SchedulerSpiderMiddleware': 999}
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Spider Middleware: ('scrapy.contrib.spidermiddleware.httperror.HttpErrorMiddleware', 50)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Spider Middleware: ('scrapy.contrib.spidermiddleware.offsite.OffsiteMiddleware', 500)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Spider Middleware: ('scrapy.contrib.spidermiddleware.referer.RefererMiddleware', 700)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Spider Middleware: ('scrapy.contrib.spidermiddleware.urllength.UrlLengthMiddleware', 800)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Spider Middleware: ('scrapy.contrib.spidermiddleware.depth.DepthMiddleware', 900)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Spider Middleware: ('streamitem.middlewares.ExporterMiddleware', 950)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Spider Middleware: ('crawlfrontier.contrib.scrapy.middlewares.schedulers.SchedulerSpiderMiddleware', 999)
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Spider field: custom_settings <type 'dict'> {'DEPTH_LIMIT': 3, 'DOWNLOAD_DELAY': 5}
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Spider field: start_urls <type 'list'> ['http://blackhat.com/']
2015-03-12 12:21:05-0600 [blackhat] DEBUG: Spider field: target_domains <type 'list'> ['blackhat.com']
2015-03-12 12:21:05-0600 [blackhat] INFO: Spider opened
2015-03-12 12:21:05-0600 [-] ERROR: Unhandled error in Deferred:
2015-03-12 12:21:05-0600 [-] Unhandled Error
    Traceback (most recent call last):
      File "/home/stav/.virtualenvs/diffeo/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1253, in unwindGenerator
        return _inlineCallbacks(None, gen, Deferred())
      File "/home/stav/.virtualenvs/diffeo/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1107, in _inlineCallbacks
        result = g.send(result)
      File "/home/stav/.virtualenvs/diffeo/src/scrapy/scrapy/crawler.py", line 53, in crawl
        yield self.engine.open_spider(self.spider, start_requests)
      File "/home/stav/.virtualenvs/diffeo/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1253, in unwindGenerator
        return _inlineCallbacks(None, gen, Deferred())
    --- <exception caught here> ---
      File "/home/stav/.virtualenvs/diffeo/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1107, in _inlineCallbacks
        result = g.send(result)
      File "/home/stav/.virtualenvs/diffeo/src/scrapy/scrapy/core/engine.py", line 220, in open_spider
        scheduler = self.scheduler_cls.from_crawler(self.crawler)
      File "/home/stav/.virtualenvs/diffeo/src/crawl-frontier/crawlfrontier/contrib/scrapy/schedulers/frontier.py", line 85, in from_crawler
        return cls(crawler)
      File "/home/stav/.virtualenvs/diffeo/src/crawl-frontier/crawlfrontier/contrib/scrapy/schedulers/frontier.py", line 81, in __init__
        self.frontier = ScrapyFrontierManager(frontier_settings)
      File "/home/stav/.virtualenvs/diffeo/src/crawl-frontier/crawlfrontier/utils/managers.py", line 18, in __init__
        self.manager = FrontierManager.from_settings(settings)
      File "/home/stav/.virtualenvs/diffeo/src/crawl-frontier/crawlfrontier/core/manager.py", line 137, in from_settings
        settings=manager_settings)
      File "/home/stav/.virtualenvs/diffeo/src/crawl-frontier/crawlfrontier/core/manager.py", line 82, in __init__
        self._backend = self._load_object(backend)
      File "/home/stav/.virtualenvs/diffeo/src/crawl-frontier/crawlfrontier/core/manager.py", line 399, in _load_object
        return self._load_frontier_object(obj_class)
      File "/home/stav/.virtualenvs/diffeo/src/crawl-frontier/crawlfrontier/core/manager.py", line 406, in _load_frontier_object
        return obj_class.from_manager(self)
      File "/home/stav/.virtualenvs/diffeo/src/crawl-frontier/crawlfrontier/contrib/backends/memory/__init__.py", line 22, in from_manager
        return cls(manager)
      File "/home/stav/.virtualenvs/diffeo/src/hcf-backend/hcf_backend/backend.py", line 170, in __init__
        params = ParameterManager(manager)
      File "/home/stav/.virtualenvs/diffeo/src/hcf-backend/hcf_backend/utils.py", line 47, in __init__
        self.scrapy_settings = get_scrapy_settings(manager.extra)
    exceptions.AttributeError: 'FrontierManager' object has no attribute 'extra'
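
The last frame shows the root cause: hcf_backend's ParameterManager reads manager.extra, an attribute that the master-branch crawlfrontier FrontierManager does not define. A minimal sketch of a defensive fallback, assuming get_scrapy_settings simply consumes a dict; this is an illustration only, not the project's actual code:

    def get_scrapy_settings(extra):
        # Stand-in for hcf_backend.utils.get_scrapy_settings (assumption:
        # it derives Scrapy settings from the manager's extra mapping).
        return dict(extra)

    class ParameterManager(object):
        def __init__(self, manager):
            # Hypothetical guard: fall back to an empty dict when the
            # installed crawlfrontier lacks the `extra` attribute.
            extra = getattr(manager, 'extra', {})
            self.scrapy_settings = get_scrapy_settings(extra)

    class DummyManager(object):
        pass  # mimics a master-branch FrontierManager: no `extra` attribute

    ParameterManager(DummyManager())  # no AttributeError with the guard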
kalessin commented 9 years ago

Right. The problem is that you are working with the master branch of crawlfrontier, while the changes needed for the HCF backend currently live in the hcf backend branch. I will fix the requirement, although this is temporary.
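
A sketch of installing crawlfrontier from that branch with pip's editable VCS support; the exact branch name "hcf" is an assumption based on the comment above:

    pip uninstall crawlfrontier
    pip install -e "git+https://github.com/scrapinghub/crawlfrontier.git@hcf#egg=crawlfrontier"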

kalessin commented 9 years ago

Fixed:

https://github.com/scrapinghub/hcf-backend/commit/156b74a12dbd47ef664d1f80ea783f7c6d87799e

You should remove and reinstall crawlfrontier.
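
With the requirement pinned by that commit, a clean reinstall should pull in a compatible crawlfrontier. A sketch, assuming a pip-managed virtualenv like the one in the log:

    pip uninstall crawlfrontier
    pip install -e "git+https://github.com/scrapinghub/hcf-backend.git#egg=hcf-backend"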