jh88 opened this issue 5 years ago
In my custom `CrawlManager`, I changed `LOG_LEVEL` and `LOG_ENABLED`:
```python
class CrawlManager(ScrapyrtCrawlManager):
    ...
    def get_scrapyrt_settings(self):
        spider_settings = {
            "LOG_LEVEL": "INFO",
            "LOG_ENABLED": False,
            "LOG_FILE": None,
            "LOG_STDOUT": False,
            "EXTENSIONS": {
                'scrapy.extensions.logstats.LogStats': None,
                'scrapy.webservice.WebService': None,
                'scrapy.extensions.telnet.TelnetConsole': None,
                'scrapy.extensions.throttle.AutoThrottle': None
            }
        }
        return spider_settings

    def get_project_settings(self):
        custom_settings = self.get_scrapyrt_settings()
        return get_project_settings(custom_settings=custom_settings)
    ...
```
However, neither of them worked; I could still see the default logs, including `DEBUG` output:
```
2018-12-20 09:56:44+1100 [scrapyrt] {'request': {'url': 'https://www.railexpress.com.au/brisbanes-cross-river-rail-will-feed-the-centre-at-the-expense-of-people-in-the-suburbs/', 'meta': {'test': False, 'show_rule': True}}, 'spider_name': 'news_scraper'}
WARNING:py.warnings:/Users/ubuntu/miniconda3/envs/streem/lib/python3.6/site-packages/scrapyrt/core.py:9: ScrapyDeprecationWarning: Module `scrapy.log` has been deprecated, Scrapy now relies on the builtin Python library for logging. Read the updated logging entry in the documentation to learn more.
  from scrapy import signals, log as scrapy_log
2018-12-20 09:56:44+1100 [scrapyrt] Created request for spider news_scraper with url https://www.railexpress.com.au/brisbanes-cross-river-rail-will-feed-the-centre-at-the-expense-of-people-in-the-suburbs/ and kwargs {'meta': {'test': False, 'show_rule': True}}
INFO:scrapy.crawler:Overridden settings: {'BOT_NAME': 'news_scraper', 'LOG_ENABLED': False, 'LOG_LEVEL': 'INFO', 'NEWSPIDER_MODULE': 'news_scraper.spiders', 'SPIDER_MODULES': ['news_scraper.spiders'], 'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36'}
INFO:scrapy.middleware:Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.memusage.MemoryUsage']
INFO:scrapy.middleware:Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
INFO:scrapy.middleware:Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
INFO:scrapy.middleware:Enabled item pipelines: []
INFO:scrapy.core.engine:Spider opened
DEBUG:scrapy.core.engine:Crawled (200) <GET https://www.railexpress.com.au/brisbanes-cross-river-rail-will-feed-the-centre-at-the-expense-of-people-in-the-suburbs/> (referer: http://media.streem.com.au)
DEBUG:scrapy.core.scraper:Scraped from <200 https://www.railexpress.com.au/brisbanes-cross-river-rail-will-feed-the-centre-at-the-expense-of-people-in-the-suburbs/>
{'author': 'Industry Opinion',
 'body': 'OPINION: The rail project may well help get more commuters into the '
         '...',
 'detected_lang': {'code': 'en', 'confidence': 99.0},
 'language': 'en',
 'modified_at': None,
 'published_at': '2017-07-10T02:42:24+10:00',
 'title': 'Brisbane’s Cross River Rail will feed the centre at the expense of '
          'people in the suburbs',
 'url': 'https://www.railexpress.com.au/brisbanes-cross-river-rail-will-feed-the-centre-at-the-expense-of-people-in-the-suburbs/'}
INFO:scrapy.core.engine:Closing spider (finished)
INFO:scrapy.statscollectors:Dumping Scrapy stats:
{'downloader/request_bytes': 426,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 19470,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2018, 12, 19, 22, 56, 44, 571116),
 'item_scraped_count': 1,
 'log_count/INFO': 6,
 'log_count/WARNING': 1,
 'memusage/max': 57487360,
 'memusage/startup': 57487360,
 'response_received_count': 1,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2018, 12, 19, 22, 56, 44, 277020)}
INFO:scrapy.core.engine:Spider closed (finished)
2018-12-20 09:56:44+1100 [-] "127.0.0.1" - - [19/Dec/2018:22:56:43 +0000] "POST /scrape HTTP/1.1" 200 7665 "-" "PostmanRuntime/7.4.0"
2018-12-20 09:57:44+1100 [-] Timing out client: IPv4Address(TCP, '127.0.0.1', 53118)
```
My goal is to remove all the default logs from stdout, so that stdout only shows my own logs.
Related to https://github.com/scrapinghub/scrapyrt/issues/10
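
The only workaround I can think of so far is to mute the noisy loggers directly at startup. This is just a minimal sketch using the standard `logging` module, not any ScrapyRT API; the logger names and levels below are my own assumptions based on the output above, where the records clearly come through Python logging rather than the `LOG_*` settings:

```python
import logging

# Workaround sketch (assumption, not documented ScrapyRT behaviour):
# Scrapy logs through the standard logging module, so its records can be
# silenced at the logger level regardless of LOG_LEVEL / LOG_ENABLED.
for name in ("scrapy", "twisted", "py.warnings"):
    noisy = logging.getLogger(name)
    noisy.setLevel(logging.WARNING)  # drop INFO/DEBUG records entirely
    noisy.propagate = False          # keep them off the root stdout handler
```

That keeps stdout clear for my own logs, but it obviously doesn't explain why `LOG_ENABLED` is being ignored in the first place.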