scrapinghub / scrapyrt

HTTP API for Scrapy spiders

Couldn't disable log or change log level in stdout #83

Open jh88 opened 5 years ago

jh88 commented 5 years ago

In my custom CrawlManager, I changed LOG_LEVEL and LOG_ENABLED.

from scrapyrt.core import CrawlManager as ScrapyrtCrawlManager


class CrawlManager(ScrapyrtCrawlManager):

    ...

    def get_scrapyrt_settings(self):
        # Settings to merge on top of the project settings for every
        # API-triggered crawl: keep logging quiet and drop noisy extensions.
        spider_settings = {
            "LOG_LEVEL": "INFO",
            "LOG_ENABLED": False,
            "LOG_FILE": None,
            "LOG_STDOUT": False,
            "EXTENSIONS": {
                'scrapy.extensions.logstats.LogStats': None,
                'scrapy.webservice.WebService': None,
                'scrapy.extensions.throttle.AutoThrottle': None,
                'scrapy.extensions.telnet.TelnetConsole': None
            }
        }
        return spider_settings

    def get_project_settings(self):
        # get_project_settings here is scrapyrt's helper, which accepts
        # custom_settings (not scrapy.utils.project.get_project_settings,
        # which takes no arguments).
        custom_settings = self.get_scrapyrt_settings()
        return get_project_settings(custom_settings=custom_settings)

    ...
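
(For context, a custom crawl manager like this gets wired in through scrapyrt's CRAWL_MANAGER setting; the module path below is only a placeholder for wherever the class above actually lives.)

# scrapyrt settings module (module path is illustrative)
CRAWL_MANAGER = 'myproject.crawl_manager.CrawlManager'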

However, neither of them worked; I could still see the logs on stdout, including DEBUG messages:

2018-12-20 09:56:44+1100 [scrapyrt] {'request': {'url': 'https://www.railexpress.com.au/brisbanes-cross-river-rail-will-feed-the-centre-at-the-expense-of-people-in-the-suburbs/', 'meta': {'test': False, 'show_rule': True}}, 'spider_name': 'news_scraper'}
WARNING:py.warnings:/Users/ubuntu/miniconda3/envs/streem/lib/python3.6/site-packages/scrapyrt/core.py:9: ScrapyDeprecationWarning: Module `scrapy.log` has been deprecated, Scrapy now relies on the builtin Python library for logging. Read the updated logging entry in the documentation to learn more.
  from scrapy import signals, log as scrapy_log

2018-12-20 09:56:44+1100 [scrapyrt] Created request for spider news_scraper with url https://www.railexpress.com.au/brisbanes-cross-river-rail-will-feed-the-centre-at-the-expense-of-people-in-the-suburbs/ and kwargs {'meta': {'test': False, 'show_rule': True}}
INFO:scrapy.crawler:Overridden settings: {'BOT_NAME': 'news_scraper', 'LOG_ENABLED': False, 'LOG_LEVEL': 'INFO', 'NEWSPIDER_MODULE': 'news_scraper.spiders', 'SPIDER_MODULES': ['news_scraper.spiders'], 'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36'}
INFO:scrapy.middleware:Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.memusage.MemoryUsage']
INFO:scrapy.middleware:Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
INFO:scrapy.middleware:Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
INFO:scrapy.middleware:Enabled item pipelines:
[]
INFO:scrapy.core.engine:Spider opened
DEBUG:scrapy.core.engine:Crawled (200) <GET https://www.railexpress.com.au/brisbanes-cross-river-rail-will-feed-the-centre-at-the-expense-of-people-in-the-suburbs/> (referer: http://media.streem.com.au)
DEBUG:scrapy.core.scraper:Scraped from <200 https://www.railexpress.com.au/brisbanes-cross-river-rail-will-feed-the-centre-at-the-expense-of-people-in-the-suburbs/>
{'author': 'Industry Opinion',
 'body': 'OPINION: The rail project may well help get more commuters into the '
         '...',
 'detected_lang': {'code': 'en', 'confidence': 99.0},
 'language': 'en',
 'modified_at': None,
 'published_at': '2017-07-10T02:42:24+10:00',
 'title': 'Brisbane’s Cross River Rail will feed the centre at the expense of '
          'people in the suburbs',
 'url': 'https://www.railexpress.com.au/brisbanes-cross-river-rail-will-feed-the-centre-at-the-expense-of-people-in-the-suburbs/'}
INFO:scrapy.core.engine:Closing spider (finished)
INFO:scrapy.statscollectors:Dumping Scrapy stats:
{'downloader/request_bytes': 426,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 19470,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2018, 12, 19, 22, 56, 44, 571116),
 'item_scraped_count': 1,
 'log_count/INFO': 6,
 'log_count/WARNING': 1,
 'memusage/max': 57487360,
 'memusage/startup': 57487360,
 'response_received_count': 1,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2018, 12, 19, 22, 56, 44, 277020)}
INFO:scrapy.core.engine:Spider closed (finished)
2018-12-20 09:56:44+1100 [-] "127.0.0.1" - - [19/Dec/2018:22:56:43 +0000] "POST /scrape HTTP/1.1" 200 7665 "-" "PostmanRuntime/7.4.0"
2018-12-20 09:57:44+1100 [-] Timing out client: IPv4Address(TCP, '127.0.0.1', 53118)
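
Even though the output above shows the overridden settings were applied ('LOG_ENABLED': False, 'LOG_LEVEL': 'INFO'), the records still reach stdout. A quick diagnostic sketch (plain standard-library logging, nothing scrapyrt-specific) is to dump whatever handlers are attached to the root logger at runtime, since each of them is a place these records can end up:

import logging

root = logging.getLogger()
print("root logger level:", logging.getLevelName(root.level))
for handler in root.handlers:
    # Each handler attached to the root logger is a destination for these
    # records (stdout, a file, ...).
    print(type(handler).__name__, "level:", logging.getLevelName(handler.level))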

My goal is to remove all of the default logging from stdout, so that stdout only shows my own logs.
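
(A workaround sketch, not scrapyrt's documented way of doing this: the INFO:scrapy.crawler: / DEBUG:scrapy.core.engine: prefixes above indicate those records go through Python's standard logging module, so the noisy loggers can simply be raised to WARNING at startup. The logger names are taken from the output above; the timestamped [scrapyrt] lines come from Twisted's log observer and may need separate handling.)

import logging

# Quieten the third-party loggers that flood stdout; anything below WARNING
# from these loggers is dropped at the logging layer.
for name in ("scrapy", "twisted", "py.warnings"):
    logging.getLogger(name).setLevel(logging.WARNING)

# Application logging keeps going through whatever handler already writes
# to stdout (logger name below is illustrative).
app_log = logging.getLogger("news_scraper")
app_log.setLevel(logging.INFO)
app_log.info("application logs still show up")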

pawelmhm commented 5 years ago

related to https://github.com/scrapinghub/scrapyrt/issues/10