gaukhar98 closed this issue 5 years ago
scrapy crawl comments -a email="some@gmail.com" -a password="password" -a page="DonaldTrump/story.php?story_fbid=2123087281087706&id=779444322118682" -a lang="en" -o Trump.csv

It is not working! It gives the error below:

2019-02-12 07:21:42 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: fbcrawl)
2019-02-12 07:21:42 [scrapy.utils.log] INFO: Versions: lxml 4.2.1.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 18.9.0, Python 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)], pyOpenSSL 18.0.0 (OpenSSL 1.0.2o 27 Mar 2018), cryptography 2.2.2, Platform Windows-10-10.0.17134-SP0
2019-02-12 07:21:42 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'fbcrawl', 'FEED_EXPORT_ENCODING': 'utf-8', 'FEED_FORMAT': 'csv', 'FEED_URI': 'chinese.csv', 'LOG_LEVEL': 'INFO', 'NEWSPIDER_MODULE': 'fbcrawl.spiders', 'SPIDER_MODULES': ['fbcrawl.spiders'], 'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'}
2019-02-12 07:21:42 [scrapy.extensions.telnet] INFO: Telnet Password: 3895d32ba798cd1e
2019-02-12 07:21:42 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.feedexport.FeedExporter', 'scrapy.extensions.logstats.LogStats']
2019-02-12 07:21:43 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-02-12 07:21:43 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-02-12 07:21:43 [scrapy.middleware] INFO: Enabled item pipelines: []
2019-02-12 07:21:43 [scrapy.core.engine] INFO: Spider opened
2019-02-12 07:21:43 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-02-12 07:21:43 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-02-12 07:21:47 [comments] INFO: Parse function called on https://mbasic.facebook.com/DonaldTrump/story.php?story_fbid=2123087281087706&id=779444322118682
2019-02-12 07:21:48 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://mbasic.facebook.com/DonaldTrump/story.php?story_fbid=2123087281087706&id=779444322118682>: HTTP status code is not handled or not allowed
2019-02-12 07:21:48 [scrapy.core.engine] INFO: Closing spider (finished)
2019-02-12 07:21:48 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 4038, 'downloader/request_count': 6, 'downloader/request_method_count/GET': 4, 'downloader/request_method_count/POST': 2, 'downloader/response_bytes': 40779, 'downloader/response_count': 6, 'downloader/response_status_count/200': 3, 'downloader/response_status_count/302': 2, 'downloader/response_status_count/404': 1, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2019, 2, 12, 1, 21, 48, 207776), 'httperror/response_ignored_count': 1, 'httperror/response_ignored_status_count/404': 1, 'log_count/INFO': 11, 'request_depth_max': 3, 'response_received_count': 4, 'scheduler/dequeued': 6, 'scheduler/dequeued/memory': 6, 'scheduler/enqueued': 6, 'scheduler/enqueued/memory': 6, 'start_time': datetime.datetime(2019, 2, 12, 1, 21, 43, 376171)}
2019-02-12 07:21:48 [scrapy.core.engine] INFO: Spider closed (finished)
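For context on the log above: the 404 never reaches the spider's callbacks, because Scrapy's HttpErrorMiddleware drops non-2xx responses by default, which is why the crawl simply closes with 'httperror/response_ignored_count': 1. Below is a minimal sketch of how the 404 body could be inspected; it uses a throwaway debug spider, not fbcrawl's actual comments spider or its login flow, so the names here are illustrative assumptions only.

import scrapy


class CommentsDebugSpider(scrapy.Spider):
    # Hypothetical debug spider, not part of fbcrawl; it only illustrates
    # how Scrapy treats the 404 seen in the log above.
    name = "comments_debug"

    # Let 404 responses reach parse() instead of being discarded by
    # scrapy.spidermiddlewares.httperror.HttpErrorMiddleware.
    handle_httpstatus_list = [404]

    start_urls = [
        "https://mbasic.facebook.com/DonaldTrump/story.php"
        "?story_fbid=2123087281087706&id=779444322118682"
    ]

    def parse(self, response):
        # Log the status and a snippet of the body to see what Facebook
        # actually returned (often a login wall or a "content not found" page,
        # since this request carries no logged-in session cookies).
        self.logger.info("Status %s for %s", response.status, response.url)
        self.logger.info(response.text[:500])

The same effect can be applied project-wide by adding HTTPERROR_ALLOWED_CODES = [404] to settings.py.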
Yes, it's noted in the README: the comments crawler is broken. It will be refactored before Feb 24th, with some new features.
Refactoring is done! Let me know if you encounter further problems!