ummezafiirah closed this issue 6 years ago
Hi, welcome to the Scrapy world, it's a fun journey! The first thing I notice is that right before the last ERROR line you should see an INFO line like:
[fb] INFO: Parse function called on https://mbasic.facebook.com/DonaldTrump
So make sure you gave the correct page name. Also, the bot sometimes gets stuck because Facebook fingerprints the browser and tries to block the new Scrapy device. I've written some code in the parse_home function to bypass this behavior, but it doesn't always work. A simple workaround is to log in once via your regular web browser, after which everything should work fine.
Also check your mailbox: Facebook sometimes sends an email saying that an unknown device has tried to access your account without permission.
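If it helps, here is a rough sketch for checking the first point against a saved log. It only assumes the "[fb] INFO: Parse function called on ..." line format shown above; the function names parsed_page_names and check_page are my own for illustration and are not part of fbcrawl.

```python
import re
from urllib.parse import urlsplit

# Matches the spider's own INFO line, in the format shown in the log above.
PARSE_LINE = re.compile(r"\[fb\] INFO: Parse function called on (\S+)")


def parsed_page_names(log_text):
    """Return the first URL path segment of every
    '[fb] INFO: Parse function called on <url>' entry in a Scrapy log."""
    names = []
    for url in PARSE_LINE.findall(log_text):
        segments = [s for s in urlsplit(url).path.split("/") if s]
        names.append(segments[0] if segments else "")
    return names


def check_page(log_text, expected_page):
    """Return True if the spider reached the page that was requested
    (hypothetical helper; comparison is case-insensitive)."""
    reached = (name.lower() for name in parsed_page_names(log_text))
    return expected_page.lower() in reached


if __name__ == "__main__":
    line = ("2018-09-10 23:14:05 [fb] INFO: Parse function called on "
            "https://mbasic.facebook.com/DonaldTrump/?refid=46")
    print(parsed_page_names(line), check_page(line, "DonaldTrump"))
```

If check_page returns False, either the page name was wrong or the login never got as far as the page, which points at the device check described above.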
Hello,
I am new to Scrapy and I have tried your code. I tried to scrape the Donald Trump page and this is what is displayed: [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) I can't figure out where the problem actually is.
Please find below the entire message being output:
2018-09-10 23:14:01 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: fbcrawl)
2018-09-10 23:14:01 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'fbcrawl', 'FEED_EXPORT_ENCODING': 'utf-8', 'FEED_EXPORT_FIELDS': ['source', 'date', 'text', 'reactions', 'likes', 'ahah', 'love', 'wow', 'sigh', 'grrr', 'comments', 'url'], 'LOG_LEVEL': 'INFO', 'NEWSPIDER_MODULE': 'fbcrawl.spiders', 'SPIDER_MODULES': ['fbcrawl.spiders']}
2018-09-10 23:14:01 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.logstats.LogStats']
2018-09-10 23:14:02 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-09-10 23:14:02 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-09-10 23:14:02 [scrapy.middleware] INFO: Enabled item pipelines: ['fbcrawl.pipelines.FbcrawlPipeline']
2018-09-10 23:14:02 [scrapy.core.engine] INFO: Spider opened
2018-09-10 23:14:02 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-09-10 23:14:05 [fb] INFO: Parse function called on https://mbasic.facebook.com/DonaldTrump/?refid=46
2018-09-10 23:14:06 [scrapy.core.scraper] ERROR: Spider error processing <GET https://mbasic.facebook.com/DonaldTrump/?refid=46> (referer: https://mbasic.facebook.com/login/save-device/?login_source=login&refsrc=https%3A%2F%2Fmbasic.facebook.com%2F&refid=8&_rdr)