rugantio / fbcrawl

A Facebook crawler
Apache License 2.0
668 stars, 229 forks

After finishing successfully, the crawl results in an empty CSV file (fbcrawl.py) #63

Closed b-girma closed 4 years ago

b-girma commented 4 years ago

2020-06-22 12:43:06 [scrapy.extensions.logstats] INFO: Crawled 78 pages (at 17 pages/min), scraped 0 items (at 0 items/min)
2020-06-22 12:43:08 [fb] INFO: [!] "more" link not found, will look for a "year" link
2020-06-22 12:43:08 [fb] INFO: Crawling has finished with no errors!
2020-06-22 12:43:08 [scrapy.core.engine] INFO: Closing spider (finished)
2020-06-22 12:43:08 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 101198,
 'downloader/request_count': 81,
 'downloader/request_method_count/GET': 79,
 'downloader/request_method_count/POST': 2,
 'downloader/response_bytes': 1797548,
 'downloader/response_count': 81,
 'downloader/response_status_count/200': 79,
 'downloader/response_status_count/302': 2,
 'elapsed_time_seconds': 301.706243,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2020, 6, 22, 14, 43, 8, 227749),
 'log_count/INFO': 95,
 'request_depth_max': 78,
 'response_received_count': 79,
 'scheduler/dequeued': 81,
 'scheduler/dequeued/memory': 81,
 'scheduler/enqueued': 81,
 'scheduler/enqueued/memory': 81,
 'start_time': datetime.datetime(2020, 6, 22, 14, 38, 6, 521506)}
2020-06-22 12:43:08 [scrapy.core.engine] INFO: Spider closed (finished)
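The log shows 79 responses received and 0 items scraped. As a rough sketch of how that symptom could be detected from the stats, the snippet below checks a stats dictionary for it. The helper name `scraped_nothing` is hypothetical, the dictionary is a hand-copied subset of the stats above, and it assumes Scrapy only writes the `item_scraped_count` key once at least one item has been scraped.

```python
# Subset of the stats from the log above; key names and values are copied from it.
stats = {
    "downloader/response_status_count/200": 79,
    "response_received_count": 79,
    "finish_reason": "finished",
}


def scraped_nothing(stats: dict) -> bool:
    """Return True when pages were fetched but no items were scraped.

    Hypothetical helper, not part of fbcrawl or Scrapy.
    """
    responses = stats.get("response_received_count", 0)
    # Assumption: "item_scraped_count" is absent when nothing was scraped,
    # so a missing key is treated as zero.
    items = stats.get("item_scraped_count", 0)
    return responses > 0 and items == 0


if scraped_nothing(stats):
    print("Pages were fetched but no items were scraped; check the spider's selectors.")
```

A crawl that finishes "with no errors" can still match nothing, so a check like this only flags the symptom. It does not say which selector is at fault.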

b-girma commented 4 years ago

After doing the following steps everything worked well

  1. Downgrading the Scrapy framework to version 1.5.0
  2. Changing the XPath of the posts on line 150 from [image] to [image]
  3. Changing the XPath of the comments on line 172 from [image] to [image]
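The XPath expressions themselves were posted as screenshots and are not reproduced here. As a minimal sketch of checking whether a changed XPath matches anything before re-running the crawl, the snippet below counts matches against a saved snippet of markup. It uses `xml.etree.ElementTree` from the standard library as a stand-in for Scrapy's selectors. The markup, both XPath expressions, and the `count_matches` helper are hypothetical and are not the ones used in fbcrawl.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a saved page.
# This is NOT Facebook's real HTML.
SAMPLE_PAGE = """
<html>
  <body>
    <div class="story"><p>first post</p></div>
    <div class="story"><p>second post</p></div>
  </body>
</html>
"""

# Hypothetical selectors: OLD_XPATH no longer fits the markup, NEW_XPATH does.
OLD_XPATH = ".//article[@class='post']"
NEW_XPATH = ".//div[@class='story']"


def count_matches(page: str, xpath: str) -> int:
    """Return how many elements in `page` match `xpath`."""
    root = ET.fromstring(page)
    return len(root.findall(xpath))


old_count = count_matches(SAMPLE_PAGE, OLD_XPATH)
new_count = count_matches(SAMPLE_PAGE, NEW_XPATH)
print(f"old XPath matched {old_count} element(s)")
print(f"new XPath matched {new_count} element(s)")
```

An expression that matches zero elements would explain a crawl that fetches pages but yields no items. ElementTree supports only a subset of XPath, so expressions that rely on functions such as `contains()` would need to be checked with Scrapy's own selectors instead.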