minimaxir / facebook-page-post-scraper

Data scraper for Facebook Pages, and also code accompanying the blog post How to Scrape Data From Facebook Page Posts for Statistical Analysis
2.12k stars, 661 forks

Rearchitect Script for 14.4x Speedup in Reactions Scraping #10

Closed: minimaxir closed this issue 7 years ago

minimaxir commented 8 years ago

Scraping reactions is relatively slow for large pages (15 minutes for CNN's FB page) and will get worse as time goes by.

For example, when scraping 100 statuses:

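Current Architecture

1 request fetches the 100 statuses, then 1 reactions request is made per status: 101 HTTP requests in total. Each reactions query runs while its post is being processed, so no extra data manipulation is necessary.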
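A minimal sketch of the current per-post pattern, assuming a hypothetical `FakeClient` that stands in for the Graph API and only counts calls so it runs offline. The method names are illustrative, not the scraper's actual functions:

```python
REACTION_TYPES = ["like", "love", "wow", "haha", "sad", "angry"]


class FakeClient:
    """Hypothetical stand-in for the Graph API: counts HTTP calls, returns dummy data."""

    def __init__(self):
        self.requests = 0

    def get_statuses(self, n):
        self.requests += 1  # one request for the whole batch of statuses
        return [{"id": f"post_{i}"} for i in range(n)]

    def get_reactions_for_status(self, status_id):
        self.requests += 1  # one extra request for every single post
        return {t: 0 for t in REACTION_TYPES}


def scrape_per_post(client, n):
    rows = []
    for status in client.get_statuses(n):
        # The reactions query runs while this post is processed, so the
        # counts are already attached to the right post: no mapping step.
        reactions = client.get_reactions_for_status(status["id"])
        rows.append({"id": status["id"], **reactions})
    return rows


client = FakeClient()
rows = scrape_per_post(client, 100)
print(client.requests)  # 101
```

The simplicity comes at a cost: the number of requests grows linearly with the number of posts.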

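Better Architecture

1 request fetches the 100 statuses, then 1 request per reaction type (6 types) fetches that reaction's counts for all 100 statuses at once: 7 HTTP requests in total. Each of the 6 queries returns a vector of counts, and each count must be mapped back to its corresponding post.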
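A minimal sketch of the mapping step, assuming each of the 6 batched responses has already been reduced to `(status_id, count)` pairs. `map_reactions`, `attach_reactions`, and the response shape are hypothetical, not the scraper's actual implementation:

```python
REACTION_TYPES = ["like", "love", "wow", "haha", "sad", "angry"]


def map_reactions(responses):
    """Turn the 6 per-type vectors into {status_id: {reaction_type: count}}.

    `responses` maps each reaction type to the (status_id, count) pairs
    returned by that type's batched query (hypothetical shape).
    """
    by_status = {}
    for reaction_type in REACTION_TYPES:
        for status_id, count in responses.get(reaction_type, []):
            by_status.setdefault(status_id, {})[reaction_type] = count
    return by_status


def attach_reactions(statuses, by_status):
    """Join the mapped counts onto each post, keyed by ID rather than position."""
    rows = []
    for status in statuses:
        counts = by_status.get(status["id"], {})
        # A post missing from a response gets 0 instead of shifting columns.
        rows.append({"id": status["id"],
                     **{t: counts.get(t, 0) for t in REACTION_TYPES}})
    return rows


statuses = [{"id": "post_1"}, {"id": "post_2"}]
responses = {
    "like": [("post_2", 7), ("post_1", 12)],  # order need not match statuses
    "love": [("post_1", 3)],                  # post_2 absent from this vector
    "wow": [], "haha": [], "sad": [], "angry": [],
}
rows = attach_reactions(statuses, map_reactions(responses))
print(rows[0])
# {'id': 'post_1', 'like': 12, 'love': 3, 'wow': 0, 'haha': 0, 'sad': 0, 'angry': 0}
```

Keying on the status ID instead of list position keeps the join readable and stays correct if a response omits or reorders posts.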

101 / 7 ≈ 14.4x fewer HTTP requests, and HTTP requests are the bottleneck.
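The request arithmetic for a batch of 100 statuses, as a small sketch. The function names are hypothetical and only restate the counts given above:

```python
def requests_per_post_architecture(n_posts):
    # 1 request for the batch of statuses + 1 reactions request per status
    return 1 + n_posts


def requests_batched_architecture(n_reaction_types=6):
    # 1 request for the batch of statuses + 1 request per reaction type
    return 1 + n_reaction_types


current = requests_per_post_architecture(100)
better = requests_batched_architecture()
print(f"{current}/{better} = {current / better:.1f}x fewer requests")
# 101/7 = 14.4x fewer requests
```

Request count is only a proxy for wall-clock time; the author's later comment in this thread estimates the real speedup at roughly 4-5x.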

The challenge is implementing the mapping in a way that is easy to read. Tracking progress in this issue.

baditaflorin commented 7 years ago

Can we help?

minimaxir commented 7 years ago

Done in https://github.com/minimaxir/facebook-page-post-scraper/commit/ba26e7797f93a55e8d35f7b7408e1c69ccde16ab

Actual speedup is maybe 4-5x. I have not run benchmarks.