scrapfly / scrapfly-scrapers

Web scrapers for popular targets powered by Scrapfly.io
https://scrapfly.io

ScrapFly Scrapers 🕷️

This repository contains educational example scrapers for popular web scraping targets using the ScrapFly web scraping API and Python.
Most scrapers use a simple web scraping stack:

To learn more about web scraping, see our full tutorials on how to scrape these targets (and many others) in the scrapeguide directory.

Fair Use and Legal Disclaimer

This repository contains educational reference material that illustrates how accessible web scraping can be; the provided programs are not intended to be used in production. Even so, the Scrapfly team constantly updates and improves this code.

Scrapfly does not offer legal advice, so, as always, consult a lawyer when creating programs that interact with other people's websites. That said, here's a good general intro to what NOT to do:

Setup and Run

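  1. Install the required libraries (the quotes stop shells such as zsh from expanding the brackets):
    $ pip install "scrapfly-sdk[all]" jmespath loguru nested-lookup
  2. Export your ScrapFly API key as the SCRAPFLY_KEY environment variable:
    • On Mac/Linux:
      $ export SCRAPFLY_KEY="YOUR SCRAPFLY KEY"
    • On Windows (setx only affects terminals opened afterwards, so open a new one before running the scraper):
      $ setx SCRAPFLY_KEY "YOUR SCRAPFLY KEY"
  3. cd into the scraper directory (for example ./amazon-scraper) and run the code:
    $ cd ./example-scraper
    $ python run.py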
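The scrapers expect your key in the `SCRAPFLY_KEY` environment variable. If a scraper cannot authenticate, the first thing to check is whether Python actually sees that variable. The sketch below is a hypothetical helper (`get_scrapfly_key` is not part of this repository or of the Scrapfly SDK) that fails fast with a readable message instead of letting a request go out without a key.

```python
import os


def get_scrapfly_key(env_var: str = "SCRAPFLY_KEY") -> str:
    """Return the Scrapfly API key from the environment.

    Hypothetical helper: raises RuntimeError with a setup hint when the
    variable is missing or blank, so the problem surfaces before any
    scrape request is sent.
    """
    # Strip whitespace so a stray space from copy-pasting does not count as a key.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it (see step 2) "
            "and open a new terminal on Windows"
        )
    return key
```

You could call this at the top of a script and pass the result to the SDK client, e.g. `ScrapflyClient(key=get_scrapfly_key())`, which keeps the error message close to its cause.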

List of Scrapers

The following is the list of supported websites grouped by type.

E-Commerce

Fashion

Jobs and Companies

Real Estate

Reviews

Search Engines

Social Media

Travel


Aliexpress

The aliexpress.com scraper can scrape the following data:

View sample data - [Product pages](./aliexpress-scraper/results/product.json) - [Search pages](./aliexpress-scraper/results/search.json) - [Product reviews](./aliexpress-scraper/results/reviews.json)
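The sample-data links here and in every section below follow one layout: `./<site>-scraper/results/<name>.json`. To see what each scraper returns before running it yourself, a minimal sketch like the following can walk a local clone and report the top-level shape of every sample file. It assumes you run it from the repository root, and `summarize_samples` is a hypothetical helper; it deliberately does not assume any fields inside the files, since those differ per target.

```python
import json
from pathlib import Path


def summarize_samples(repo_root: str = ".") -> dict[str, dict[str, str]]:
    """Map each *-scraper directory to its sample files and a short shape summary."""
    summary: dict[str, dict[str, str]] = {}
    # Matches paths such as ./amazon-scraper/results/product.json
    for path in sorted(Path(repo_root).glob("*-scraper/results/*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        if isinstance(data, list):
            shape = f"list of {len(data)} items"
        elif isinstance(data, dict):
            shape = f"object with {len(data)} keys"
        else:
            shape = type(data).__name__
        # results/ sits directly inside the scraper directory
        scraper = path.parent.parent.name
        summary.setdefault(scraper, {})[path.name] = shape
    return summary


if __name__ == "__main__":
    for scraper, files in summarize_samples().items():
        print(scraper)
        for name, shape in files.items():
            print(f"  {name}: {shape}")
```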

For the full guide, refer to our blog article How to Scrape Aliexpress.com (2023 Update)

Amazon

The amazon.com scraper can scrape the following data:

View sample data - [Product pages](./amazon-scraper/results/product.json) - [Search pages](./amazon-scraper/results/search.json) - [Product reviews](./amazon-scraper/results/reviews.json)

For the full guide, refer to our blog article How to Scrape Amazon.com Product Data and Reviews

BestBuy

The bestbuy.com scraper can scrape the following data:

View sample data - [Sitemap pages](./bestbuy-scraper/results/promos.json) - [Product pages](./bestbuy-scraper/results/products.json) - [Review pages](./bestbuy-scraper/results/reviews.json) - [Search pages](./bestbuy-scraper/results/search.json)

For the full guide, refer to our blog article How to Scrape BestBuy Product, Offer and Review Data

Bing

The bing.com scraper can scrape the following data:

View sample data - [SERP data](./bing-scraper/results/serps.json) - [Keyword data](./bing-scraper/results/keywords.json) - [Rich snippet data](./bing-scraper/results/rich_snippets.json)

For the full guide, refer to our blog article How to Scrape Bing Search with Python

Booking

The booking.com scraper can scrape the following data:

View sample data - [Hotel pages](./bookingcom-scraper/results/hotel.json) - [Search pages](./bookingcom-scraper/results/search.json)

For the full guide, refer to our blog article How to Scrape Booking.com (2023 Update)

Crunchbase

The crunchbase.com scraper can scrape the following data:

View sample data - [Company pages](./crunchbase-scraper/results/company.json) - [Investor pages](./crunchbase-scraper/results/person.json)

For the full guide, refer to our blog article How to Scrape Crunchbase Company and People Data (2023 Update)

Domain

The domain.com.au scraper can scrape the following data:

View sample data - [Property pages](./domaincom-scraper/results/properties.json) - [Search pages](./domaincom-scraper/results/search.json)

For the full guide, refer to our blog article How to Scrape Domain.com.au Real Estate Property Data

Ebay

The ebay.com scraper can scrape the following data:

View sample data - [Product pages](./ebay-scraper/results/product.json) - [Product pages with variants](./ebay-scraper/results/product-with-variants.json) - [Search pages](./ebay-scraper/results/search.json)

For the full guide, refer to our blog article How to Scrape Ebay using Python

Etsy

The etsy.com scraper can scrape the following data:

View sample data - [Product pages](./etsy-scraper/results/products.json) - [Shop pages](./etsy-scraper/results/shops.json) - [Search pages](./etsy-scraper/results/search.json)

For the full guide, refer to our blog article

Fashionphile

The fashionphile.com scraper can scrape the following data:

View sample data - [Product pages](./fashionphile-scraper/results/products.json) - [Search pages](./fashionphile-scraper/results/search.json)

For the full guide, refer to our blog article How to Scrape Fashionphile for Second Hand Fashion Data

Glassdoor

The glassdoor.com scraper can scrape the following data:

View sample data - [Job pages](./glassdoor-scraper/results/jobs.json) - [Review pages](./glassdoor-scraper/results/reviews.json) - [Salary pages](./glassdoor-scraper/results/salaries.json)

For the full guide, refer to our blog article How to Scrape Glassdoor (2023 update)

Goat

The goat.com scraper can scrape the following data:

View sample data - [Product pages](./goat-scraper/results/products.json) - [Search pages](./goat-scraper/results/search.json)

For the full guide, refer to our blog article How to Scrape Goat.com for Fashion Apparel Data in Python

Homegate

The homegate.ch scraper can scrape the following data:

View sample data - [Property pages](./homegate-scraper/results/properties.json) - [Search pages](./homegate-scraper/results/search.json)

For the full guide, refer to our blog article How to Scrape Homegate.ch Real Estate Property Data

Idealista

The idealista.com scraper can scrape the following data:

View sample data - [Property pages](./idealista-scraper/results/properties.json) - [Search pages](./idealista-scraper/results/search.json) - [Provinces pages](./idealista-scraper/results/search_URLs.json)

For the full guide, refer to our blog article How to Scrape Idealista.com in Python - Real Estate Property Data

Immobilienscout24

The immobilienscout24.de scraper can scrape the following data:

View sample data - [Property pages](./immobilienscout24-scraper/results/properties.json) - [Search pages](./immobilienscout24-scraper/results/search.json)

For the full guide, refer to our blog article How to Scrape Immobilienscout24.de Real Estate Data

Immoscout24

The immoscout24.ch scraper can scrape the following data:

View sample data - [Property pages](./immoscout24-scraper/results/properties.json) - [Search pages](./immoscout24-scraper/results/search.json)

For the full guide, refer to our blog article How to Scrape Immoscout24.ch Real Estate Property Data

Immowelt

The immowelt.de scraper can scrape the following data:

View sample data - [Property pages](./immowelt-scraper/results/properties.json) - [Search pages](./immowelt-scraper/results/search.json)

For the full guide, refer to our blog article How to Scrape Immowelt.de Real Estate Data

Indeed

The indeed.com scraper can scrape the following data:

View sample data - [Job pages](./indeed-scraper/results/jobs.json) - [Search pages](./indeed-scraper/results/search.json)

For the full guide, refer to our blog article How to Scrape Indeed.com (2023 Update)

Instagram

The instagram.com scraper can scrape the following data:

View sample data - [User](./instagram-scraper/results/user.json) - [All user posts](./instagram-scraper/results/all-user-posts.json) - [Multi image post](./instagram-scraper/results/multi-image-post.json) - [Video Post](./instagram-scraper/results/video-post.json)

For the full guide, refer to our blog article How to Scrape Instagram

Leboncoin

The leboncoin.fr scraper can scrape the following data:

View sample data - [Ad pages](./leboncoin-scraper/results/ad.json) - [Search pages](./leboncoin-scraper/results/search.json)

For the full guide, refer to our blog article How to Web Scrape Leboncoin.fr using Python

Nordstrom

The nordstrom.com scraper can scrape the following data:

View sample data - [Product pages](./nordstorm-scraper/results/products.json) - [Search pages](./nordstorm-scraper/results/search.json)

For the full guide, refer to our blog article How to Scrape Nordstrom Fashion Product Data

Realestate

The realestate.com.au scraper can scrape the following data:

View sample data - [Property pages](./realestatecom-scraper/results/properties.json) - [Search pages](./realestatecom-scraper/results/search.json)

For the full guide, refer to our blog article How to Scrape Realestate.com.au Property Listing Data

Realtor

The realtor.com scraper can scrape the following data:

View sample data - [Property pages](./realtor-scraper/results/properties.json) - [Search pages](./realtor-scraper/results/search.json) - [Feed pages](./realtor-scraper/results/feed.json)

For the full guide, refer to our blog article How to Scrape Realtor.com - Real Estate Property Data

Reddit

The reddit.com scraper can scrape the following data:

View sample data - [Subreddit pages](./reddit-scraper/results/subreddit.json) - [Post pages](./reddit-scraper/results/post.json) - [User comment pages](./reddit-scraper/results/user_comments.json) - [User post pages](./reddit-scraper/results/user_posts.json)

For the full guide, refer to our blog article How to Scrape Reddit Posts, Subreddits and Profiles

Redfin

The redfin.com scraper can scrape the following data:

View sample data - [Property pages for sale](./redfin-scraper/results/properties_for_sale.json) - [Property pages for rent](./redfin-scraper/results/properties_for_rent.json) - [Search pages](./redfin-scraper/results/search.json)

For the full guide, refer to our blog article How to Scrape Redfin Real Estate Property Data in Python

Rightmove

The rightmove.co.uk scraper can scrape the following data:

View sample data - [Property pages](./rightmove-scraper/results/properties.json) - [Search pages](./rightmove-scraper/results/search.json)

For the full guide, refer to our blog article How to Scrape RightMove Real Estate Property Data with Python

Seloger

The seloger.com scraper can scrape the following data:

View sample data - [Property pages](./seloger-scraper/results/property.json) - [Search pages](./seloger-scraper/results/search.json)

For the full guide, refer to our blog article How to Scrape Seloger.com - Real Estate Listing Data

Similarweb

The similarweb.com scraper can scrape the following data:

View sample data - [Website pages](./similarweb-scraper/results/websites.json) - [Website compare pages](./similarweb-scraper/results/websites_compare.json) - [Trend pages](./similarweb-scraper/results/trends.json) - [Sitemaps](./similarweb-scraper/results/sitemap_urls.json)

For the full guide, refer to our blog article How to Scrape SimilarWeb Website Traffic Analytics

StockX

The stockx.com scraper can scrape the following data:

View sample data - [Product pages](./stockx-scraper/results/product.json) - [Search pages](./stockx-scraper/results/search.json)

For the full guide, refer to our blog article How to Scrape StockX e-commerce Data with Python

Threads

The threads.net scraper can scrape the following data:

View sample data - [Profile pages](./threads-scraper/results/profile.json) - [Thread pages](./threads-scraper/results/thread.json)

For the full guide, refer to our blog article How to scrape Threads by Meta using Python (2023-08 Update)

TikTok

The tiktok.com scraper can scrape the following data:

View sample data - [Comment data](./tiktok-scraper/results/comments.json) - [Post data](./tiktok-scraper/results/posts.json) - [Profile data](./tiktok-scraper/results/profiles.json) - [Channel data](./tiktok-scraper/results/channel.json) - [Search data](./tiktok-scraper/results/search.json)

For the full guide, refer to our blog article How To Scrape TikTok in 2024

Tripadvisor

The tripadvisor.com scraper can scrape the following data:

View sample data - [Hotel pages](./tripadvisor-scraper/results/hotels.json) - [Search pages](./tripadvisor-scraper/results/search.json) - [Location pages](./tripadvisor-scraper/results/location.json)

For the full guide, refer to our blog article How to Scrape TripAdvisor.com (2023 Updated)

Trustpilot

The trustpilot.com scraper can scrape the following data:

View sample data - [Company pages](./trustpilot-scraper/results/companies.json) - [Review pages](./trustpilot-scraper/results/reviews.json) - [Search pages](./trustpilot-scraper/results/search.json)

For the full guide, refer to our blog article How to Scrape Trustpilot.com Reviews and Company Data

Twitter (X)

The twitter.com scraper can scrape the following data:

View sample data - [Profile pages](./twitter-scraper/results/profile.json) - [Tweet pages](./twitter-scraper/results/tweet.json)

For the full guide, refer to our blog article How to Scrape X.com (Twitter) using Python (2023-11 Update)

Vestiaire Collective

The vestiairecollective.com scraper can scrape the following data:

View sample data - [Product pages](./vestiairecollective-scraper/results/products.json) - [Search pages](./vestiairecollective-scraper/results/search.json)

For the full guide, refer to our blog article How to Scrape Vestiaire Collective for Fashion Product Data

G2

The g2.com scraper can scrape the following data:

View sample data - [Review pages](./g2-scraper/results/reviews.json) - [Search pages](./g2-scraper/results/search.json) - [Alternatives pages](./g2-scraper/results/alternatives.json)

For the full guide, refer to our blog article How to Scrape G2 Company Data and Reviews

Walmart

The walmart.com scraper can scrape the following data:

View sample data - [Product pages](./walmart-scraper/results/products.json) - [Search pages](./walmart-scraper/results/search.json)

For the full guide, refer to our blog article How to Web Scrape Walmart.com (2023 Update)

Wellfound

The wellfound.com scraper can scrape the following data:

View sample data - [Company pages](./wellfound-scraper/results/companies.json) - [Search pages](./wellfound-scraper/results/search.json)

For the full guide, refer to our blog article How to Scrape Wellfound Company Data and Job Listings

LinkedIn

The linkedin.com scraper can scrape the following data:

View sample data - [Profile pages](./linkedin-scraper/results/profile.json) - [Company pages](./linkedin-scraper/results/company.json) - [Job search pages](./linkedin-scraper/results/job_search.json) - [Job pages](./linkedin-scraper/results/jobs.json)

For the full guide, refer to our blog article How to Scrape LinkedIn.com Profile, Company, and Job Data

Yellowpages

The yellowpages.com scraper can scrape the following data:

View sample data - [Business pages](./yellowpages-scraper/results/business_pages.json) - [Search pages](./yellowpages-scraper/results/search.json)

For the full guide, refer to our blog article How to Scrape YellowPages.com Business Data and Reviews (2023 Update)

Yelp

The yelp.com scraper can scrape the following data:

View sample data - [Business pages](./yelp-scraper/results/business_pages.json) - [Review pages](./yelp-scraper/results/reviews.json) - [Search pages](./yelp-scraper/results/search.json)

For the full guide, refer to our blog article How to Web Scrape Yelp.com (2023 update)

Zillow

The zillow.com scraper can scrape the following data:

View sample data - [Property pages](./zillow-scraper/results/property.json) - [Search pages](./zillow-scraper/results/search.json)

For the full guide, refer to our blog article How to Scrape Zillow Real Estate Property Data in Python

Zoominfo

The zoominfo.com scraper can scrape the following data:

View sample data - [Company pages](./zoominfo-scraper/results/companies.json) - [Directory pages](./zoominfo-scraper/results/directory.json) - [FAQs data](./zoominfo-scraper/results/faqs.json)

For the full guide, refer to our blog article How to Scrape Zoominfo Company Data (2023 Update)

Zoopla

The zoopla.co.uk scraper can scrape the following data:

View sample data - [Property pages](./zoopla-scraper/results/properties.json) - [Search pages](./zoopla-scraper/results/search.json)

For the full guide, refer to our blog article How to Scrape Zoopla Real Estate Property Data in Python