section-engineering-education / engineering-education

"Section's Engineering Education (EngEd) Program is dedicated to offering a unique quality community experience for computer science university students."
Apache License 2.0

Build a web scraper using ExpressJs, NodeJs and Cheerio #5887

Closed: deverten closed this issue 2 years ago

deverten commented 2 years ago

Build a web scraper using ExpressJs, NodeJs, and Cheerio

Introduction

Sometimes we need to extract data from a website that offers no API for developers. When a site's terms of service and the applicable legal policies permit it, that data can be extracted through a process called web scraping. Web scraping automates tasks that would otherwise be done by hand, such as listing all the products on a company's website or copying the country codes out of a website's drop-down list, and it can collect almost any information a website displays. Data scientists find web scraping very useful because they can scrape data and organize it in tables for analysis, and developers can build APIs from the scraped data. In this tutorial, we will build a simple web scraper using Node.js and Express.js to run the scraper code on the server, Cheerio to parse a website's HTML in Node.js, and Axios to make HTTP requests. We will extract product names, prices, and ratings from Amazon's website. The web scraper can be customized to suit the developer's needs.

Key takeaways

At the end of the article the reader should:

Article quality

This article aims to introduce web scraping to beginners and advanced developers. It is detailed and quickly gets the reader up to speed with building web scrapers and applying them to popular websites.

References

hectorkambow commented 2 years ago

Good afternoon and thank you for submitting your topic suggestion. Your topic form has been entered into our queue and should be reviewed (for approval) as soon as a content moderator is finished reviewing the ones in the queue before it. 🚀

ahmadmardeni1 commented 2 years ago

Sounds like a helpful topic. @deverten We actively sell products that aim to prevent web scraping, and some websites do not allow it. Therefore, ensuring permissions should be front and center in this article. The topic is okay as long as you lead with the legal issues. Approved.

Let's please be sure it adds value beyond what is in any official docs and/or what is covered in other blog sites. (The article should go beyond a basic explanation, and it is always best to reference any EngEd article and build upon it.)

Please be attentive to grammar/readability and make sure that you put your article through a thorough editing review prior to submitting it for final approval. (There are some great free tools that we reference in EngEd resources.) ANY ARTICLE SUBMITTED WITH GLARING ERRORS WILL BE IMMEDIATELY CLOSED.