Closed by deverten 2 years ago
Good afternoon and thank you for submitting your topic suggestion. Your topic form has been entered into our queue and should be reviewed (for approval) as soon as a content moderator is finished reviewing the ones in the queue before it. 🚀
Sounds like a helpful topic, @deverten. We actively sell products that aim to prevent web scraping, and some websites do not allow it. Ensuring permissions should therefore be front and center in this article. The topic is okay as long as you lead with the legal issues. Approved.
Please be sure it adds value beyond what is in any official docs and/or what is covered on other blog sites. (The article should go beyond a basic explanation, and it is always best to reference an existing EngEd article and build upon it.)
Please be attentive to grammar/readability and make sure that you put your article through a thorough editing review prior to submitting it for final approval. (There are some great free tools that we reference in EngEd resources.) ANY ARTICLE SUBMITTED WITH GLARING ERRORS WILL BE IMMEDIATELY CLOSED.
Build a web scraper using Express.js, Node.js, and Cheerio
Introduction
Sometimes we need to extract data from a website that offers no API for developers. In such cases, the data can often be extracted through a process called "web scraping", provided the website's terms of service and any applicable legal policies permit it. Some websites do not allow scraping at all. Web scraping automates tasks such as listing all the products on a company's website, copying the country codes from a website's drop-down list, and collecting almost any other information a website displays. Data scientists find web scraping useful because they can scrape data and organize it in tables for analysis, and developers can build APIs from the scraped data.
In this tutorial, we will build a simple web scraper using Node.js and Express.js to run the scraper code on the server, Cheerio to parse the website's HTML in Node.js, and Axios to make HTTP requests. We will extract product names, prices, and ratings from Amazon's website. The web scraper can be customized to suit the developer's needs.
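Since the scraper should only run where it is permitted, one check that could be automated before any request is a look at the site's robots.txt file. The following is a minimal sketch in plain Node.js with no extra packages. The function names `parseRobots` and `isPathAllowed`, the sample rules, and the `ExampleBot` user agent are hypothetical and are not taken from the proposed article. The sketch only does plain prefix matching (no `*` or `$` wildcards in paths), and robots.txt is just one signal: it does not replace reading a website's terms of service.

```javascript
// Simplified robots.txt check: a sketch, not a complete RFC 9309 parser.
// Handles User-agent, Allow, and Disallow lines with prefix matching only.

// Split a robots.txt body into groups of { agents, rules }.
function parseRobots(robotsTxt) {
  const groups = [];
  let current = null;
  let lastWasAgent = false;

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split('#')[0].trim(); // drop comments
    const sep = line.indexOf(':');
    if (sep === -1) continue; // blank or malformed line

    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === 'user-agent') {
      // Consecutive User-agent lines share one group of rules.
      if (!lastWasAgent) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else if ((field === 'allow' || field === 'disallow') && current) {
      // An empty value imposes no restriction, so it is skipped.
      if (value !== '') {
        current.rules.push({ allow: field === 'allow', path: value });
      }
      lastWasAgent = false;
    } else {
      lastWasAgent = false;
    }
  }
  return groups;
}

// Return true if `path` may be fetched by `userAgent` under these rules.
function isPathAllowed(robotsTxt, path, userAgent = '*') {
  const groups = parseRobots(robotsTxt);
  const ua = userAgent.toLowerCase();

  // Prefer groups that name this agent; otherwise fall back to `*` groups.
  let matching = groups.filter((g) =>
    g.agents.some((a) => a !== '*' && a !== '' && ua.includes(a))
  );
  if (matching.length === 0) {
    matching = groups.filter((g) => g.agents.includes('*'));
  }

  // The longest matching path wins; Allow wins a tie.
  let best = null;
  for (const rule of matching.flatMap((g) => g.rules)) {
    if (!path.startsWith(rule.path)) continue;
    const longer = !best || rule.path.length > best.path.length;
    const tieForAllow =
      best && rule.path.length === best.path.length && rule.allow;
    if (longer || tieForAllow) best = rule;
  }
  return best ? best.allow : true; // no matching rule means allowed
}

// Hypothetical rules, for illustration only.
const sampleRobots = [
  'User-agent: *',
  'Disallow: /private/',
  'Allow: /private/catalog/',
].join('\n');

console.log(isPathAllowed(sampleRobots, '/products'));               // true
console.log(isPathAllowed(sampleRobots, '/private/orders'));         // false
console.log(isPathAllowed(sampleRobots, '/private/catalog/item-1')); // true
```

In a scraper, a check like this would run once per host before any page is requested, and a disallowed path would simply be skipped.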
Key takeaways
At the end of the article the reader should:
Article quality
This article aims to introduce web scraping to beginners and advanced developers. It is detailed and quickly gets the reader up to speed with building web scrapers and applying them to popular websites.
References