Python Web Scraper is a simple web scraping tool built with Python. It lets you scrape data from web pages, extract information from HTML elements, save data in a text file, download all images, and store table data in a CSV file. The tool provides a user-friendly interface built with the Tkinter library.
python --version
Requests allows you to send HTTP/1.1 requests extremely easily.
pip install requests
Beautiful Soup is a library that makes it easy to scrape information from web pages. It sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree.
pip install beautifulsoup4
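Once installed, Beautiful Soup's searching and iteration idioms can be exercised on an inline HTML string; the sample markup below is illustrative, not taken from the project:

```python
from bs4 import BeautifulSoup

# Parse a small HTML document (inline sample)
html = "<html><body><h1>Title</h1><p class='intro'>Hello</p><p>World</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# Search the parse tree
heading = soup.find("h1").text                      # first <h1>
paragraphs = [p.text for p in soup.find_all("p")]   # all <p> tags
intro = soup.find("p", class_="intro").text         # filter by CSS class
```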
git clone https://github.com/rohan-bhautoo/Python-Web-Scraper.git
To run the Python Web Scraper, execute the following command:
python main.py
The application will open a GUI window where you can enter the URL of the web page you want to scrape. You can select various options such as extracting links, headings, images, paragraphs, metadata, CSS files, and scripts. You can also choose to download images and store the data in a CSV file.
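Under the hood, each of these options boils down to a Beautiful Soup query. A minimal sketch of the link- and heading-extraction options (the sample HTML and the helper names `extract_links`/`extract_headings` are illustrative, not from the project's source):

```python
from bs4 import BeautifulSoup

def extract_links(soup):
    # Collect every href attribute from <a> tags that have one
    return [a.get("href") for a in soup.find_all("a", href=True)]

def extract_headings(soup):
    # Collect the text of all h1-h6 tags
    return [h.text.strip() for h in soup.find_all(["h1", "h2", "h3", "h4", "h5", "h6"])]

html = '<h1>Home</h1><a href="/about">About</a><h2>News</h2><a href="/contact">Contact</a>'
soup = BeautifulSoup(html, "html.parser")
links = extract_links(soup)
headings = extract_headings(soup)
```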
import requests
from bs4 import BeautifulSoup

# Make a request to the website (url comes from the GUI entry field)
response = requests.get(url)
html_content = response.content

# Parse the HTML with BeautifulSoup
soup = BeautifulSoup(html_content, 'html.parser')

# Find elements and extract data
# ...

# Store the data in a text file
# ...
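The two placeholder steps can be filled in many ways. One hedged sketch, extracting paragraph text from an inline sample page and writing it to a text file (the sample HTML and output file name are assumptions, not the project's actual logic):

```python
import os
import tempfile
from bs4 import BeautifulSoup

# Inline sample standing in for a fetched page
html_content = b"<html><body><p>First paragraph.</p><p>Second paragraph.</p></body></html>"
soup = BeautifulSoup(html_content, "html.parser")

# Find elements and extract data
paragraphs = [p.text for p in soup.find_all("p")]

# Store the data in a text file (temp path used here for illustration)
out_path = os.path.join(tempfile.gettempdir(), "output.txt")
with open(out_path, "w") as f:
    for line in paragraphs:
        f.write(line + "\n")
```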
import os
import requests

# get the image URL from the <img> tag's src attribute
url = image.get('src')

# send a GET request to the URL to download the image
response = requests.get(url)

# construct the file name to save the image as
filename = os.path.join(directory, 'image{}'.format(count))

# use os.path.splitext to split the URL into base name and extension
_, extension = os.path.splitext(url)
print(filename)

# save the image to the chosen file path
with open(f'{filename}{extension}', 'wb') as f:
    f.write(response.content)
count += 1
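The snippet above handles a single image; in the application, `image`, `directory`, and `count` come from the surrounding loop and GUI state. A self-contained sketch of how the numbered file names are derived for every `<img>` tag on a page (the sample markup and the `downloads` directory name are assumptions):

```python
import os
from bs4 import BeautifulSoup

html = '<img src="/static/logo.png"><img src="/static/banner.jpg">'
soup = BeautifulSoup(html, "html.parser")

directory = "downloads"
filenames = []
count = 0
for image in soup.find_all("img"):
    url = image.get("src")
    # split the extension off the source URL
    _, extension = os.path.splitext(url)
    # numbered file name inside the target directory
    filenames.append(os.path.join(directory, f"image{count}{extension}"))
    count += 1
```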
import csv
from datetime import datetime

import requests
from bs4 import BeautifulSoup

# get URL from entry field
url = self.url_entry.get()

# make request to website
response = requests.get(url)
html_content = response.content

# parse HTML with BeautifulSoup
soup = BeautifulSoup(html_content, 'html.parser')

# find the first table element
table = soup.find('table')

# create table header
table_header = []
for th in table.find_all('th'):
    table_header.append(th.text.strip())

# create table rows
table_rows = []
for tr in table.find_all('tr'):
    table_row = []
    for td in tr.find_all('td'):
        table_row.append(td.text.strip())
    table_rows.append(table_row)

# build a timestamped CSV file name
now = datetime.utcnow()
timestamp = now.strftime("%Y%m%d%H%M")
with open(f"csv/csv_{timestamp}.csv", "w", newline="") as f:
    csvwriter = csv.writer(f, delimiter=",")
    if includeHeader == 1:
        print("save header:", table_header)
        csvwriter.writerow(table_header)
    for row_id in self.treeview.get_children():
        row = self.treeview.item(row_id)["values"]
        if row != "":
            print("save row:", row)
            csvwriter.writerow(row)
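Stripped of the GUI pieces, the same header-then-rows pattern can be checked in isolation; the sample data and temp-file path below are only for illustration (the application itself writes under a csv/ directory):

```python
import csv
import os
import tempfile

# Sample data standing in for a scraped table
table_header = ["Name", "Age"]
table_rows = [["Alice", "30"], ["Bob", "25"]]

csv_path = os.path.join(tempfile.gettempdir(), "table.csv")
with open(csv_path, "w", newline="") as f:
    csvwriter = csv.writer(f, delimiter=",")
    csvwriter.writerow(table_header)   # header first
    for row in table_rows:
        if row:                        # skip empty rows
            csvwriter.writerow(row)

# read the file back to verify the round trip
with open(csv_path, newline="") as f:
    rows = list(csv.reader(f))
```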
👤 Rohan Bhautoo
Give a ⭐️ if this project helped you!