Description
This project automates the scraping of job postings from various websites and submits them to the Peviitor.ro platform. It leverages Object-Oriented Programming (OOP) principles for code organization and maintainability.
Advantages of Using OOP
- The code is organized into classes (Scraper, UpdatePeViitor, etc.), making it easier to understand, maintain, and extend.
- The Scraper class serves as a blueprint for creating new scrapers for different websites, promoting code reuse.
Methods in scrapers.py
- get_soup(self, params=None): Fetches and parses HTML content using BeautifulSoup, handling optional request parameters.
- get_link_soup(self, link): Fetches and parses HTML content from a specified link.
- get_json_link(self, link): Fetches and returns JSON data from a link.
- get_json(self, json=None, data=None, params=None): Makes a GET request with optional JSON data, request data, or URL parameters and returns the JSON response.
- post_json(self, headers=None, json=None, data=None, params=None): Makes a POST request with optional headers, JSON data, request data, or URL parameters and returns the JSON response.
- post_html(self, headers=None, data=None, params=None): Makes a POST request that returns parsed HTML content.
- get_cookies(self, *args): Retrieves cookies from the website's response headers.
- @staticmethod get_validated_city(city): Validates a city name against a predefined list of acceptable city names.
- get_jobs_dict(self, job_title, job_link, city, remote='On-site'): Creates a dictionary containing job details and appends it to the jobs_list attribute.
- push_peviitor(self): Pushes scraped jobs and company logo (if available) to Peviitor.ro using the UpdatePeViitor class.
Inheritance for Creating New Scrapers
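The pattern this section describes can be sketched as follows. This is a minimal illustration, not the project's actual code: the Scraper class here is a simplified stand-in for src.scrapers.Scraper so that the example runs on its own, and the constructor argument names, the dictionary keys, the example.com URLs, and the hard-coded postings are all assumptions. A real scraper would import Scraper from src.scrapers and obtain its postings from get_soup() or get_json().

```python
class Scraper:
    """Simplified stand-in for src.scrapers.Scraper (assumed shape)."""

    def __init__(self, company_name, url, company_logo_url=None):
        self.company_name = company_name
        self.url = url
        self.company_logo_url = company_logo_url
        self.jobs_list = []

    def get_jobs_dict(self, job_title, job_link, city, remote='On-site'):
        # Key names are hypothetical; the real class may use different ones.
        self.jobs_list.append({
            'job_title': job_title,
            'job_link': job_link,
            'city': city,
            'remote': remote,
            'company': self.company_name,
        })


class RwsScraper(Scraper):
    """Scraper for RWS, named after the website as the convention suggests."""

    def __init__(self):
        # Pass company name, URL, and logo URL to the parent constructor.
        super().__init__(
            'RWS',
            'https://example.com/careers',   # placeholder URL
            'https://example.com/logo.png',  # placeholder logo URL
        )

    def get_jobs(self):
        # Website-specific logic goes here. Hard-coded rows keep the sketch
        # offline; a real scraper would parse the fetched page instead.
        postings = [
            ('Translator', 'https://example.com/jobs/1', 'Bucuresti', 'On-site'),
            ('Project Manager', 'https://example.com/jobs/2', 'Cluj-Napoca', 'Remote'),
        ]
        for title, link, city, remote in postings:
            self.get_jobs_dict(title, link, city, remote)


scraper = RwsScraper()
scraper.get_jobs()
# In the real project, scraper.push_peviitor() would submit jobs_list next.
```

Keeping the site-specific parsing inside get_jobs(self) means the request helpers and the submission step stay in the base class, so each new scraper only has to describe how one website lists its jobs.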
1. Create a new file in the sites directory following the naming convention [website_name]_scraper.py.
2. Import the Scraper class from src.scrapers.
3. Create a class that inherits from Scraper, named after the website (e.g., RwsScraper for RWS).
4. Implement a get_jobs(self) method that extracts job postings using website-specific logic.
5. Initialize the parent class (super().__init__(...)) with company name, URL, and logo URL.
6. Call get_jobs(self) to extract jobs and push_peviitor(self) to submit them to Peviitor.ro.
This is how the data looks on the Peviitor platform
Automated Daily Execution
The scrapers are scheduled to run daily at 11:05 AM through a GitHub Actions workflow (file: scrapers_runner.yml), so job postings on Peviitor.ro are updated regularly.
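A daily schedule like this could be expressed in a workflow file along the following lines. This is only a sketch: the actual contents of scrapers_runner.yml are not shown here, the entry-point script name main.py is hypothetical, and the cron expression assumes that 11:05 AM refers to UTC, which is the time zone GitHub Actions uses for schedule triggers.

```yaml
name: scrapers_runner

on:
  schedule:
    - cron: "5 11 * * *"   # every day at 11:05 UTC (assumed time zone)
  workflow_dispatch:        # also allow manual runs

jobs:
  run-scrapers:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.x"
      - run: pip install -r requirements.txt
      - run: python main.py  # hypothetical entry point
```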
Installation
- Install the dependencies with pip install -r requirements.txt.
- Set the Authorization header in update_peviitor.py with your access token.
Contributing
We welcome contributions to this project! Here's how you can get involved:
Potential Problems
Possible Improvements