peviitor-ro / Scrapers_Cristi_Olteanu_2


Description

This project automates the scraping of job postings from various websites and submits them to the Peviitor.ro platform. It leverages Object-Oriented Programming (OOP) principles for code organization and maintainability.

Advantages of Using OOP

Methods in scrapers.py

Inheritance for Creating New Scrapers

  1. Create a new Python file in the sites directory following the naming convention [website_name]_scraper.py.
  2. Import the Scraper class from src.scrapers.
  3. Create a subclass of Scraper named after the website (e.g., RwsScraper for RWS).
  4. Implement a get_jobs(self) method that extracts job postings using website-specific logic.
  5. In the subclass's __init__, call the base class constructor (super().__init__(...)) with the company name, URL, and logo URL.
  6. On an instance of the subclass, call get_jobs() to extract the jobs, then push_peviitor() to submit them to Peviitor.ro.
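
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the Scraper class here is a hypothetical stand-in for the one in src.scrapers (whose real attributes and behavior may differ), the URLs are placeholders, the job field names are assumptions, and a fixed HTML string replaces the network fetch so the example runs on its own.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class Scraper:
    """Hypothetical stand-in for src.scrapers.Scraper; the real class may differ."""

    def __init__(self, company_name, url, logo_url):
        self.company_name = company_name
        self.url = url
        self.logo_url = logo_url
        self.jobs_list = []

    def get_jobs(self):
        raise NotImplementedError("subclasses implement site-specific extraction")

    def push_peviitor(self):
        # The real method submits the jobs to Peviitor.ro.
        # This stub only reports how many jobs would be sent.
        return len(self.jobs_list)


class _JobLinkParser(HTMLParser):
    """Collects (title, href) pairs from <a class="job"> elements."""

    def __init__(self):
        super().__init__()
        self.jobs = []
        self._href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "job":
            self._href = attrs.get("href")

    def handle_data(self, data):
        if self._href is not None and data.strip():
            self.jobs.append((data.strip(), self._href))
            self._href = None


# Assumed sample markup; a real scraper would download the page at self.url.
SAMPLE_PAGE = (
    '<a class="job" href="/jobs/1">QA Engineer</a>'
    '<a class="job" href="/jobs/2">Translator</a>'
)


class RwsScraper(Scraper):
    def __init__(self):
        # Placeholder URLs, not the real RWS careers page or logo.
        super().__init__(
            "RWS", "https://example.com/careers", "https://example.com/logo.png"
        )

    def get_jobs(self):
        parser = _JobLinkParser()
        parser.feed(SAMPLE_PAGE)
        # Field names below are assumptions, not the Peviitor.ro schema.
        self.jobs_list = [
            {
                "job_title": title,
                "job_link": urljoin(self.url, href),
                "company": self.company_name,
            }
            for title, href in parser.jobs
        ]
        return self.jobs_list


if __name__ == "__main__":
    scraper = RwsScraper()
    scraper.get_jobs()
    print(scraper.push_peviitor(), "jobs ready to submit")
```

Keeping the site-specific parsing inside get_jobs means the submission logic in the base class does not need to change when a new website is added.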

This is how the data looks on the Peviitor platform:

image

Automated Daily Execution

The scrapers are scheduled to run daily at 11:05 AM through a GitHub Actions workflow (file: scrapers_runner.yml). This keeps the job postings on Peviitor.ro updated regularly.

Installation

  1. Clone or download the project repository.
  2. Install required dependencies using pip install -r requirements.txt.
  3. Configure your Peviitor.ro API access by updating the Authorization header in update_peviitor.py with your access token.
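
As a rough idea of what step 3 affects, the sketch below builds a request that carries the token in its Authorization header. It is an assumption-laden illustration, not the contents of update_peviitor.py: the function name build_update_request, the endpoint URL, and the JSON payload shape are all hypothetical, and whether the real API expects a scheme prefix in front of the token is not stated here. The request is only constructed, never sent.

```python
import json
import urllib.request


def build_update_request(endpoint, jobs, token):
    """Build (but do not send) a POST request carrying the access token.

    Hypothetical helper: the real update_peviitor.py may be organized differently.
    """
    body = json.dumps(jobs).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/json",
            # The token is passed through unchanged; add a scheme prefix
            # only if the Peviitor.ro API documentation requires one.
            "Authorization": token,
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder endpoint and token, for illustration only.
    request = build_update_request(
        "https://example.invalid/update",
        [{"job_title": "QA Engineer"}],
        "YOUR_ACCESS_TOKEN",
    )
    print(request.get_method(), request.full_url)
```

If you adapt this, consider reading the token from an environment variable or a secret store rather than committing it to the repository.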

Contributing

We welcome contributions to this project! Here's how you can get involved:

  1. Fork the repository on GitHub.
  2. Create a new branch for your feature or bug fix.
  3. Make changes and write unit tests (if applicable).
  4. Submit a pull request for review and merging.

Potential Problems

Possible Improvements