Automatic Scraper API


This API will take care of scrapers in case any of them fail.

The Automatic Scraper API is a web service that allows you to automate the process of web scraping and file updating. With this API, you can run scrapers that collect data from various websites, and have those scrapers automatically update your files when changes are detected.

Only Python and JavaScript scraper files are supported.

API Endpoints

The Automatic Scraper API provides the following endpoints:

Add Scraper `POST scraper/add/`

Add a scraper on the site, or from a Bash script:

curl -X POST -d '{"url":"https://github.com/your_repository.git"}' https://dev.laurentiumarian.ro/scraper/add/
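The same request can also be built from a Python script with the standard library. This is a minimal sketch, not an official client. The helper name `build_add_scraper_request` is hypothetical. The sketch only mirrors the curl example: the same URL, the same JSON body, and no explicit `Content-Type` header.

```python
import json
import urllib.request

# Base URL copied from the curl examples in this README.
BASE_URL = "https://dev.laurentiumarian.ro"


def build_add_scraper_request(repo_url: str) -> urllib.request.Request:
    """Build the POST scraper/add/ request (hypothetical helper name)."""
    body = json.dumps({"url": repo_url}).encode("utf-8")
    # Like `curl -d`, no Content-Type is set explicitly; urllib adds
    # application/x-www-form-urlencoded when the request is sent.
    return urllib.request.Request(f"{BASE_URL}/scraper/add/", data=body, method="POST")


req = build_add_scraper_request("https://github.com/your_repository.git")
print(req.get_method(), req.full_url)  # POST https://dev.laurentiumarian.ro/scraper/add/
```

Nothing is sent until the request is passed to `urllib.request.urlopen(req, timeout=30)`. This README does not describe the response format, so the sketch stops at building the request.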

Clone a GitHub repository

To ensure that the files can be read, create a folder called "sites" in the repository and store all scraper files in it. For dependencies, the root folder of the project must also contain "requirements.txt" for Python or "package.json" for JavaScript.
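Before adding a repository, you can check its layout locally. The sketch below encodes the rules stated above. The function name and messages are hypothetical. It assumes that scraper files are the `.py` and `.js` files directly inside `sites`, and it requires `requirements.txt` only when Python scrapers are present and `package.json` only when JavaScript scrapers are present.

```python
from __future__ import annotations

from pathlib import Path


def check_repository_layout(root: str | Path) -> list[str]:
    """Return a list of layout problems; an empty list means the layout looks valid."""
    repo = Path(root)
    sites = repo / "sites"
    if not sites.is_dir():
        return ['missing "sites" folder']

    # Only Python and JavaScript scraper files are supported.
    files = [p for p in sites.iterdir() if p.is_file()]
    py_files = [p for p in files if p.suffix == ".py"]
    js_files = [p for p in files if p.suffix == ".js"]

    problems = []
    if not py_files and not js_files:
        problems.append('no .py or .js scrapers in "sites"')
    # Dependency files are expected in the root folder of the project.
    if py_files and not (repo / "requirements.txt").is_file():
        problems.append("Python scrapers found but no requirements.txt in the root folder")
    if js_files and not (repo / "package.json").is_file():
        problems.append("JavaScript scrapers found but no package.json in the root folder")
    return problems
```

For example, `check_repository_layout(".")` run from the repository root returns an empty list when the layout matches these rules.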

Running a scraper `GET/POST scraper/your_github_repository/`

Run a scraper on the site, or from a Bash script:

GET All files

curl -X GET https://dev.laurentiumarian.ro/scraper/your_repository_root_folder/
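In Python, the listing call is a plain GET with no body. This is a sketch under the same assumptions as the curl example. `build_list_files_request` is a hypothetical name, and because the shape of the returned file list is not documented here, the sketch does not parse a response.

```python
import urllib.request

# Base URL copied from the curl examples in this README.
BASE_URL = "https://dev.laurentiumarian.ro"


def build_list_files_request(repo_folder: str) -> urllib.request.Request:
    """Build GET scraper/<repo_folder>/ to list the files of a cloned repository."""
    return urllib.request.Request(f"{BASE_URL}/scraper/{repo_folder}/", method="GET")


req = build_list_files_request("your_repository_root_folder")
print(req.get_method(), req.full_url)
```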

Run Scraper

curl -X POST -d '{"file":"your_file.extension"}' https://dev.laurentiumarian.ro/scraper/your_repository_root_folder/

Run Forced Scraper

curl -X POST -d '{"file":"your_file.extension","force":"true"}' https://dev.laurentiumarian.ro/scraper/your_repository_root_folder/
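The normal and forced runs differ only in the `force` field, so a single helper can build both. This is a minimal sketch with a hypothetical name. It follows the curl examples in sending `force` as the string `"true"` rather than a JSON boolean, and it omits the field entirely for a normal run.

```python
import json
import urllib.request

# Base URL copied from the curl examples in this README.
BASE_URL = "https://dev.laurentiumarian.ro"


def build_run_request(repo_folder: str, file_name: str, force: bool = False) -> urllib.request.Request:
    """Build the POST that runs one scraper file, optionally forced."""
    payload = {"file": file_name}
    if force:
        # The curl example sends the flag as the string "true".
        payload["force"] = "true"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(f"{BASE_URL}/scraper/{repo_folder}/", data=body, method="POST")
```

For example, `build_run_request("your_repository_root_folder", "your_file.py", force=True)` corresponds to the forced-run curl command.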

Running Tests

Each scraper has a dedicated section for logging activity.

This section allows you to review the results of the scraper's automated test. The test is separate from the scraper itself and can be implemented in any programming language.

Test endpoint

Test Pass or Fail

curl -X POST -d '{"is_succes":"Fail or Pass", "logs":"your message"}' https://dev.laurentiumarian.ro/scraper/your_repository_root_folder/scraper.extension

This endpoint receives test results for a scraper. The request carries two parameters. "is_succes" indicates whether the test passed ("Pass") or failed ("Fail"). "logs" carries a custom message, which the scraper displays in its "Logs" section to give visibility into how the test ran.
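Because the test is separate from the scraper, it can report its result from any language. Here is a Python sketch. The helper name is hypothetical. The field name `is_succes` and the URL shape (the scraper file appended to the repository folder, with no trailing slash) are copied from the curl example and have not been verified against the server.

```python
import json
import urllib.request

# Base URL copied from the curl examples in this README.
BASE_URL = "https://dev.laurentiumarian.ro"


def build_test_result_request(
    repo_folder: str, scraper_file: str, passed: bool, logs: str
) -> urllib.request.Request:
    """Build the POST that reports a test result for one scraper file."""
    payload = {
        # Key spelled exactly as in the curl example ("is_succes").
        "is_succes": "Pass" if passed else "Fail",
        "logs": logs,
    }
    body = json.dumps(payload).encode("utf-8")
    # The curl example targets the scraper file itself, with no trailing slash.
    url = f"{BASE_URL}/scraper/{repo_folder}/{scraper_file}"
    return urllib.request.Request(url, data=body, method="POST")
```

For example, a test that finds no data could build `build_test_result_request("your_repository_root_folder", "your_file.py", False, "no results returned")` and send it with `urllib.request.urlopen`.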

A test can also be added manually with the "Add Test" button. The test status controls the scraper: if the status is "Pass", the scraper runs automatically; if it is "Fail", the scraper does not run and is deactivated.
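The Pass/Fail rule can be modelled in a few lines. This is purely illustrative. `ScraperState` and `apply_test_status` are hypothetical names, not the server's real data structure. The sketch also assumes that a "Pass" re-activates a previously deactivated scraper, which this README does not state.

```python
from dataclasses import dataclass


@dataclass
class ScraperState:
    """Hypothetical model of a scraper's state; not the server's real data structure."""
    file: str
    active: bool = True


def apply_test_status(scraper: ScraperState, status: str) -> bool:
    """Apply a test status and return True if the scraper should run."""
    if status == "Pass":
        # Assumption: a passing test (re)activates the scraper so it runs automatically.
        scraper.active = True
        return True
    if status == "Fail":
        # A failing test stops the scraper and deactivates it.
        scraper.active = False
        return False
    raise ValueError(f'expected "Pass" or "Fail", got {status!r}')
```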


Update Files `POST scraper/your_github_repository/`

The "update" function checks the primary branch for changes and, if it finds any, updates the files on the server.

Update the files on the site, or from a Bash script:

curl -X POST -d '{"update":"true"}' https://dev.laurentiumarian.ro/scraper/your_repository_root_folder/
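An update is a POST to the repository folder with a single `update` field. This sketch uses a hypothetical helper name and mirrors the curl example, including the string value `"true"`.

```python
import json
import urllib.request

# Base URL copied from the curl examples in this README.
BASE_URL = "https://dev.laurentiumarian.ro"


def build_update_request(repo_folder: str) -> urllib.request.Request:
    """Build the POST that asks the server to pull changes from the primary branch."""
    body = json.dumps({"update": "true"}).encode("utf-8")
    return urllib.request.Request(f"{BASE_URL}/scraper/{repo_folder}/", data=body, method="POST")
```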

Remove Repository `GET/POST scraper/remove/`

The "remove" function deletes the repository completely from the server.

Remove a repository on the site, or from a Bash script:

curl -X POST -d '{"repo":"your_repository_folder"}' https://dev.laurentiumarian.ro/scraper/remove/
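Removal goes to the shared `scraper/remove/` endpoint, and the repository folder is named in the body rather than in the URL. This sketch uses a hypothetical helper name and uses POST, as the curl example does. Because removal is permanent, it is worth double-checking the folder name before sending.

```python
import json
import urllib.request

# Base URL copied from the curl examples in this README.
BASE_URL = "https://dev.laurentiumarian.ro"


def build_remove_request(repo_folder: str) -> urllib.request.Request:
    """Build the POST that deletes a cloned repository from the server."""
    body = json.dumps({"repo": repo_folder}).encode("utf-8")
    return urllib.request.Request(f"{BASE_URL}/scraper/remove/", data=body, method="POST")
```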

Contributions

If you want to contribute to the development of the scraper, there are several ways you can do so. Firstly, you can help develop the source code by adding new functionalities or fixing existing issues. Secondly, you can contribute to improving the documentation or translations into other languages. Additionally, if you want to help but are unsure where to start, you can check our list of open issues and ask us how you can help. For more information, please refer to the "Contribute" section in our documentation.

Authors

Our team is composed of a group of specialists and education enthusiasts who aim to make a significant contribution in this field.

peviitor team

We are dedicated to the continuous improvement and development of this project, so that we can provide the best resources for everyone interested.

peviitor-ro / scraper_Api
