This API monitors your scrapers and automatically handles any that fail.
The Automatic Scraper API is a web service that automates web scraping and file updating. With this API, you can run scrapers that collect data from various websites and have those scrapers update your files automatically when changes are detected.
Only Python and JavaScript files are supported.
The Automatic Scraper API provides the following endpoints:
POST scraper/add/
Via the site, or with a Bash script:
curl -X POST -d '{"url":"https://github.com/your_repository.git"}' https://dev.laurentiumarian.ro/scraper/add/
This endpoint clones a GitHub repository onto the server.
To ensure that the files can be read, create a folder named "sites" in the repository and store all scraper files inside it. For dependencies, include a "requirements.txt" file (Python) or a "package.json" file (JavaScript) in the root folder of the project.
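Assuming these requirements, a minimal Python repository could be laid out as follows (all file names are illustrative):

your_repository/
├── requirements.txt        # Python dependencies, e.g. requests
└── sites/
    └── example_scraper.py  # a scraper script (see the sketch further below)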
GET/POST scraper/your_github_repository/
Via the site, or with a Bash script:
curl -X GET https://dev.laurentiumarian.ro/scraper/your_repository_root_folder/
curl -X POST -d '{"file":"your_file.extension"}' https://dev.laurentiumarian.ro/scraper/your_repository_root_folder/
curl -X POST -d '{"file":"your_file.extension","force":"true"}' https://dev.laurentiumarian.ro/scraper/your_repository_root_folder/
Each scraper includes a dedicated section for logging its activity.
This section allows you to review the results of the scraper's automated test. The test is separate from the scraper itself and can be implemented in any programming language.
curl -X POST -d '{"is_succes":"Fail or Pass", "logs":"your message"}' https://dev.laurentiumarian.ro/scraper/your_repository_root_folder/scraper.extension
This endpoint is used to send test results to the scraper. A request is sent with two parameters: "is_success" and "logs". The "is_success" parameter indicates whether the test passed or failed, while the "logs" parameter carries a custom message that the scraper displays in its "Logs" section, providing visibility into the test execution process.
You can enhance this with a manual test feature via an "Add Test" button. When a test is added, the scraper's behavior is controlled by the test status: if the status is "Pass", the scraper runs automatically; if it is "Fail", the scraper is deactivated and will not run.
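As a sketch, a test script could report its result to this endpoint programmatically instead of through curl. The payload keys mirror the example above; the repository path, file extension, and messages are placeholders, and this assumes the endpoint accepts a JSON body as the curl call suggests.

# report_test_result.py (hypothetical helper for a test script)
import requests

# Placeholder URL: substitute your repository folder and scraper file.
ENDPOINT = "https://dev.laurentiumarian.ro/scraper/your_repository_root_folder/scraper.extension"

def report(passed: bool, message: str) -> None:
    # "is_success" carries Pass/Fail; "logs" is shown in the Logs section.
    payload = {"is_success": "Pass" if passed else "Fail", "logs": message}
    requests.post(ENDPOINT, json=payload, timeout=10).raise_for_status()

report(True, "All checks passed")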
Running a Scraper from the "sites" folder
POST scraper/your_github_repository/
The "update" function checks whether there are any new changes on the primary branch and pulls them in if any are found.
Via the site, or with a Bash script:
curl -X POST -d '{"update":"true"}' https://dev.laurentiumarian.ro/scraper/your_repository_root_folder/
GET/POST scraper/remove/
The "remove" function completely deletes the repository from the server.
Via the site, or with a Bash script:
curl -X POST -d '{"repo":"your_repository_folder"}' https://dev.laurentiumarian.ro/scraper/remove/
If you want to contribute to the development of the scraper, there are several ways to do so. You can help develop the source code by adding new functionality or fixing existing issues, or you can improve the documentation and its translations into other languages. If you want to help but are unsure where to start, check our list of open issues and ask us how you can help. For more information, see the "Contribute" section of our documentation.
Our team is a group of specialists and education enthusiasts who aim to make a significant contribution in this field.
We are dedicated to the continuous improvement and development of this project, so that we can provide the best resources for everyone interested.