rachitavya / crops_webpages_scrapper

Illustrated Technical Books PDF scraper and downloader

Install the dependencies: pip install beautifulsoup4 requests
Run pdf_books_scrapper.py to start the download: python3 pdf_books_scrapper.py
The downloaded PDFs are saved in the output/books_pdf/ folder.
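The README does not show how pdf_books_scrapper.py finds the books, so the following is only a hedged sketch of one common approach: collect every link on a page whose path ends in .pdf, resolve it against the page URL, and stream each file into output/books_pdf/. The script itself depends on beautifulsoup4 and requests; to keep this sketch dependency-free it swaps in the standard library's html.parser and urllib.request instead. PdfLinkParser, find_pdf_links and download_pdf are hypothetical names, not the repo's code.

```python
# Hypothetical sketch; not the repo's actual implementation.
import shutil
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urljoin, urlparse
from urllib.request import urlopen


class PdfLinkParser(HTMLParser):
    """Collects absolute URLs of <a href> links whose path ends in .pdf."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.pdf_links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Match on the URL path only, so query strings like ?dl=1 are ignored.
        if href and urlparse(href).path.lower().endswith(".pdf"):
            absolute = urljoin(self.base_url, href)
            if absolute not in self.pdf_links:  # skip duplicate links
                self.pdf_links.append(absolute)


def find_pdf_links(html: str, base_url: str) -> list[str]:
    """Hypothetical helper: return the PDF links on one page, in page order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.pdf_links


def download_pdf(url: str, out_dir: Path = Path("output/books_pdf")) -> Path:
    """Hypothetical helper: stream one PDF into out_dir, named after the URL path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / Path(unquote(urlparse(url).path)).name
    with urlopen(url) as response, open(target, "wb") as fh:
        shutil.copyfileobj(response, fh)  # copies in chunks, not all at once
    return target
```

For example, find_pdf_links('<a href="/books/intro.pdf">Intro</a>', "https://example.com/library/") returns ['https://example.com/books/intro.pdf'], and each returned URL can then be passed to download_pdf. Keeping link discovery separate from downloading makes the parsing step easy to test without network access.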

crops_webpages_pdf_scrapper

This is a work-in-progress scraper repository in which I use Python to scrape information from this link.

Steps to run the scraper:

For PDF Output

Open a terminal and install all the required dependencies: pip install -r requirements.txt
Run main.py: python3 main.py
Check the output PDF file for each crop in the output/rabi_crops_pdf folder.
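A file that was not written correctly (for example an empty file, or an HTML error page saved under a .pdf name) can look like a normal output file in the folder listing. As a hedged sketch that is not part of the repo, the check below flags any *.pdf file in output/rabi_crops_pdf that is empty or does not start with the %PDF- header that PDF files begin with. The function names looks_like_pdf and find_suspect_pdfs are hypothetical; passing "output/books_pdf" checks the book downloads the same way.

```python
# Hypothetical post-run check; not part of the repo.
from pathlib import Path

PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """True if the bytes start with the %PDF- header that PDF files begin with."""
    return data.startswith(PDF_MAGIC)


def find_suspect_pdfs(out_dir: str = "output/rabi_crops_pdf") -> list[str]:
    """Return names of *.pdf files in out_dir that are empty or lack the PDF header."""
    folder = Path(out_dir)
    if not folder.is_dir():
        return []
    suspects = []
    for pdf in sorted(folder.glob("*.pdf")):
        with open(pdf, "rb") as fh:
            head = fh.read(len(PDF_MAGIC))  # an empty file yields b""
        if not looks_like_pdf(head):
            suspects.append(pdf.name)
    return suspects


if __name__ == "__main__":
    bad = find_suspect_pdfs()
    print("All PDFs look valid." if not bad else f"Check these files: {bad}")
```

Reading only the first five bytes keeps the check fast even for large PDFs; it does not prove a file is complete, only that it is not obviously something else.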

For JSON Output

Put your OpenAI API key in a .env file.
Run rabi_crops_scrapper.py: python3 rabi_crops_scrapper.py
Check the output JSON file for each crop in the output/rabi_crops_json folder.
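The README does not say how rabi_crops_scrapper.py reads the key or what the JSON files contain, so the following is a minimal sketch under stated assumptions. It assumes the key is stored as OPENAI_API_KEY=your-key-here, since OPENAI_API_KEY is the environment variable the official openai Python package reads by default, and it uses a small hand-written .env parser in place of a library such as python-dotenv. It then loads every *.json file from output/rabi_crops_json into a dict keyed by file name, without assuming any particular schema. parse_env, load_env_file and load_crop_json are hypothetical names.

```python
# Hypothetical sketch; the variable name and file layout are assumptions.
import json
import os
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines from .env text, ignoring blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        # Drop one pair of matching surrounding quotes, e.g. "abc" -> abc.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
    return values


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Load .env into os.environ without overriding variables already set."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    values = parse_env(env_path.read_text(encoding="utf-8"))
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values


def load_crop_json(out_dir: str = "output/rabi_crops_json") -> dict[str, object]:
    """Return {file stem: parsed JSON} for every *.json file in out_dir."""
    results: dict[str, object] = {}
    folder = Path(out_dir)
    if not folder.is_dir():
        return results
    for path in sorted(folder.glob("*.json")):
        try:
            results[path.stem] = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            # Report files that are not valid JSON instead of stopping.
            print(f"Skipping {path.name}: {exc}")
    return results


if __name__ == "__main__":
    load_env_file()
    print("API key set:", bool(os.environ.get("OPENAI_API_KEY")))
    for crop, data in load_crop_json().items():
        print(crop, type(data).__name__)
```

Using os.environ.setdefault means a key exported in the shell takes precedence over the .env file, which is also python-dotenv's default behaviour.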