Open kaljuvee opened 1 year ago
20-24h for development (we can scrape the whole alphabet because we already have the algorithm left over from findartinfo.com). But the problem is that the bot will need about a week to scrape the data, I guess. "a*" alone returns 395k artworks.
Approved
https://github.com/kanvas-ai/artindex/tree/main/scrapers/bidtoart
After you pull the code from the repo, do these actions once:
python -m venv venv - to create a virtual environment
source venv/bin/activate
pip install -r requirements.txt
python manage.py migrate
The scraping process is divided into 3 steps. Before the steps, you should run the screen command and press "Continue". Also, make sure you run screen again after each reboot.
The scripts are numbered to gather and prettify information.
01_scrape_urls.py - the script iterates over each letter and collects URLs that will be requested later
02_scrape_item_data.py - the script makes a request to each collected URL and saves the required contents into the database.
If you need to come back to the shell, you can easily leave the screen util by pressing Ctrl + A then D
The process will not be interrupted, because it runs in an isolated session.
When you want to return to the script, just type screen -r, which will restore your view.
1/ We need to search by letter + *, e.g. a* - https://bidtoart.com/advanced-search?filter=art&title=a%2a
2/ Drill into each page where there is pricing info; the estimate would be both the start_price and end_price:
https://bidtoart.com/art/abraham-hulk-shipping-in-a-calm-34
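A rough sketch of the two steps above, not the code in the repo. The helper names build_search_urls and estimate_to_prices are hypothetical. The search URL shape is copied from the example in 1/. The estimate text format (numbers such as "1,000 - 1,500" or a single "800") is an assumption, since the issue does not show how the site prints it. It also assumes one reading of 2/: a single estimate value fills both start_price and end_price.

```python
import re
from decimal import Decimal
from string import ascii_lowercase
from urllib.parse import urlencode

# Search endpoint taken from the example URL in the issue.
SEARCH_URL = "https://bidtoart.com/advanced-search"


def build_search_urls(letters=ascii_lowercase):
    """Return one wildcard search URL per letter (hypothetical helper)."""
    urls = []
    for letter in letters:
        # urlencode percent-encodes "*" as %2A, which is equivalent
        # to the lowercase %2a in the example URL.
        query = urlencode({"filter": "art", "title": f"{letter}*"})
        urls.append(f"{SEARCH_URL}?{query}")
    return urls


def estimate_to_prices(estimate_text):
    """Map an estimate string to (start_price, end_price).

    Assumption: the estimate is plain text holding one or two numbers,
    optionally with thousands separators. A single number is used for
    both fields; text without numbers gives (None, None).
    """
    matches = re.findall(r"\d[\d,]*(?:\.\d+)?", estimate_text)
    numbers = [Decimal(m.replace(",", "")) for m in matches]
    if not numbers:
        return None, None
    return numbers[0], numbers[-1]
```

For example, build_search_urls("ab") would give the a* and b* search URLs, and estimate_to_prices could be called on whatever estimate text 02_scrape_item_data.py extracts before saving. The real markup and currency handling would need checking against the live pages.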
Will update the mapping file in the meantime