kanvas-ai / artindex

Art Index
GNU General Public License v3.0

Bidtoart.com scraper #19

Open kaljuvee opened 1 year ago

kaljuvee commented 1 year ago

1/ We need to search by letter + *, e.g. a*: https://bidtoart.com/advanced-search?filter=art&title=a%2a
2/ Drill into each page that has pricing info; the estimate should map to both start_price and end_price, e.g.:

https://bidtoart.com/art/abraham-hulk-shipping-in-a-calm-34
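A minimal sketch of building one search URL per letter for point 1. The filter=art and title=<letter>* parameters are copied from the example search URL; limiting the letters to a-z and ignoring pagination of the results are assumptions, not part of the original request.

```python
import string
from urllib.parse import urlencode

SEARCH_BASE = "https://bidtoart.com/advanced-search"


def search_url(letter: str) -> str:
    """Build the wildcard title search URL for one letter, e.g. 'a' -> title=a*."""
    # urlencode percent-encodes '*' as %2A, equivalent to the %2a in the example URL
    query = urlencode({"filter": "art", "title": f"{letter}*"})
    return f"{SEARCH_BASE}?{query}"


def all_search_urls() -> list[str]:
    """One search URL per lowercase ASCII letter (assumed a-z only)."""
    return [search_url(letter) for letter in string.ascii_lowercase]


print(search_url("a"))  # https://bidtoart.com/advanced-search?filter=art&title=a%2A
```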

Will update the mapping file in the meantime
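A sketch of mapping an estimate onto start_price and end_price as requested above. It assumes, hypothetically, that the estimate appears as text such as "1,000 - 2,000 EUR" or a single "500 USD", with commas as thousands separators; the real page markup and the field names in the mapping file may differ, and formats like "1.000" would need separate handling.

```python
import re
from typing import Optional, Tuple

# Hypothetical format: numbers with optional comma thousands separators and decimals
_NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_estimate(text: str) -> Tuple[Optional[float], Optional[float]]:
    """Map an estimate string to (start_price, end_price).

    A range yields both ends; a single value is used for both fields;
    text without any number yields (None, None).
    """
    numbers = [float(n.replace(",", "")) for n in _NUMBER.findall(text)]
    if not numbers:
        return None, None
    if len(numbers) == 1:
        return numbers[0], numbers[0]
    return numbers[0], numbers[1]


print(parse_estimate("1,000 - 2,000 EUR"))  # (1000.0, 2000.0)
```

Using a single value for both fields is a design choice so that every priced item has both columns filled; storing None for end_price would work just as well.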

owlab-developer commented 1 year ago

Development estimate: 20-24 h (we can scrape the whole alphabet because we can reuse the algorithm from the findartinfo.com scraper). The catch is that the bot will probably need about a week to scrape all the data: the "a*" search alone returns 395k artworks.
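A back-of-envelope check of the runtime, assuming (hypothetically) one detail page request per second, made sequentially, and ignoring search result pages, retries and downtime. At that pace the 395k "a*" items alone take about 4.6 days.

```python
SECONDS_PER_DAY = 86_400


def scrape_days(n_items: int, seconds_per_request: float) -> float:
    """Days needed to fetch n_items detail pages sequentially at a fixed pace."""
    return n_items * seconds_per_request / SECONDS_PER_DAY


print(round(scrape_days(395_000, 1.0), 1))  # 4.6
```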

kaljuvee commented 1 year ago

Approved

owlab-developer commented 1 year ago

https://github.com/kanvas-ai/artindex/tree/main/scrapers/bidtoart

After pulling the code from the repo, run these commands once:

  1. python -m venv venv (creates a virtual environment)
  2. source venv/bin/activate
  3. pip install -r requirements.txt
  4. python manage.py migrate

The scraping process is divided into 3 steps. Before starting, run the screen command and press “Continue”, and make sure to start screen again after each reboot. The scripts are numbered in the order in which they gather and then prettify the information.

  1. 01_scrape_urls.py iterates over each letter and collects the URLs that will be requested later.
  2. 02_scrape_item_data.py requests each collected URL and saves the required contents into the database.
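A minimal sketch of how the two steps can hand work to each other through a database so that a week-long run can be restarted safely. It is not the repository's code: the real scripts use Django (hence manage.py migrate), while this sketch uses sqlite3 directly with a hypothetical items table, and it only assumes that artwork pages live under /art/, as in the example URL in the issue. HTTP fetching is left out.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ArtLinkParser(HTMLParser):
    """Collects <a href> values that point to artwork pages (/art/...)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)
        if urlparse(absolute).path.startswith("/art/"):
            self.urls.append(absolute)


def extract_art_urls(html: str, base_url: str) -> list[str]:
    """Step 1: pull artwork URLs out of one search result page."""
    parser = ArtLinkParser(base_url)
    parser.feed(html)
    return parser.urls


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per artwork URL; step 2 fills in the data
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (url TEXT PRIMARY KEY, title TEXT, "
        "start_price REAL, end_price REAL, scraped INTEGER NOT NULL DEFAULT 0)"
    )


def save_urls(conn: sqlite3.Connection, urls: list[str]) -> None:
    """Step 1 output: store URLs; duplicates (e.g. across letters) are ignored."""
    conn.executemany("INSERT OR IGNORE INTO items (url) VALUES (?)", [(u,) for u in urls])
    conn.commit()


def pending_urls(conn: sqlite3.Connection) -> list[str]:
    """Step 2 input: URLs not scraped yet, so a restarted run resumes where it stopped."""
    rows = conn.execute("SELECT url FROM items WHERE scraped = 0 ORDER BY url")
    return [row[0] for row in rows]


def save_item(conn, url, title, start_price, end_price) -> None:
    """Step 2 output: write the scraped fields for one artwork and mark it done."""
    conn.execute(
        "UPDATE items SET title = ?, start_price = ?, end_price = ?, scraped = 1 WHERE url = ?",
        (title, start_price, end_price, url),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
init_db(conn)
page = '<a href="/art/abraham-hulk-shipping-in-a-calm-34">Hulk</a><a href="/about">About</a>'
found = extract_art_urls(page, "https://bidtoart.com/advanced-search?filter=art&title=a%2a")
save_urls(conn, found + found)
print(pending_urls(conn))  # ['https://bidtoart.com/art/abraham-hulk-shipping-in-a-calm-34']
```

Keeping a scraped flag instead of checking for empty fields means an artwork with missing data is not re-queued forever.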

To get back to the shell, detach from screen by pressing Ctrl + A, then D. The process is not interrupted, because it keeps running in its own detached session. To return to the script later, type screen -r, which reattaches the session and restores your view.