zhou-en / pyppeteer-scraper

Web scraper using pyppeteer

Diagram

References

Environment Variables

Proxy (optional)
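The proxy configuration itself is not described here. As a minimal sketch, assume the proxy address is supplied through a hypothetical `PROXY_SERVER` environment variable. The scraper could then pass it to Chromium using Chromium's standard `--proxy-server` switch, through the `args` option of `pyppeteer.launch`:

```python
import os
from typing import List, Mapping, Optional


def chromium_args(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Extra Chromium flags derived from the environment.

    PROXY_SERVER is a hypothetical variable name, e.g. "http://10.0.0.2:3128".
    """
    env = os.environ if env is None else env
    proxy = env.get("PROXY_SERVER", "").strip()
    # --proxy-server is a standard Chromium switch; pyppeteer adds entries
    # from `args` to the Chromium command line it launches.
    return [f"--proxy-server={proxy}"] if proxy else []
```

Inside a coroutine this would be used as `browser = await pyppeteer.launch(args=chromium_args())`. When the variable is unset, the list is empty, so the proxy stays optional.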

Slack
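The Slack integration is not documented here. One common approach is a Slack incoming webhook, which accepts an HTTP POST with a JSON body containing a `text` field. The standard-library sketch below assumes the webhook URL is stored in a hypothetical `SLACK_WEBHOOK_URL` environment variable:

```python
import json
import os
import urllib.request


def build_slack_request(text: str, webhook_url: str) -> urllib.request.Request:
    """Prepare a POST to a Slack incoming webhook with a {"text": ...} payload."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify_slack(text: str) -> None:
    """Send `text` to Slack if a webhook is configured; otherwise do nothing."""
    # SLACK_WEBHOOK_URL is a hypothetical variable name for this sketch.
    url = os.environ.get("SLACK_WEBHOOK_URL")
    if not url:
        return
    with urllib.request.urlopen(build_slack_request(text, url), timeout=10) as resp:
        resp.read()
```

Calling `notify_slack("New event found")` posts a message when the variable is set and is a no-op otherwise.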

Deployment on Raspberry Pi

Use Playwright as scraper

pip install playwright
playwright install
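Once `playwright install` has downloaded its browsers, you can scrape a page with Playwright's synchronous Python API roughly as follows. The function name, URL, and selector are illustrative placeholders and are not taken from the scripts in `scraper/`:

```python
from __future__ import annotations


def scrape_texts(url: str, selector: str, headless: bool = True) -> list[str]:
    """Return the inner text of every element matching `selector` on `url`."""
    # Imported inside the function so this module loads even without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="networkidle")
            return page.locator(selector).all_inner_texts()
        finally:
            # Close the browser even if navigation or the lookup fails.
            browser.close()
```

For example, `scrape_texts("https://example.com", "h1")` returns the text of every `h1` on that page.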

OSError: [Errno 8] Exec format error: '/home/pi/.local/share/pyppeteer/local-chromium/588429/chrome-linux/chrome'

If a Chromium browser is installed already:

If Chromium is not installed, install it with the following command and then repeat the step above: sudo apt install chromium -y
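The `Exec format error` occurs because pyppeteer downloads a prebuilt Chromium for x86-64 Linux, which the Pi's ARM processor cannot execute. One way around it is to point pyppeteer at the system Chromium with the `executablePath` launch option. The minimal sketch below assumes the binary is on `PATH` as `chromium` or `chromium-browser`, since Debian-based systems have shipped it under both names. `find_chromium` and `page_title` are hypothetical helpers:

```python
import shutil
from typing import Callable, Iterable, Optional

CANDIDATES = ("chromium", "chromium-browser")


def find_chromium(
    names: Iterable[str] = CANDIDATES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first Chromium executable found on PATH, or None."""
    for name in names:
        path = which(name)
        if path:
            return path
    return None


async def page_title(url: str) -> str:
    """Open `url` in the system Chromium via pyppeteer and return its title."""
    # Imported lazily so find_chromium() works without pyppeteer installed.
    from pyppeteer import launch

    exe = find_chromium()
    if exe is None:
        raise RuntimeError("Chromium not found; try: sudo apt install chromium -y")
    # executablePath skips pyppeteer's own (x86-64) Chromium download.
    browser = await launch(executablePath=exe, headless=True)
    try:
        page = await browser.newPage()
        await page.goto(url)
        return await page.title()
    finally:
        await browser.close()
```

Running `asyncio.run(page_title("https://example.com"))` would then launch the system browser instead of the downloaded one.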

Cron Jobs

Raspberry Pi

# run every hour between 7 and 22 o'clock
0 7-22 * * * cd /home/pi/Projects/pyppeteer-scraper; venv/bin/python scraper/home_depo.py > /tmp/stdout.log 2> /tmp/stderr.log

# run every hour between 7 and 22 o'clock
0 7-22 * * * cd /home/pi/Projects/pyppeteer-scraper; venv/bin/python scraper/library_event.py > /tmp/stdout.log 2> /tmp/stderr.log

# run at 9am every day
0 9 * * * cd /home/pi/Projects/pyppeteer-scraper; venv/bin/python scraper/stonebridge_event.py > /tmp/stdout.log 2> /tmp/stderr.log

# clean up logs once a week, at 00:00 on Sunday
0 0 * * 0 cd /home/pi/Projects/pyppeteer-scraper; venv/bin/python logger/cleanup.py > /tmp/stdout.log 2> /tmp/stderr.log
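The contents of `logger/cleanup.py` are not shown. A hypothetical equivalent that deletes `*.log` files older than a given age from one directory might look like the following. The directory, file pattern, and seven-day threshold are all assumptions:

```python
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86400


def is_stale(mtime: float, now: float, max_age_days: float) -> bool:
    """True if a file last modified at `mtime` is older than `max_age_days`."""
    return now - mtime > max_age_days * SECONDS_PER_DAY


def remove_old_logs(
    log_dir: Path,
    max_age_days: float = 7,
    pattern: str = "*.log",
    now: Optional[float] = None,
) -> List[Path]:
    """Delete files in `log_dir` matching `pattern` that are older than `max_age_days`."""
    now = time.time() if now is None else now
    removed = []
    for path in sorted(log_dir.glob(pattern)):
        if path.is_file() and is_stale(path.stat().st_mtime, now, max_age_days):
            path.unlink()
            removed.append(path)
    return removed
```

The Sunday cron entry would call `remove_old_logs` with whichever directory the scrapers actually write their logs to.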

MacBook

# run every hour between 9:00 and 16:00 from Monday to Friday
0 9-16 * * 1-5 cd /Users/enzhou/Projects/pyppeteer-scraper && /Users/enzhou/anaconda3/envs/pyppeteer/bin/python scraper/home_depo.py > /tmp/stdout.log 2> /tmp/stderr.log
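A schedule is easier to read once its fields are expanded. The sketch below handles only `*`, comma lists, `a-b` ranges, and `/` steps on `*` or a range. It does not support month or weekday names or `@hourly`-style shortcuts, but that is enough for the entries above:

```python
from typing import List


def expand_field(field: str, low: int, high: int) -> List[int]:
    """Expand one crontab field such as "7-22", "*/15" or "1,3,5" into sorted values."""
    values = set()
    for part in field.split(","):
        base, _, step_str = part.partition("/")
        step = int(step_str) if step_str else 1
        if step < 1:
            raise ValueError(f"bad step in {part!r}")
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_s, end_s = base.split("-", 1)
            start, end = int(start_s), int(end_s)
        elif step_str:
            raise ValueError(f"a step needs '*' or a range: {part!r}")
        else:
            start = end = int(base)
        if not low <= start <= end <= high:
            raise ValueError(f"value out of range in {part!r}")
        # Ranges in cron are inclusive at both ends.
        values.update(range(start, end + 1, step))
    return sorted(values)
```

`expand_field("7-22", 0, 23)` yields the 16 hours from 7 through 22, so each Raspberry Pi job with that hour field runs 16 times a day. The MacBook entry runs at 8 hours (9 through 16) on days 1 to 5 (Monday to Friday), for 40 runs a week.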