lsg551 / matricula-online-scraper

Scraper for Matricula Online
https://pypi.org/project/matricula-online-scraper/
MIT License

Cache locations data with a scraper inside github actions #30

Closed lsg551 closed 6 months ago

lsg551 commented 6 months ago

Description

Fetching all parishes (more than 8,000) usually takes Scrapy a few minutes. If many people do this regularly, it also puts unnecessary load on Matricula's server, especially since the data rarely changes: existing entries mostly stay untouched, and updates only add new entries.

lsg551 commented 6 months ago

Article on how to do this: https://github.com/swyxio/gh-action-data-scraping?tab=readme-ov-file

lsg551 commented 6 months ago

Maybe, add

This adds another feature: tracking the number of newly added parishes. The file could look like

date       ,version ,files scraped
1714910198 ,v0.4.2  ,         8442
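
A minimal sketch of how a scheduled job could maintain such a file, assuming the three columns shown above and a Unix timestamp in the `date` column. The file name `scrape_log.csv` and the helpers `append_scrape_record` and `newly_added` are hypothetical, not part of matricula-online-scraper:

```python
import csv
import time
from pathlib import Path

# Column names taken from the example file above.
HEADER = ["date", "version", "files scraped"]


def append_scrape_record(log_path, version, files_scraped, timestamp=None):
    """Append one row to the log; write the header first if the file is new or empty.

    Hypothetical helper for illustration only.
    """
    path = Path(log_path)
    is_new = not path.exists() or path.stat().st_size == 0
    if timestamp is None:
        timestamp = int(time.time())  # Unix time in seconds, as in the example
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(HEADER)
        writer.writerow([timestamp, version, files_scraped])


def newly_added(log_path):
    """Return the difference between the last two scrape counts.

    Returns None when the log has fewer than two rows.
    """
    with Path(log_path).open(newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    if len(rows) < 2:
        return None
    return int(rows[-1]["files scraped"]) - int(rows[-2]["files scraped"])


if __name__ == "__main__":
    # "scrape_log.csv" is an assumed file name.
    append_scrape_record("scrape_log.csv", "v0.4.2", 8442, timestamp=1714910198)
    print(newly_added("scrape_log.csv"))
```

Unlike the hand-aligned example, this writes plain comma-separated values without padding, which keeps the file easy to parse and keeps diffs between runs small.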

with