slovak-egov / CRZ-scraper

Web scraping and filtering code for the Slovak contract database, crz.gov.sk. The code downloads the XML databases, creates a CSV database of contracts, filters them, downloads the contract files, and extracts and cleans up tables with MD rates.

Code integration #1

Open mtihanyi opened 2 years ago

mtihanyi commented 2 years ago

The individual scripts are not integrated, i.e. they must be run in a certain order, which is not necessarily determined by their numbering/naming convention. Furthermore, each script may or may not prompt the user for input, depending on its purpose. Some prompts are repetitive; these will be eliminated. The aim is to create a single JSON file with all settings in one place and let the whole code run at once.
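A minimal sketch of that aim, assuming Python and a hypothetical settings schema: the keys (`data_dir`, `date_from`, `date_to`, `keywords`) and the step functions below are illustrative stand-ins named after the stages in the repository description, not the repository's actual scripts. The settings are parsed once and passed to every step, so no step needs to prompt the user, and the run order lives in one explicit list instead of in file names.

```python
import json
from typing import Callable

# Hypothetical settings file contents; key names are illustrative only,
# not taken from the repository.
EXAMPLE_SETTINGS = """
{
  "data_dir": "data",
  "date_from": "2021-01-01",
  "date_to": "2021-12-31",
  "keywords": ["IT", "software"]
}
"""

# Placeholder steps (hypothetical). Each would wrap one existing script and
# read what it needs from cfg instead of prompting the user.
def download_xml(cfg: dict) -> str:
    return f"download XML {cfg['date_from']}..{cfg['date_to']} into {cfg['data_dir']}"

def build_csv(cfg: dict) -> str:
    return f"build CSV in {cfg['data_dir']}"

def filter_contracts(cfg: dict) -> str:
    return f"filter by {', '.join(cfg['keywords'])}"

def download_files(cfg: dict) -> str:
    return "download contract files"

def extract_tables(cfg: dict) -> str:
    return "extract MD rate tables"

# The run order is stated once, explicitly, rather than implied by file names.
PIPELINE: list[Callable[[dict], str]] = [
    download_xml,
    build_csv,
    filter_contracts,
    download_files,
    extract_tables,
]

def run_pipeline(settings_text: str) -> list[str]:
    """Parse all settings once, then run every step in order."""
    cfg = json.loads(settings_text)
    return [step(cfg) for step in PIPELINE]

if __name__ == "__main__":
    for line in run_pipeline(EXAMPLE_SETTINGS):
        print(line)
```

In practice the settings would come from a file (for example `with open("settings.json") as f: cfg = json.load(f)`), and each placeholder function would call into one of the existing scripts.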

sn3d commented 2 years ago

All these scripts remind me of ETL scripts, and what you probably need is to organise them into DAGs.

Have you considered an ETL platform? Something like Airflow (please no), Prefect, Luigi, or Argo Workflows.

Yes, there are some heavyweight tools like Airflow, and tools that require k8s (Argo), but there are also handy lightweight solutions like Luigi, which may be the right fit for you.
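Before adopting one of these tools, the DAG idea itself can be tried with Python's standard library. This is a minimal sketch, assuming Python 3.9+ (`graphlib`) and hypothetical step names taken from the repository description; it is not Luigi's API. Each step declares the steps it depends on, the run order is derived from those dependencies rather than from script numbering, and steps already marked as finished are skipped, loosely mirroring how Luigi skips a task whose outputs already exist.

```python
from graphlib import TopologicalSorter

# Hypothetical step names; each maps to the set of steps it depends on.
# Declaration order does not matter: the sorter derives the run order.
DEPENDENCIES: dict[str, set[str]] = {
    "extract_tables": {"download_files"},
    "download_files": {"filter_contracts"},
    "filter_contracts": {"build_csv"},
    "build_csv": {"download_xml"},
    "download_xml": set(),
}

def execution_order(deps: dict[str, set[str]], done: frozenset = frozenset()) -> list[str]:
    """Return a dependency-respecting order, skipping steps listed in `done`.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    order = list(TopologicalSorter(deps).static_order())
    return [step for step in order if step not in done]

if __name__ == "__main__":
    print(execution_order(DEPENDENCIES))
    print(execution_order(DEPENDENCIES, frozenset({"download_xml", "build_csv"})))
```

Adding a new step then only means adding one entry with its dependencies; a framework such as Luigi adds scheduling, retries and output tracking on top of the same idea.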

It could also solve #5.

mtihanyi commented 2 years ago

Thank you for the feedback. I will look into it.