johndpjr / AgTern


Validate company scraping configurations #149

Closed johndpjr closed 10 months ago

johndpjr commented 11 months ago

Context

All of our scraping configurations are defined as JSON files in the data/companies folder. Each company has its own scraping config (which we have defined here). When we run the program, we need to check that each company's scraping config is well-formed before scraping begins. One data validation library that fits is Cerberus, which is lightweight and easily configurable. The idea is for Cerberus to validate the syntax/structure of the scraping config JSON, so that a malformed config is rejected before any scrape starts (see their usage docs for more info on how it works).
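As a rough sketch of the flow described above (parse the JSON, validate it, and refuse to scrape if it fails), something like the following could work. The schema here is a placeholder: the `name` field, `MINIMAL_SCHEMA`, and `load_validated_config` are hypothetical names for illustration, not the project's actual config layout.

```python
import json

# Hypothetical minimal schema: the real config fields are not listed in this
# issue, so "name" is only an illustrative assumption.
MINIMAL_SCHEMA = {
    "name": {"type": "string", "required": True},
}


def load_validated_config(text):
    """Parse a company config from JSON text and validate it with Cerberus.

    Raises ValueError if the config does not match the schema, so a bad
    config is caught before scraping begins.
    """
    # Imported lazily so the schema can be inspected without Cerberus installed.
    from cerberus import Validator

    config = json.loads(text)
    # allow_unknown=True lets keys that this sketch does not model pass through.
    validator = Validator(MINIMAL_SCHEMA, allow_unknown=True)
    if not validator.validate(config):
        raise ValueError(f"Invalid scraping config: {validator.errors}")
    return config
```

Raising on failure, rather than returning a flag, makes it harder to start a scrape with a config that was never checked.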

TODO

Notes

Some companies have a scrape: null key-value pair. This means that we have not defined a scraping config for that company's job board yet, but plan to. The validator should allow this.
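One way to allow this is Cerberus's `nullable` rule, which lets a field be `None` (JSON `null`) while still validating its structure when a value is present. A minimal sketch follows; apart from the `scrape` key itself, the field names (`name`, `link`) and the helper `validate_company_config` are hypothetical placeholders, not the project's real schema.

```python
# Hypothetical schema: only the "scrape" key comes from this issue; the
# "name" and "link" fields are illustrative assumptions.
COMPANY_SCHEMA = {
    "name": {"type": "string", "required": True},
    "scrape": {
        "type": "dict",
        "required": True,
        "nullable": True,  # permits "scrape": null for not-yet-defined configs
        "schema": {
            "link": {"type": "string", "required": True},
        },
    },
}


def validate_company_config(config):
    """Return Cerberus's error dict for a config (empty if it is valid)."""
    # Imported lazily so the schema can be inspected without Cerberus installed.
    from cerberus import Validator

    validator = Validator(COMPANY_SCHEMA)
    validator.validate(config)
    return validator.errors
```

With this schema, a company with `"scrape": null` passes, a company with a populated `scrape` object is checked against the nested rules, and a company that omits the `scrape` key entirely is still reported as an error.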