unitedstates / inspectors-general

Collecting reports from Inspectors General across the US federal government.
https://sunlightfoundation.com/blog/2014/11/07/opengov-voices-opening-up-government-reports-through-teamwork-and-open-data/
Creative Commons Zero v1.0 Universal

Add assertions for empty selector results #158

Closed spulec closed 9 years ago

spulec commented 10 years ago

One common scraper pattern is:

    results = doc.select("table p > a")
    for result in results:
        ...

Unfortunately, if the webpage changes and doc.select returns an empty list, the loop body simply never runs, so we have no way of knowing that the scraper is now broken. The correct solution is probably to go back and change these loops to something like:

    results = doc.select("table p > a")
    if not results:
        raise AssertionError("No report links found for %s" % url)
    for result in results:
        ...
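
Since this pattern shows up in many scrapers, the check could also live in one small helper instead of being repeated at every call site. This is only a sketch: `select_or_raise` is a hypothetical name, not an existing function in this repo, and it assumes `doc` is any object with a BeautifulSoup-style `select()` method that returns a list.

```python
def select_or_raise(doc, selector, url):
    """Run doc.select(selector) and fail loudly if nothing matched.

    Hypothetical helper: `doc` is assumed to expose a BeautifulSoup-style
    select() method; `url` is only used to build the error message.
    """
    results = doc.select(selector)
    if not results:
        # An empty result most likely means the page layout changed.
        raise AssertionError(
            "No elements matched %r for %s" % (selector, url)
        )
    return results
```

A scraper would then write `for result in select_or_raise(doc, "table p > a", url):` and get an AssertionError as soon as the page stops matching the selector.
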

divergentdave commented 9 years ago

Closed by #210