Description
In https://github.com/opencivicdata/python-legistar-scraper/pull/47, we assumed that current relations would share the highest flag value. Per https://github.com/datamade/la-metro-councilmatic/issues/669#issuecomment-747510304, this is not the case. This PR updates the `relations` method to return a deduplicated list of relations using the most recent version of each relation, rather than a deduplicated list of relations sharing the max value of the relation flag across the entire set. It also exposes a method that can be overridden in downstream scraper instances to customize how, if at all, relations should be filtered during a scrape.
Connects https://github.com/datamade/la-metro-councilmatic/issues/669.
Notes
We aren't 100% sure how the relation flag value is set (Metro is looking into it), but we do know that it isn't necessarily meaningful across all relations, only within versions of the same related bill.
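To make the difference between the two filtering strategies concrete, here is a minimal sketch rather than the code in this PR. It assumes each relation is a dict carrying `MatterRelationMatterId` and `MatterRelationFlag` keys, and that "most recent version" means the highest flag among relations pointing at the same related bill. The function names, the two classes, and the sample data are all hypothetical and exist only for illustration.

```python
def latest_relations(relations):
    """Keep one relation per related matter: the one with the highest flag.

    Flags are only compared among relations that point at the same matter.
    """
    latest = {}
    for relation in relations:
        matter_id = relation["MatterRelationMatterId"]
        current = latest.get(matter_id)
        if current is None or relation["MatterRelationFlag"] > current["MatterRelationFlag"]:
            latest[matter_id] = relation
    return list(latest.values())


def max_flag_relations(relations):
    """The earlier assumption: keep relations sharing the max flag of the whole set."""
    if not relations:
        return []
    max_flag = max(r["MatterRelationFlag"] for r in relations)
    seen, kept = set(), []
    for relation in relations:
        matter_id = relation["MatterRelationMatterId"]
        if relation["MatterRelationFlag"] == max_flag and matter_id not in seen:
            seen.add(matter_id)
            kept.append(relation)
    return kept


class RelationSource:
    """Hypothetical stand-in for a scraper, not the library's class."""

    def filter_relations(self, relations):
        # Default behavior: most recent version of each relation.
        return latest_relations(relations)

    def relations(self, raw_relations):
        return self.filter_relations(raw_relations)


class UnfilteredSource(RelationSource):
    """A downstream override that opts out of filtering entirely."""

    def filter_relations(self, relations):
        return list(relations)


# Invented sample: matter 100 has two versions, matter 200 is duplicated.
sample = [
    {"MatterRelationMatterId": 100, "MatterRelationFlag": 1},
    {"MatterRelationMatterId": 100, "MatterRelationFlag": 2},
    {"MatterRelationMatterId": 200, "MatterRelationFlag": 0},
    {"MatterRelationMatterId": 200, "MatterRelationFlag": 0},
]
```

With this sample, `latest_relations(sample)` keeps one relation for each of matters 100 and 200, while `max_flag_relations(sample)` keeps only matter 100, because matter 200 never reaches the set-wide maximum flag. That dropped relation is the failure mode described above.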
Testing instructions
- Navigate into your local scraper instance and install an editable version of this branch into your virtual environment: `pip install -e /path/to/python-legistar-scraper`
- Scrape two classes of Metro matter relations: `pupa update lametro --scrape bills matter_ids=4455,6008`
  - 4455 contains duplicate relations: http://webapi.legistar.com/v1/metro/matters/4455/relations
  - 6008 contains relations that do not share a common max version: http://webapi.legistar.com/v1/metro/matters/6008/relations
- View the scraped data in `_data/lametro/bill*` and confirm that the `related_bills` array in both files does not contain duplicates but does contain a relation object for each distinct bill in the API call.
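The duplicate check in the last step could be scripted along these lines. This is a sketch under stated assumptions: it assumes the files matched by `_data/lametro/bill*` are JSON documents with a top-level `related_bills` array, and it treats two relation objects as duplicates only when all of their fields match. The helper names are hypothetical.

```python
import glob
import json
from collections import Counter


def duplicate_relations(bill):
    """Return relation objects that appear more than once in related_bills."""
    # Serialize with sorted keys so identical objects compare equal
    # regardless of key order.
    keys = [json.dumps(r, sort_keys=True) for r in bill.get("related_bills", [])]
    return [json.loads(key) for key, count in Counter(keys).items() if count > 1]


def check_scraped_bills(pattern="_data/lametro/bill*"):
    """Map each scraped bill file to its duplicated relations, if any."""
    report = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            bill = json.load(f)
        dupes = duplicate_relations(bill)
        if dupes:
            report[path] = dupes
    return report


if __name__ == "__main__":
    for path, dupes in check_scraped_bills().items():
        print(path, "has duplicate relations:", dupes)
```

An empty report covers only the "no duplicates" half of the check. Confirming that every distinct bill in the API response has a relation object still means comparing the files against the two relations URLs above.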