Closed lizgzil closed 4 years ago
I think there's a naming convention issue there, as some of the spiders were copied from each other. The value IS applicable to all spiders; it was just named for WHO when it was first implemented and never updated to a more generic name.
Its intent is only for the test policy: it limits the test to a specific number of documents so that we can run the DAG in minimal time and validate that it all runs correctly. For the full policy DAG there shouldn't be any limits set (i.e. it should be None, None).
After chatting with @jdu it seems that WHO_IRIS_YEARS isn't used for just scraping data from a specific date range, but instead filtering query params into the WHO site.
I think lines 33-37 here are the ones where it is used. @SamDepardieu could you confirm this?
In this case I think this issue will just be to change the doc string for create_dag_all_match in airflow/dags/policy.py, which is currently ("spider_years: years to scrape").
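As a rough illustration of what a clearer doc string could say, here is a minimal sketch. Everything in it is an assumption for illustration only: the helper `year_query_params`, the `year_from` / `year_to` parameter names, and the (start, end) tuple shape, which is guessed from the "(None, None)" mentioned above. It is not the project's actual code.

```python
from typing import Dict, Optional, Tuple

# Assumed shape: a (start, end) pair where either side may be None.
YearRange = Tuple[Optional[int], Optional[int]]


def year_query_params(spider_years: YearRange = (None, None)) -> Dict[str, int]:
    """Hypothetical helper showing a more descriptive doc string.

    spider_years: (start, end) years passed as filter query params to the
        site being scraped. Intended for the test policy, to keep the run
        small. Use (None, None) in the full policy DAG so that no filter
        is applied.
    """
    start, end = spider_years
    params: Dict[str, int] = {}
    # Only add a filter when a bound is actually set.
    if start is not None:
        params["year_from"] = start  # hypothetical query param name
    if end is not None:
        params["year_to"] = end  # hypothetical query param name
    return params
```

With this sketch, the full DAG would call `year_query_params()` and get no filters, while a test DAG might call `year_query_params((2012, 2013))`.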
(in create_dag_fuzzy_match of airflow/dags/policy.py)
I have a few issues with this parameter:
spider_operator.py
it seems to only affect the years scraped from WHO? If so, does it need a clearer name?
Please let me know if I've misunderstood something about this.