seanpianka / Zipcodes

A simple library for querying U.S. zipcodes.
MIT License
78 stars, 15 forks

Filtering by acceptable cities, you have to provide an exact match #8

samcan closed 4 years ago

samcan commented 4 years ago

Hi,

I'm trying to use zipcodes.filter_by and running into an issue. I'm trying to filter by acceptable cities to find matches. This is to catch cases where suburbs are listed under a larger metro area.

An example is the zip code 97222. It could be listed as Milwaukie, Oak Grove, or Portland in our data:

> zips=zipcodes.filter_by(active=True, zip_code='97222')
> pprint.pprint(zips)
[{'acceptable_cities': ['Milwaukie', 'Oak Grove'],
  'active': True,
  'area_codes': ['971'],
  'city': 'Portland',
  'country': 'US',
  'county': 'Clackamas County',
  'lat': '45.4422',
  'long': '-122.6186',
  'state': 'OR',
  'timezone': 'America/Los_Angeles',
  'unacceptable_cities': [],
  'world_region': 'NA',
  'zip_code': '97222',
  'zip_code_type': 'STANDARD'}]

But if I try to filter by acceptable_cities to find Milwaukie, I get no results:

> zips=zipcodes.filter_by(active=True, acceptable_cities='Milwaukie')
> pprint.pprint(zips)
[]

Instead, I have to filter by ALL the acceptable_cities:

> zips=zipcodes.filter_by(active=True, acceptable_cities=['Milwaukie', 'Oak Grove'])
> pprint.pprint(zips)
[{'acceptable_cities': ['Milwaukie', 'Oak Grove'],
  'active': True,
  'area_codes': ['971'],
  'city': 'Portland',
  'country': 'US',
  'county': 'Clackamas County',
  'lat': '45.4422',
  'long': '-122.6186',
  'state': 'OR',
  'timezone': 'America/Los_Angeles',
  'unacceptable_cities': [],
  'world_region': 'NA',
  'zip_code': '97222',
  'zip_code_type': 'STANDARD'},
 {'acceptable_cities': ['Milwaukie', 'Oak Grove'],
  'active': True,
  'area_codes': ['503', '971'],
  'city': 'Portland',
  'country': 'US',
  'county': 'Clackamas County',
  'lat': '45.4018',
  'long': '-122.6146',
  'state': 'OR',
  'timezone': 'America/Los_Angeles',
  'unacceptable_cities': ['Jennings Lodge', 'Johnson City', 'Oak Lodge'],
  'world_region': 'NA',
  'zip_code': '97267',
  'zip_code_type': 'STANDARD'}]

Is there a better way to do matching that I'm missing? I couldn't find an example in the documentation which covered this case.

I could do this:

> zips = zipcodes.filter_by(active=True, state='OR')
> zips_filtered = [x for x in zips if 'Milwaukie' in x['acceptable_cities']]
> pprint.pprint(zips_filtered)
[{'acceptable_cities': ['Milwaukie', 'Oak Grove'],
  'active': True,
  'area_codes': ['971'],
  'city': 'Portland',
  'country': 'US',
  'county': 'Clackamas County',
  'lat': '45.4422',
  'long': '-122.6186',
  'state': 'OR',
  'timezone': 'America/Los_Angeles',
  'unacceptable_cities': [],
  'world_region': 'NA',
  'zip_code': '97222',
  'zip_code_type': 'STANDARD'},
 {'acceptable_cities': ['Milwaukie', 'Oak Grove'],
  'active': True,
  'area_codes': ['503', '971'],
  'city': 'Portland',
  'country': 'US',
  'county': 'Clackamas County',
  'lat': '45.4018',
  'long': '-122.6146',
  'state': 'OR',
  'timezone': 'America/Los_Angeles',
  'unacceptable_cities': ['Jennings Lodge', 'Johnson City', 'Oak Lodge'],
  'world_region': 'NA',
  'zip_code': '97267',
  'zip_code_type': 'STANDARD'},
 {'acceptable_cities': ['Milwaukie'],
  'active': True,
  'area_codes': ['971'],
  'city': 'Portland',
  'country': 'US',
  'county': 'Multnomah County',
  'lat': '45.4465',
  'long': '-122.6382',
  'state': 'OR',
  'timezone': 'America/Los_Angeles',
  'unacceptable_cities': [],
  'world_region': 'NA',
  'zip_code': '97269',
  'zip_code_type': 'PO BOX'}]

Which seems to work... Is there a more efficient way of doing this?

seanpianka commented 4 years ago

[x for x in zips if 'Milwaukie' in x['acceptable_cities']] is an idiomatic Python solution. If you look at the code in zipcodes/__init__.py, it largely consists of list and dictionary comprehensions, along with similar filtering.

If I understand your issue correctly, you are proposing to add the following logic to the filter_by function:

If one of the filter arguments specified to filter_by is an element ("x"), but the corresponding data attribute ("y") to be filtered is a collection type, then filter_by should return all zipcodes such that y contains x.
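To make the proposed rule concrete, here is a minimal sketch of containment-aware filtering. It is not the library's implementation: matches, filter_records, and SAMPLE are hypothetical names, and the records are a trimmed, hand-copied subset of the output shown earlier in this thread, so the snippet runs without the zipcodes package installed.

```python
# Hypothetical sketch of the rule above; not part of the zipcodes library.
# SAMPLE is a trimmed copy of three records quoted earlier in this issue.
SAMPLE = [
    {"zip_code": "97222", "city": "Portland", "active": True,
     "acceptable_cities": ["Milwaukie", "Oak Grove"]},
    {"zip_code": "97267", "city": "Portland", "active": True,
     "acceptable_cities": ["Milwaukie", "Oak Grove"]},
    {"zip_code": "97269", "city": "Portland", "active": True,
     "acceptable_cities": ["Milwaukie"]},
]

COLLECTION_TYPES = (list, tuple, set)


def matches(record, **filters):
    """Return True if the record satisfies every keyword filter."""
    for key, wanted in filters.items():
        actual = record.get(key)
        scalar_vs_collection = (
            isinstance(actual, COLLECTION_TYPES)
            and not isinstance(wanted, COLLECTION_TYPES)
        )
        if scalar_vs_collection:
            # Filter value is an element "x", attribute "y" is a collection:
            # match when y contains x.
            if wanted not in actual:
                return False
        elif actual != wanted:
            # Otherwise fall back to exact equality.
            return False
    return True


def filter_records(records, **filters):
    """Keep only the records that satisfy all filters."""
    return [record for record in records if matches(record, **filters)]


milwaukie = filter_records(SAMPLE, active=True, acceptable_cities="Milwaukie")
print([record["zip_code"] for record in milwaukie])
```

Passing a list still requires an exact match in this sketch, so filter_records(SAMPLE, acceptable_cities=['Milwaukie']) would return only the 97269 record.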

While this feature seems potentially useful to users with a similar use-case, you were able to solve this problem with a one-line comprehension expression. I think your solution is perfectly acceptable, and others can use comprehension expressions for what they do best.
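If the same lookup runs many times, one option is to build a reverse index once and reuse it, instead of re-scanning the list with a comprehension on every query. The sketch below assumes records shaped like the ones quoted in this thread; index_by_acceptable_city and RECORDS are hypothetical names, and the case-insensitive keys are an illustrative choice rather than anything the library does.

```python
from collections import defaultdict

# Hypothetical sample data, trimmed from the records quoted in this issue.
RECORDS = [
    {"zip_code": "97222", "acceptable_cities": ["Milwaukie", "Oak Grove"]},
    {"zip_code": "97267", "acceptable_cities": ["Milwaukie", "Oak Grove"]},
    {"zip_code": "97269", "acceptable_cities": ["Milwaukie"]},
]


def index_by_acceptable_city(records):
    """Map each acceptable city (case-folded) to the records that list it."""
    index = defaultdict(list)
    for record in records:
        # Treat a missing or empty field as "no acceptable cities".
        for city in record.get("acceptable_cities") or []:
            index[city.casefold()].append(record)
    return dict(index)


city_index = index_by_acceptable_city(RECORDS)

# Each lookup is now a dictionary access rather than a full scan.
hits = city_index.get("Milwaukie".casefold(), [])
print([record["zip_code"] for record in hits])
```

For a single query the one-line comprehension is simpler; the index only pays off when many city lookups run against the same set of records.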

However, I may add a documentation example of your use-case, as there are likely others who have run into similar roadblocks.

Thanks for your post, @samcan.