Adding the ability to search_cities with case insensitive search.

yaph / geonamescache

geonamescache - a Python library for quick access to a subset of GeoNames data.

https://pypi.org/project/geonamescache/

MIT License

99 stars 38 forks source link

Adding the ability to search_cities with case insensitive search. #26

Closed pigden closed 2 years ago

pigden commented 2 years ago

An example city of "Stoke-on-Trent" has multiple combinations of how this could be written, as the -on- is lower we can't use str.title() and there are other examples of joining words. One way this could be handled in the client is to lower every word apart from the first and last separated by the -. However it is probably easier for a case insensitive search.

If you agree with this change, I will send a pull request to add this. If you are happy for me to add this, would you like a feature flag like case_insensitive=False or be the default?

yaph commented 2 years ago

Thank you! Adding this via a feature flag seems fine to me. I'd prefer a "case_sensitive=True" option, because I find it easier to understand than double negation, i.e. keeping the current behavior as a default.

pigden commented 2 years ago

Thanks for getting back to me. I have raised a pull request for this change.

yaph commented 2 years ago

I just released version 1.3.0 which includes your contribution. Thank you! I also upgraded the dependencies, which are not compatible with Python 2.7 any more. Hope this doesn't cause you any trouble. I kept the check for the casefold attribute, but want to remove code for supporting Python 2 in the next release.

pigden commented 2 years ago

Great, thank you. Nope doesn’t impact me as I am using python 3, just wanted to keep 2 support as the rest of the lib did.

pigden commented 2 years ago

The code I have done can be simplified during/after the next release :)

mlissner commented 2 years ago

I'm a bit late here, but I'm poking around due to the 1.3.0 release. First, thanks for the effort with this project. I've used it for years and appreciate your ongoing maintenance.

Second, why make this an option? Is there a time when you wouldn't want a city to match because of a case difference? Why not just make case insensitive the default and not even provide a way of doing case sensitive searching? Just a thought. I also noticed that this wasn't documented in the readme. Might be good to put it in there if the option is here to stay.

pigden commented 2 years ago

Thanks for the note. I will update the readme with the new feature flag on the listed functions. On the making this feature the default, I will let @yaph comment on making a behaviour change to the lib.

P.s. if the decision is made to change the behaviour of this function. I would like to take the opportunity to bring consistency between alternatenames (list) and name (str) search on this function. At the moment str attributes are partial match success but list attributes are exact match search.

yaph commented 2 years ago

That's a valid point @mlissner. An argument for keeping the case sensitive search is that it should be faster. I haven't measured the difference though. It probably makes sense to make False the default for case_sensitive though. This would be fine for me in a new major release, i.e. 2.0.

That being said, I was considering changing the API of this package quite drastically in the next major release. Instead of creating a GeonamesCache object, I'd rather do something like from geonamescache import countries and get rid of the GeonamesCache class altogether. The reason I implemented it like that, was to make sure that the data files are only loaded once during the lifecycle of on object and only when needed.

I'd like to hear your thoughts as users of the package on that and thanks to both of you for using and contributing to it!

mlissner commented 2 years ago

Yeah, I thought it might be for performance reasons. I'm sure case insensitive is a bit slower, but if you make the data objects on disk lowercase, doing someparam.lower() has gotta be practically instant (of course, doing this means you wouldn't be able to do case-sensitive queries anymore — maybe that's fine?).

(If I sound like I care a lot about this, I don't. Just surprised me that it wasn't insensitive by default.)

For changes to the API, yes, that's certainly be better. I think we do something like that in a package I maintain called eyecite. It has some code to do the kind of one-time loading you're talking about. I didn't develop that code, but I recall it being trickier than I expected. I can dig it up if you want.

yaph commented 2 years ago

I think by importing a function like so from geonamescache import get_countries, the @functools.lru_cache decorator could be used, but importing the countries object directly would be nicer. I'd certainly be interested to see how this is done in the eyecite project. Thank you!

mlissner commented 2 years ago

I did some digging. A dependency for eyecite is a database of courts, which loads a json file from disk. Here's the PR that makes that get lazy-loaded and cached: https://github.com/freelawproject/courts-db/pull/16

mlissner commented 2 years ago

One thing to notice is that the lazy loading is important so import xyz doesn't slow down starting things like the django shell.

yaph commented 2 years ago

Awesome, thank you!