Closed. pigden closed this 2 years ago.
Thank you! Adding this via a feature flag seems fine to me. I'd prefer a "case_sensitive=True" option, because I find it easier to understand than double negation, i.e. keeping the current behavior as a default.
Thanks for getting back to me. I have raised a pull request for this change.
I just released version 1.3.0 which includes your contribution. Thank you! I also upgraded the dependencies, which are not compatible with Python 2.7 any more. Hope this doesn't cause you any trouble. I kept the check for the casefold attribute, but want to remove code for supporting Python 2 in the next release.
Great, thank you. Nope, it doesn't impact me as I am using Python 3; I just wanted to keep Python 2 support as the rest of the lib did.
The code I wrote can be simplified during or after the next release :)
I'm a bit late here, but I'm poking around due to the 1.3.0 release. First, thanks for the effort with this project. I've used it for years and appreciate your ongoing maintenance.
Second, why make this an option? Is there a time when you wouldn't want a city to match because of a case difference? Why not just make case insensitive the default and not even provide a way of doing case sensitive searching? Just a thought. I also noticed that this wasn't documented in the readme. Might be good to put it in there if the option is here to stay.
Thanks for the note. I will update the readme with the new feature flag on the listed functions. On making this feature the default, I will let @yaph comment on making a behaviour change to the lib.
P.S. If the decision is made to change the behaviour of this function, I would like to take the opportunity to bring consistency between the alternatenames (list) and name (str) searches. At the moment, a search on a str attribute succeeds on a partial match, but a search on a list attribute requires an exact match.
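To illustrate the inconsistency described above, here is a minimal sketch. It assumes the search boils down to Python's `in` operator, which is a substring test on a str but an exact-element test on a list. The `matches` helper and the sample record are hypothetical and are not the library's actual code.

```python
def matches(value, query, case_sensitive=True):
    """Hypothetical helper, not part of geonamescache."""
    if not case_sensitive:
        query = query.casefold()
        if isinstance(value, str):
            value = value.casefold()
        else:
            value = [item.casefold() for item in value]
    # `in` is a substring test on a str, but an exact-element test on a list.
    return query in value


# Made-up sample record for illustration only.
city = {"name": "Stoke-on-Trent", "alternatenames": ["Stoke", "Stoke on Trent"]}

partial_on_str = matches(city["name"], "Stoke")             # substring of the name
partial_on_list = matches(city["alternatenames"], "Stoke on")  # not a whole element
folded_on_list = matches(city["alternatenames"], "stoke on trent", case_sensitive=False)
```

With this sample data, the partial query succeeds against the str attribute but fails against the list attribute, which is the behaviour difference being pointed out.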
That's a valid point @mlissner. An argument for keeping the case sensitive search is that it should be faster. I haven't measured the difference though. It probably makes sense to make False the default for case_sensitive though. This would be fine for me in a new major release, i.e. 2.0.
That being said, I was considering changing the API of this package quite drastically in the next major release. Instead of creating a GeonamesCache object, I'd rather do something like from geonamescache import countries and get rid of the GeonamesCache class altogether. The reason I implemented it like that was to make sure that the data files are only loaded once during the lifecycle of an object and only when needed.
I'd like to hear your thoughts as users of the package on that and thanks to both of you for using and contributing to it!
Yeah, I thought it might be for performance reasons. I'm sure case insensitive is a bit slower, but if you make the data objects on disk lowercase, doing someparam.lower() has gotta be practically instant (of course, doing this means you wouldn't be able to do case-sensitive queries anymore; maybe that's fine?).
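A rough sketch of that idea, with one change: instead of lowercasing the data on disk, it builds a case-folded lookup index once in memory, so the original spelling is kept in the records. The sample data, `build_index`, and `find_city` are all made up for illustration, and the lookup is exact-match only.

```python
from collections import defaultdict

# Made-up sample records; the real data files carry many more fields.
CITIES = [
    {"name": "Stoke-on-Trent"},
    {"name": "Paris"},
    {"name": "PARIS"},
]


def build_index(records):
    """Map each case-folded name to the records that carry it."""
    index = defaultdict(list)
    for record in records:
        index[record["name"].casefold()].append(record)
    return index


INDEX = build_index(CITIES)  # normalisation cost is paid once, up front


def find_city(name):
    """Case-insensitive exact lookup: one casefold() call plus a dict access."""
    return INDEX.get(name.casefold(), [])
```

Each query then costs a single `casefold()` on the search term rather than one per record.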
(If I sound like I care a lot about this, I don't. Just surprised me that it wasn't insensitive by default.)
For changes to the API, yes, that'd certainly be better. I think we do something like that in a package I maintain called eyecite. It has some code to do the kind of one-time loading you're talking about. I didn't develop that code, but I recall it being trickier than I expected. I can dig it up if you want.
I think by importing a function like so, from geonamescache import get_countries, the @functools.lru_cache decorator could be used, but importing the countries object directly would be nicer. I'd certainly be interested to see how this is done in the eyecite project. Thank you!
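A minimal sketch of the lru_cache variant, assuming a module-level get_countries function. An inline JSON string stands in for the data file here so the example runs on its own; the content is made up.

```python
import functools
import json

# Stand-in for a JSON data file shipped with the package (made-up content).
_COUNTRIES_JSON = '{"DE": {"name": "Germany"}, "FR": {"name": "France"}}'


@functools.lru_cache(maxsize=None)
def get_countries():
    """Parse the data on the first call; later calls return the cached dict."""
    return json.loads(_COUNTRIES_JSON)


first = get_countries()
second = get_countries()  # served from the cache, no second parse
```

One thing to keep in mind with this approach is that every caller receives the same dict object, so a caller that mutates it changes what everyone else sees.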
I did some digging. A dependency for eyecite is a database of courts, which loads a json file from disk. Here's the PR that makes that get lazy-loaded and cached: https://github.com/freelawproject/courts-db/pull/16
One thing to notice is that the lazy loading is important so that import xyz doesn't slow down starting things like the Django shell.
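One possible way to get both a cheap import and a directly importable countries object is a module-level `__getattr__` (PEP 562, Python 3.7+). This is only a sketch and not necessarily what the linked PR does. The module name `geodemo` and the loader are invented, and the module is built in memory so the example runs standalone; in a real package the hook would sit in `__init__.py`.

```python
import sys
import types

load_calls = []  # records each time the (pretend) data file is read


def _load_countries():
    load_calls.append("countries")
    return {"DE": {"name": "Germany"}}  # made-up data


demo = types.ModuleType("geodemo")  # hypothetical package name


def _module_getattr(name):
    # Called only when normal attribute lookup on the module fails.
    if name == "countries":
        value = _load_countries()
        setattr(demo, name, value)  # cache so later lookups bypass this hook
        return value
    raise AttributeError(f"module 'geodemo' has no attribute {name!r}")


demo.__getattr__ = _module_getattr
sys.modules["geodemo"] = demo

import geodemo  # cheap: nothing has been loaded yet

loads_after_import = len(load_calls)

from geodemo import countries  # first access triggers the load

loads_after_access = len(load_calls)
```

A second access to `geodemo.countries` finds the cached attribute and does not call the loader again.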
Awesome, thank you!
An example city, "Stoke-on-Trent", can be written in multiple ways. Because the -on- is lowercase, we can't use str.title(), and there are other examples of joining words. One way this could be handled in the client is to lowercase every word apart from the first and last of those separated by the -. However, a case insensitive search is probably easier.
If you agree with this change, I will send a pull request to add it. Would you like a feature flag like case_insensitive=False, or should it be the default?
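The Stoke-on-Trent problem can be reproduced in a few lines. The list of user inputs is made up; str.title() and str.casefold() are standard Python string methods.

```python
canonical = "Stoke-on-Trent"
user_inputs = ["stoke-on-trent", "STOKE-ON-TRENT", "Stoke-On-Trent"]

# str.title() capitalises the letter after every hyphen, so it cannot
# recover the canonical spelling with its lowercase "on".
titled = [text.title() for text in user_inputs]

# Comparing case-folded strings sidesteps the capitalisation question.
folded_matches = [text.casefold() == canonical.casefold() for text in user_inputs]
```

Here every entry in `titled` comes out as "Stoke-On-Trent", which does not equal the canonical name, while every case-folded comparison succeeds.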