symerio / pgeocode

Postal code geocoding and distance calculation
https://pgeocode.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Proposal for lightweight version of pgeocode without pandas and numpy dependencies #81

Closed: mwoss closed this issue 4 months ago

mwoss commented 5 months ago

Hi @rth! :3 Thanks for your great work and for creating such a useful library. I deeply appreciate your hard work! :3

I love using it, but the underlying dependencies, pandas and numpy, can sometimes be problematic: they are quite heavy and can be a bit troublesome to install on Apple devices with M* chips. I was thinking about an alternative that would provide an identical API but use only standard-library modules such as csv and math instead.
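A minimal sketch of what such a stdlib-only alternative might look like, assuming the tab-separated GeoNames postal-code layout (country, postal code, place name, three admin name/code pairs, latitude, longitude, accuracy). `StdlibPostalIndex`, `distance_km`, and the sample rows are hypothetical placeholders; the method name `query_postal_code` only echoes pgeocode's naming and returns a plain dict rather than pandas objects. Distances use the haversine formula with a mean Earth radius of 6371 km.

```python
import csv
import io
import math

# HYPOTHETICAL sample in the tab-separated GeoNames postal-code layout:
# country, postal code, place name, three admin name/code pairs, lat, lon, accuracy.
# Codes, names, and coordinates are placeholders, not real data.
SAMPLE_TSV = (
    "XX\t00001\tPlace A\t\t\t\t\t\t\t0.0\t0.0\t\n"
    "XX\t00002\tPlace B\t\t\t\t\t\t\t0.0\t1.0\t\n"
)

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


class StdlibPostalIndex:
    """Hypothetical stdlib-only postal code lookup; not part of pgeocode."""

    def __init__(self, tsv_text):
        self._by_code = {}
        reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Build a dict keyed by postal code once, so queries need no row scan.
            self._by_code[row[1]] = {
                "country_code": row[0],
                "postal_code": row[1],
                "place_name": row[2],
                "latitude": float(row[9]),
                "longitude": float(row[10]),
            }

    def query_postal_code(self, code):
        """Return the record for `code` as a dict, or None if it is unknown."""
        return self._by_code.get(code)

    def distance_km(self, code_a, code_b):
        """Haversine distance between two postal codes; NaN if either is unknown."""
        a, b = self.query_postal_code(code_a), self.query_postal_code(code_b)
        if a is None or b is None:
            return math.nan
        return haversine_km(a["latitude"], a["longitude"], b["latitude"], b["longitude"])


index = StdlibPostalIndex(SAMPLE_TSV)
print(index.query_postal_code("00001")["place_name"])  # Place A
print(round(index.distance_km("00001", "00002"), 1))  # 111.2
```

The two placeholder points sit one degree of longitude apart on the equator, so the result is simply 6371 × π/180 ≈ 111.2 km.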

I believe it could be beneficial for many workflows and projects. What do you think about such an idea? :D

rth commented 4 months ago

Thanks for your suggestion, @mwoss! Reducing the number of dependencies was certainly an objective when creating this package.

While it's possible to re-implement similar functionality with the stdlib only, I think it would be slower and less robust. For instance, the basic operation is a table lookup, which will be much faster in pandas than naively scanning each row in pure Python. Other possible features, such as inverse geocoding (https://github.com/symerio/pgeocode/issues/7), would also certainly be slow without numeric libraries.
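To make the cost argument concrete, here is a hypothetical pure-Python sketch of the two operations mentioned: a naive exact lookup that scans each row, and brute-force inverse geocoding that computes a haversine distance to every row. The table is a synthetic grid of made-up codes (`P00000` to `P09999`); none of these functions come from pgeocode.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def make_grid_rows(n_side=100, step=0.1):
    """Synthetic table: one hypothetical postal code per point of a lat/lon grid."""
    return [
        {
            "postal_code": f"P{i:05d}",
            "latitude": (i // n_side) * step,
            "longitude": (i % n_side) * step,
        }
        for i in range(n_side * n_side)
    ]


def lookup_by_scan(rows, code):
    """Naive exact lookup: compares rows one by one until a match, O(n) per query."""
    for row in rows:
        if row["postal_code"] == code:
            return row
    return None


def nearest_postal_code(rows, lat, lon):
    """Brute-force inverse geocoding: one interpreted haversine call per row."""
    best, best_km = None, math.inf
    for row in rows:
        d = haversine_km(lat, lon, row["latitude"], row["longitude"])
        if d < best_km:
            best, best_km = row, d
    return best, best_km


rows = make_grid_rows()
print(lookup_by_scan(rows, "P01020"))
print(nearest_postal_code(rows, 1.02, 2.03)[0]["postal_code"])  # P01020
```

Both functions visit every row from the Python interpreter on each query. With NumPy, the haversine step instead becomes a few array expressions over the whole latitude and longitude columns followed by an `argmin`, which is where numeric libraries pay off.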

Feel free to re-implement something like this, but it will likely not happen as part of this project.

Note that installing pandas and numpy on Apple M* CPUs should pose no issues unless you are using a very outdated Python or package version. Binary wheels for this platform have been available on PyPI for some time now.

Closing, but don't hesitate to comment more.

mwoss commented 4 months ago

Thanks for the answer; I completely understand your reasoning. The table-lookup argument makes a lot of sense; it would probably be really hard to match the speed of pandas :D I will try to reimplement pgeocode without external dependencies as a hobby project, then :D