frank113 closed this issue 11 months ago
That's a very good point!
A bit on my personal background: I basically quit doing statistical analysis in Python 10 years ago, mainly because I disliked pandas so much. The reason I'm doing these porting projects now is that I wanted an excuse to learn polars (which looks really great thus far).
All this to say that I would be very hesitant to rewrite the internals using pandas. However, documenting as you describe in Option 1 and adding data dependencies as you describe in Option 3 seem like excellent (and easy) ideas.
If possible, I'd opt to not have a hard dependency on anything. The functionality of this package isn't so complex that it really needs dependencies like polars or pandas. For my use cases it would almost certainly be better to use plain Python dicts, because those are darn fast for lookups and don't have the overhead of moving into C or Rust. But having convenience functions to get a pandas or polars dataframe would be nice for people more comfortable with that. So my votes would also be for options 1 and 3, leaving pandas optional.
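To make the idea concrete, here is a minimal sketch of what I mean, not how the package actually stores its data. The names CODELIST, lookup and codelist_as_pandas are hypothetical, and the three rows are sample data only. Lookups stay in plain dicts, and pandas is imported only inside the convenience function.

```python
from importlib import util

# Hypothetical stand-in for the package's code table, keyed by ISO3 code.
CODELIST = {
    "CAN": {"country_name": "Canada", "iso2c": "CA"},
    "DEU": {"country_name": "Germany", "iso2c": "DE"},
    "JPN": {"country_name": "Japan", "iso2c": "JP"},
}


def lookup(iso3c, field):
    """Return one field for an ISO3 code, or None if the code is unknown."""
    row = CODELIST.get(iso3c)
    return None if row is None else row.get(field)


def codelist_as_pandas():
    """Build a pandas DataFrame on demand; pandas is imported only here."""
    if util.find_spec("pandas") is None:
        raise ImportError("pandas is not installed; install it to use this helper")
    import pandas as pd

    # One row per country, with the dict key exposed as an 'iso3c' column.
    frame = pd.DataFrame.from_dict(CODELIST, orient="index")
    return frame.rename_axis("iso3c").reset_index()
```

With this layout, importing the module never touches pandas; only someone who calls codelist_as_pandas() needs it installed.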
@towr I agree.
I am overloaded at work right now, so it will probably take a while before I have time to make this change.
I would be extremely happy to review a PR if anybody has the energy to make Pandas and Polars optional.
(If optional, it would be nice if the behaviour felt natural: the function returns the same kind of object as the input automatically.)
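A rough sketch of the "same kind of object out as in" behaviour, under the assumption that the conversion is a simple dict lookup. ISO3_TO_NAME and to_name are hypothetical names with sample data, not the package's API. The trick is to look in sys.modules instead of importing: if the caller passed a pandas or polars object, that library is already loaded.

```python
import sys

# Hypothetical sample mapping; the real package has its own data.
ISO3_TO_NAME = {"CAN": "Canada", "DEU": "Germany"}


def to_name(codes):
    """Map ISO3 codes to names, mirroring the input's container type."""
    # A single string in, a single value out.
    if isinstance(codes, str):
        return ISO3_TO_NAME.get(codes)

    # Never import pandas/polars ourselves; only use them if already loaded.
    pd = sys.modules.get("pandas")
    if pd is not None and isinstance(codes, pd.Series):
        return codes.map(ISO3_TO_NAME)

    pl = sys.modules.get("polars")
    if pl is not None and isinstance(codes, pl.Series):
        names = [ISO3_TO_NAME.get(c) for c in codes.to_list()]
        return pl.Series(codes.name, names)

    # Any other iterable comes back as a plain list.
    return [ISO3_TO_NAME.get(c) for c in codes]
```

Unknown codes come back as None in the list and polars branches, and as NaN in the pandas branch, since that is what Series.map does for missing keys.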
Thanks again for raising this issue.
In version 0.4.0 (on pypi now), the pandas and polars dependencies are optional.
Summary
One of the most compelling features of the R package countrycode is the ability to manually manipulate the codelist dataframe for use in other projects. As presently constructed, the codelist variable exported from countrycode does not integrate with pandas or numpy. More specifically, the following script will fail when countrycode is installed via pip:
Potential Fixes
1. Document that numpy and pandas functionality requires installation of those packages.
2. Update pyproject.toml to include pandas and numpy.
3. Add a data dependencies section that includes numpy and pandas. With this approach a user can optionally install the additional packages:
4. Convert codelist to a pandas.DataFrame and replace polars with pandas.
Of the potential options I am partial to 3 and 4. Option 3 leaves the structure of the package untouched and shifts the choice to install additional dependencies to the user if they wish to use the codelist data in a pandas or numpy environment. My predilection for option 4 stems from Python's reliance on pandas to manipulate data.
Considerations
Other
Your statement in the last issue about "real work" resonated with me.