Closed LamAdr closed 12 months ago
Sorry I was not clear in explaining what was needed! I'm afraid I made you lose a bit of your time and effort...
It is not really useful to operate on different versions of `codelist` per se. If the user wants to do that, they can just call `pd.read_csv()` on our CSV directly and do whatever they want. It feels like introducing a new class and different attributes for different types of data frames is overkill and will just add code complexity for little tangible benefit.
I think that what we need is to modify the `countrycode` function such that it:
`replace_exact()` and `replace_match()` functions. For example, here we use `map_dict()`. That's a Polars method, but we want to use a base Python function instead. Same here.
That way, the user gets the same kind of object that they supplied --- so no surprises. Also, there is only one "conversion engine" that we use internally, which keeps the `pycountrycode` codebase simple.
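To make the "same kind of object in, same kind of object out" idea concrete, here is a minimal sketch. The names `convert_values` and `convert_like_input` are hypothetical and are not part of the package; the sketch assumes the conversion itself is a plain dictionary lookup in base Python, and that Polars or pandas are imported only to rebuild the return value when the caller supplied one of their Series.

```python
def convert_values(values, mapping):
    """Single base-Python 'conversion engine': look up each value, None if absent."""
    return [mapping.get(v) for v in values]


def convert_like_input(sourcevar, mapping):
    """Hypothetical wrapper: convert, then hand back the type the caller supplied."""
    # A single string comes back as a single value.
    if isinstance(sourcevar, str):
        return mapping.get(sourcevar)

    # Identify the originating package without importing it up front.
    package = type(sourcevar).__module__.split(".")[0]
    converted = convert_values(list(sourcevar), mapping)

    if package == "polars":
        import polars as pl  # imported only when the user passed a Polars object
        return pl.Series(values=converted)
    if package == "pandas":
        import pandas as pd  # imported only when the user passed a pandas object
        return pd.Series(converted, index=sourcevar.index)
    # Lists, tuples and other iterables come back as a plain list.
    return converted


names = {"CAN": "Canada", "USA": "United States"}
result = convert_like_input(["CAN", "USA", "XXX"], names)
```

With this layout, neither data frame library is needed for the conversion itself, so the lookup logic can be tested with plain lists.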
Internally, maybe we want to read the `codelist` CSV file into a dictionary where keys represent the codename and values are lists of values, all of equal length. That way, we can match codes by position.
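A rough sketch of that dictionary-of-lists layout, using only the standard library. The three-row sample, the column names, and the helper names `read_codelist` and `convert_by_position` are illustrative assumptions, not the package's actual data or API.

```python
import csv
import io

# Tiny stand-in for the real codelist CSV (assumed column names).
SAMPLE_CSV = """iso3c,iso2c,country_name
CAN,CA,Canada
FRA,FR,France
USA,US,United States
"""


def read_codelist(handle):
    """Read a CSV into {column name: list of values}; all lists share one length."""
    reader = csv.DictReader(handle)
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            columns[name].append(row[name])
    return columns


def convert_by_position(codelist, value, origin, destination):
    """Find value in the origin column and return the destination entry at the same index."""
    try:
        idx = codelist[origin].index(value)
    except ValueError:
        return None  # no match in the origin column
    return codelist[destination][idx]


codelist = read_codelist(io.StringIO(SAMPLE_CSV))
france = convert_by_position(codelist, "FRA", "iso3c", "country_name")
```

`list.index` is a linear scan, which is probably fine for a table of a few hundred rows; if lookups become a bottleneck, a per-column `{value: position}` dictionary could be built once instead.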
I expect this to be a pretty small PR with relatively few lines of code changed.
Is that clearer?
Ok yeah I get it now. Sorry about that, I will make another PR.
@vincentarelbundock
I defined custom `GenericDataframe` objects containing a `data` attribute, which is a polars or pandas dataframe, or a Python dict. Priority goes to polars when available. Please tell me if you would prefer it not to use objects.
If you like it, here are two things I would appreciate your input on:
`codelist` data directly, maybe it would be good to let the user decide which package to use for the `data` attribute when both are available?
`GenericDataframe` to uniformize testing. This means adding stuff unnecessary to users
Thanks a lot