tgherzog / wbgapi

Python module that makes using the World Bank's API a lot easier and more intuitive.
MIT License

Disabled automatic alphabetical sorting for multiple countries queries #36

Closed: andrepiz closed 2 months ago

andrepiz commented 2 months ago

The default behaviour was to sort the rows of the output DataFrame alphabetically. Now the rows keep the same order as the input countries list. An optional sortRows parameter is added to control this (defaults to False).

BEFORE:
wb.data.DataFrame(db=2, series='NY.GDP.MKTP.CD', economy=['ITA','CHN','RUS'], time=range(2020, 2023)) returns

CHN ... ...
ITA ... ...
RUS ... ...

NOW:
wb.data.DataFrame(db=2, series='NY.GDP.MKTP.CD', economy=['ITA','CHN','RUS'], time=range(2020, 2023)) returns

ITA ... ...
CHN ... ...
RUS ... ...

tgherzog commented 2 months ago

@andrepiz thanks for the pull request. Very interesting suggestion. However, I've decided not to merge it because it relies on a somewhat capricious and certainly undocumented "feature" in the API, namely that dimensional elements are returned from the API in the reverse of the order in which they are requested. That is, except for the first dimension! That introduces lots of potential confusion if, for example, the Bank changes their API without telling anyone (which definitely happens) or if users request MultiIndex dataframes. So I think it's best to continue sorting results so they appear in a predictable order. DataFrames can be easily sorted after the fact, so it's not too much trouble for the user to change the sort if they wish.

Here's an example of what I'm talking about: these two URLs should intuitively produce the same API response, but the first orders countries from right to left, and the second from left to right (at least at the time of this writing).

https://api.worldbank.org/v2/en/sources/2/series/NY.GDP.MKTP.CD/country/ITA;RUS;USA;CHN/time/YR2022
https://api.worldbank.org/v2/en/sources/2/country/ITA;RUS;USA;CHN/time/YR2022/series/NY.GDP.MKTP.CD
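
The two URLs above differ only in where the series dimension sits in the path. A minimal sketch of that, assuming nothing beyond the URL layout shown: build_url is a hypothetical helper written for this illustration (it is not part of wbgapi), and it only assembles the strings. Fetching them and comparing the country order in the responses is left to the reader, since that ordering is exactly the undocumented part.

```python
# Base path copied from the URLs above (source 2, English).
BASE = "https://api.worldbank.org/v2/en/sources/2"


def build_url(dimensions):
    """Join (name, value) pairs into a URL path, in the order given.

    Hypothetical helper for illustration only; not part of wbgapi.
    List values are joined with ';' as in the URLs above.
    """
    parts = []
    for name, value in dimensions:
        if isinstance(value, (list, tuple)):
            value = ";".join(value)
        parts.append(f"{name}/{value}")
    return BASE + "/" + "/".join(parts)


economies = ["ITA", "RUS", "USA", "CHN"]

# Same three dimensions in both requests; only their position changes.
series_first = build_url(
    [("series", "NY.GDP.MKTP.CD"), ("country", economies), ("time", "YR2022")]
)
country_first = build_url(
    [("country", economies), ("time", "YR2022"), ("series", "NY.GDP.MKTP.CD")]
)

print(series_first)
print(country_first)
```

Both strings request the same economies in the same order, so any difference in the order of the returned rows comes from the API rather than from the request.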

There are some other potential points of confusion if leaving results unsorted; for instance, if the user requests data for countries in a region (region sets are not in any discernible order):

wb.data.DataFrame('SP.POP.TOTL', wb.region.members('MEA'), 2022)
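
For anyone who wants the requested order back, changing the sort after the fact might look roughly like the sketch below. It uses only pandas and a hand-built frame standing in for the sorted result of wb.data.DataFrame; the numbers are placeholders, not real data, and the region set is an illustrative stand-in rather than the actual output of wb.region.members.

```python
import pandas as pd

# The order the user asked for.
requested = ["ITA", "CHN", "RUS"]

# Stand-in for a sorted result: the index is alphabetical.
# The values are placeholders, NOT real GDP figures.
df = pd.DataFrame(
    {"YR2020": [1.0, 2.0, 3.0], "YR2021": [4.0, 5.0, 6.0]},
    index=pd.Index(["CHN", "ITA", "RUS"], name="economy"),
)

# reindex() returns the rows in the order of the labels passed in.
in_request_order = df.reindex(requested)
print(list(in_request_order.index))  # ['ITA', 'CHN', 'RUS']

# A set has no dependable order, so there is no "requested order" to restore.
# Sorting it first gives a stable sequence (assumed stand-in for a region set).
members = {"MAR", "DZA", "EGY"}
stable_order = sorted(members)
print(stable_order)  # ['DZA', 'EGY', 'MAR']
```

This keeps the library's output predictable while leaving the final row order up to the caller.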