Closed eduardocorrearaujo closed 2 years ago
The module get_wbd_data.py
should be renamed to worldbank.py
Always remember that names of variables, modules and packages need to be meaningful and readable. https://peps.python.org/pep-0008/#naming-conventions
I liked the function, it is good to go in my opinion. My only concern is that there are 20k+ indicators in the WB database. Are you planning on making one function for each indicator? Maybe a more feasible idea would be download the dataset and leave the manipulation to the user. I tried to address this last solution in the R script (it isn't a function yet, just a demo of what can be done).
Thank you for the feedback @dcpcamara. I understand your point.
In the wbgapi
package, a function allows the user to search for keywords in the indicators and return the matched values using the function wb.series.info()
. There is also a function that returns a data frame for a list of indicators and countries.
I can create a function that returns something similar, but I thought it would be interesting to have functions that allow us to get some useful indicators easily. These functions would be focused in the more valuable indicators for our projects. For this reason, the first indicator that I thought about was the population.
Regarding the large number of indicators, I agree with @eduardocorrearaujo 's approach to have some ready to use function for Commonly used indicators, but as @dcpcamara says we also need a generic implementation. in GHOclient I let the user search for the indicator by keyword, and then use a generic function thta returns the data corresponding to a specific indicator code.
Regarding the large number of indicators, I agree with @eduardocorrearaujo 's approach to have some ready to use function for Commonly used indicators, but as @dcpcamara says we also need a generic implementation. in GHOclient I let the user search for the indicator by keyword, and then use a generic function thta returns the data corresponding to a specific indicator code.
Ok, I will make a generic implementation, and I'll share it with you.
Hi, I added some functions to search indicators in the database and get the data to specific indicators. I also wrote an example of a documentation file for the functions that I created. I know that the documentation should be in another folder, but how Flavio created that folder after I open this PR, to have access to this folder I would need to open another PR. For this reason, to keep these similar files in the same PR, to be analyzed together, I preferred to put the documentation file here, for all take a look, and after approving the .py and the .md file, I will remove the .md file and I will make the merge and create a new PR to put the documentation in the right folder.
@eduardocorrearaujo if you are going to use examples from a Jupyter notebook, you can use the notebook file itself as the documentation, merging the text into the notebook as Markdown cells. You can also create hierarchies in the documentation like shown below
index.rst
:
Index
=====
.. toctree::
data/index
data/index.rst
Data collection tools
======================
.. toctree::
:maxdepth: 2
:caption: Contents:
World Bank<worldbank.ipynb>
Google Trends<trends.ipynb>
the .ipyb_checkpoints
folder should be added to the .gitignore
Thank you so much for the tips, @fccoelho. Now the code is much better.
About the creation of the unit tests, I have some doubts. I should create a function that every time is run select a random indicator, database, function, and country and so confirm that the function is working?
Thank you so much for the tips, @fccoelho. Now the code is much better.
About the creation of the unit tests, I have some doubts. I should create a function that every time is run select a random indicator, database, function, and country and so confirm that the function is working?
The is a nice functionality of pytest that allows us to pass different parameter values to a single unit test that broadens its hability to catch errors. Here is an example in this project.
But often this is not needed if we can write a simple test case that represents a large variety of situations.
I think it can be merged, but given the CI fail, I think wbgapi
needs to be added with poetry add wbgapi
.
@eduardocorrearaujo it would be nice to have some test coverage. Try to add at least a couple of tests and we can increase the coverage later.
I created this PR to solve the issue #149.
I found a python package called
wbgapi
(https://blogs.worldbank.org/opendata/introducing-wbgapi-new-python-package-accessing-world-bank-data) that possibility the user to get the data from the world data bank.Using the
wbgapi
package I created a function calledget_pop_data
to get the population data from the world data bank. I would like to know if that is the kind of function that you want to have in the epigraphhub_py package. If you have any other suggestions of datasets from the world data bank to implement a function let me know.