vincentarelbundock / countrycode

R package: Convert country names and country codes. Assigns region descriptors.
https://vincentarelbundock.github.io/countrycode
GNU General Public License v3.0
342 stars 84 forks source link

update world_bank scraper #349

Open cjyetman opened 11 months ago

cjyetman commented 11 months ago

Caution: causes many changes/moves in the data CSV

etiennebacher commented 11 months ago

Hi, jumping in here to point out that most of the changes are due to the fact that the dataset is now sorted by wb and not by country anymore. Isn't sorting by country again an easy way to reduce the number of changes here? Sorry if I missed sth obvious about this

cjyetman commented 11 months ago

Hi, jumping in here to point out that most of the changes are due to the fact that the dataset is now sorted by wb and not by country anymore. Isn't sorting by country again an easy way to reduce the number of changes here? Sorry if I missed sth obvious about this

Yes, but my goal here was to update the scraper with minimal changes, not to update the data (which is typically done all together in a separate process). Just pointing it out for @vincentarelbundock's awareness.

NilsEnevoldsen commented 11 months ago

It does make it hard to see what, if anything, changed as a result.

vincentarelbundock commented 11 months ago

Thanks all. I added one line of code with an arrange to make the diff easier to see. The changes look pretty minimal.

You can merge whenever you want.

cjyetman commented 11 months ago

@vincentarelbundock I was under the impression that all of these "getter" scripts got run together in a Docker during some part of the process you do, so I included the changed CSV just to facilitate reviewing the consequences of the changes I made, but with the intention of removing the changed CSV before merging. Looking a bit deeper at things now, it looks like maybe that doesn't happen anymore or the process has changed? Should I included the modified CSV as well, if it changes, when making a change to a getter script?

vincentarelbundock commented 11 months ago

@vincentarelbundock I was under the impression that all of these "getter" scripts got run together in a Docker during some part of the process you do, so I included the changed CSV just to facilitate reviewing the consequences of the changes I made, but with the intention of removing the changed CSV before merging. Looking a bit deeper at things now, it looks like maybe that doesn't happen anymore or the process has changed? Should I included the modified CSV as well, if it changes, when making a change to a getter script?

Yeah, it's been a while, but if I remember correctly, my previous setup was ridiculously over-(and badly-) engineered. So I simplified everything. get_*() saves a CSV file that we keep in the repo. Then, build.R reads and merges all the CSV files.

I always run it on my local machine; never on Docker. Maybe I should do that, but things have worked ok thus far...