ropensci / comtradr

Functions for Interacting with the UN Comtrade API
https://docs.ropensci.org/comtradr

Planned re-launch of comtradr #45

Closed datapumpernickel closed 8 months ago

datapumpernickel commented 1 year ago

Hi all!

the Comtrade API has been undergoing extensive changes. To accommodate these changes, comtradr will undergo a relaunch. Given that the most important functions are not working at the moment, this is a good opportunity to deprecate some of them and to update some of the underlying packages, e.g. moving from httr to httr2. Sadly, Chris will also be stepping down as maintainer of the package, so a transition of maintainership will happen in parallel.

To get an overview of the needed changes, I will try to summarize them and put out some ideas that I would then propose to discuss in separate issues, so we can distribute the work efficiently. Please let me know if I forgot something important (surely I did...).

Pinging everyone who wanted to help or is already a co-maintainer: @hgoers @dkolkin @pachadotdev

Authentication

Currently, the ct_register_token() function registers the token and does some extensive validation, depending on whether there is a token and whether it is a premium one. The new API always requires a token. I propose that for now we focus only on the options for the free API, as I personally do not have access to the premium API and it is quite expensive. I assume most users will not have access anyway. If somebody has access to the premium features, maybe we can extend the package for this option later?

I think we can probably just use two simple, relatively user-friendly functions built with the httr2 package for setting and getting the token, e.g. like here: https://github.com/datapumpernickel/untrader/blob/main/R/utils.R

If we have time, we can think about secrets and safer ways to store the credentials, but I think this suffices for now. As far as I can tell, there is no way to get nicely formatted data about how many of the 250 free daily calls are left; we could think about implementing something for that, but I would keep that for a future extension as well.
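As a rough illustration, a minimal sketch of what such a setter/getter pair could look like, assuming the key is kept in an environment variable (the function names and the `COMTRADE_PRIMARY` variable are placeholders, not a final API; the linked utils.R is the actual reference):

```r
## Hypothetical sketch of a token setter/getter pair; names are placeholders.
set_primary_comtrade_key <- function(key = NULL) {
  if (is.null(key)) {
    # askpass prompts interactively, so the key does not end up in .Rhistory
    key <- askpass::askpass("Please enter your UN Comtrade primary key")
  }
  Sys.setenv("COMTRADE_PRIMARY" = key)
}

get_primary_comtrade_key <- function() {
  key <- Sys.getenv("COMTRADE_PRIMARY")
  if (identical(key, "")) {
    stop("No API key found, please supply one with set_primary_comtrade_key()",
         call. = FALSE)
  }
  key
}
```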

Getting data

Currently, most things happen in the ct_search() function, which both checks the input parameters and builds the request. The execution of the API request is already delegated to another function, but that function also does some of the processing of the response. My proposal would be to further modularize this into smaller functions, to make it easier to troubleshoot, easier to extend specific parts (e.g. adding new parameters), and possible to work on in parallel without conflicts.

The structure I started in another package for testing this out is the following:

ct_get_data()
│
├─── ct_params_check()
│    │   Calls all subsequent check functions in the relevant hierarchy and passes
│    │   arguments between them (e.g. check the classification code and URL parameters
│    │   before the commodity code, then pass them on to ct_cmd_code_check()).
│    │
│    ├─── ct_class_code_check()
│    │        The following functions check the input parameters for validity and
│    │        translate them from human-readable input to the machine-readable codes
│    │        of the Comtrade API.
│    ├─── ct_freq_check()
│    ├─── ct_cmd_code_check()
│    ├─── ct_flow_direction_check()
│    ├─── ct_reporter_check()
│    ├─── ct_partner_check()
│    └─── (extendable to all parameters that we want to include or that are added in the future)
│
├─── ct_request_build()
│        Builds the URL with the httr2 library.
│
├─── ct_request_perform()
│        Performs the request, and catches and returns possible HTTP errors; httr2 also
│        makes retries and throttling easy to implement.
│
└─── ct_request_process()
         Parses the response into an R data.frame; could also include a
         pretty_cols = TRUE argument to allow for further processing.

You can see it partially implemented in this example here: https://github.com/datapumpernickel/untrader/blob/main/R/check_params.R
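For illustration, a minimal sketch of how the top-level function could compose these pieces with httr2 (the base URL, query parameter names, and all function bodies are assumptions for the sketch, not the final implementation; the parameter checks are stubbed, and it reuses the hypothetical get_primary_comtrade_key() from the authentication sketch above):

```r
library(httr2)

## Hypothetical top-level wrapper composing the modular steps sketched above.
ct_get_data <- function(reporter, partner, commodity_code, ...) {
  # Validation/translation would happen in ct_params_check(); stubbed here.
  params <- list(reporter = reporter, partner = partner,
                 commodity_code = commodity_code)
  req  <- ct_request_build(params)
  resp <- ct_request_perform(req)
  ct_request_process(resp)
}

## Builds the request; base URL and query parameter names are placeholders.
ct_request_build <- function(params) {
  request("https://comtradeapi.un.org/data/v1/get/C/A/HS") |>
    req_url_query(reporterCode = params$reporter,
                  partnerCode  = params$partner,
                  cmdCode      = params$commodity_code) |>
    req_headers("Ocp-Apim-Subscription-Key" = get_primary_comtrade_key())
}

## Performs the request with retries and throttling, returning the raw response.
ct_request_perform <- function(req) {
  req |>
    req_retry(max_tries = 3) |>
    req_throttle(rate = 10 / 60) |>  # e.g. at most 10 requests per minute
    req_perform()
}

## Parses the JSON body into a data.frame (assumes a `data` field in the response).
ct_request_process <- function(resp, pretty_cols = TRUE) {
  parsed <- resp_body_json(resp, simplifyVector = TRUE)
  as.data.frame(parsed$data)
}
```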

I think this would make it very easy for us to extend the API coverage in the future and at the same time make it easy to pinpoint errors to specific parameters. Lastly, it would probably also make it easier to write testthat statements for these functions. I am in no way set on any of the function names; they are just examples.

The previous, then-unused functions we should keep in one file (e.g. legacy.R) and add "deprecated" error messages there that help users transition to the new functions. See #44 for some hints and ideas by Maëlle on this. A minimal sketch of what such a stub could look like is below.
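Just as an illustration, a possible legacy stub using base R's .Defunct() (the exact wording and mechanism, e.g. using lifecycle instead, is still open):

```r
## Hypothetical legacy stub that points users to the replacement function.
ct_search <- function(...) {
  .Defunct(
    new = "ct_get_data",
    package = "comtradr",
    msg = paste(
      "ct_search() no longer works since the UN Comtrade API changed.",
      "Please use ct_get_data() instead; see the package documentation",
      "for the new workflow."
    )
  )
}
```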

Package Data

Getting and saving the data

When verifying the input parameters we will need to access the predefined tables of the UN. There are a few functions currently implemented for saving and querying these; I guess we should see how much of them we can salvage and what could benefit from a rewrite. The full list of reference tables can be found here: https://comtradeapi.un.org/files/v1/app/reference/ListofReferences.json It would probably make sense to write a function that queries all of these in some smart fashion and saves them in extdata. I think it would be good to make these tables available to the user as well rather than keeping them internal; at least I found myself frequently looking at them. I would need to familiarize myself with the best practices around sysdata/extdata and the other directories.
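A hedged sketch of what such a download step could look like (the list URL is the one above; the field names inside the JSON, the helper name, and the storage location are assumptions):

```r
library(httr2)

## Hypothetical helper that downloads all UN reference tables listed in
## ListofReferences.json and stores them as .rds files, e.g. in inst/extdata
## during package development.
ct_download_reference_tables <- function(dest_dir = "inst/extdata") {
  refs <- request("https://comtradeapi.un.org/files/v1/app/reference/ListofReferences.json") |>
    req_perform() |>
    resp_body_json(simplifyVector = TRUE)

  ## Assumption: the JSON contains a `results` data.frame with a `category`
  ## name and a `fileuri` pointing to each individual reference table.
  urls <- refs$results$fileuri
  names(urls) <- refs$results$category

  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  for (nm in names(urls)) {
    tbl <- request(urls[[nm]]) |>
      req_perform() |>
      resp_body_json(simplifyVector = TRUE)
    saveRDS(tbl, file.path(dest_dir, paste0(nm, ".rds")))
  }
  invisible(names(urls))
}
```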

Updating the reference tables on the user side

Currently there is a package function, ct_update_databases(), that updates the tables for the current session when users execute it. I am not sure what the specific guidelines of rOpenSci and CRAN are for saving data like this temporarily and permanently; maybe it would make sense to do some research into how this could best be solved and then adjust to that. I assume @Chris put some thought into this and we might be able to recycle large parts of this function.

Look-up functions

The package currently provides two look-up functions, ct_country_lookup() and ct_commodity_lookup(), to make it easier to query the API in natural language:

1) For country codes:

2) For commodity codes:
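For illustration, a rough sketch of how such a lookup could work against the stored reference tables (the function, table, and column names are placeholders, not the existing API):

```r
## Hypothetical lookup: match a search term against the description column of a
## reference table and return the matching machine-readable codes.
ct_lookup <- function(search_term, ref_table,
                      code_col = "id", text_col = "text") {
  hits <- ref_table[grepl(search_term, ref_table[[text_col]],
                          ignore.case = TRUE), ]
  if (nrow(hits) == 0) {
    warning("No matches found for '", search_term, "'", call. = FALSE)
  }
  hits[[code_col]]
}

## Usage (assuming a commodity reference table saved to extdata as sketched above):
## cmd_ref <- readRDS(system.file("extdata", "cmd_hs.rds", package = "comtradr"))
## ct_lookup("coffee", cmd_ref)
```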

Testing

We will have to write new tests for our new suite of functions. After reading some of the reviews of the package from last time, I suggest we write many small testthat statements, each covering few arguments, to easily pinpoint what is wrong when a test fails. Furthermore, Maëlle suggested we implement testing for HTTP calls (see #44). I think this is a great idea, because it would allow us to check the proper building of the HTTP request without passing one of our API tokens to GitHub, and without having to figure out how to make the checks pass for a CRAN submission without an API token.
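As one example of how request building could be tested without a real token or a live call, a hedged testthat sketch (the function names follow the hypothetical structure above; httptest2 or httr2's mocking tools would be an option for full request/response tests):

```r
library(testthat)

test_that("ct_request_build() puts the commodity code into the query string", {
  # Set a dummy key so the header step in ct_request_build() does not error.
  withr::local_envvar(COMTRADE_PRIMARY = "dummy-key")

  params <- list(reporter = "276", partner = "0", commodity_code = "2204")
  req <- ct_request_build(params)

  # httr2 request objects expose the final URL without performing the request,
  # so no API token or network access is needed for this check.
  expect_s3_class(req, "httr2_request")
  expect_match(req$url, "cmdCode=2204")
})
```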

Badges

There are quite a few badges that are currently not working anymore or are failing. I honestly have no clue about most of these. My naive suggestion would be to archive them for now, together with their respective YAML files, and implement them again once we have gotten further with the new functions.

@ChrisMuir It would be great to get some guidance from you on this. Maybe we can make a slack-group with the new co-maintainers to discuss this internally.

Travis: I get an error message about Project not found or access denied when following the link: https://ci.appveyor.com/project/ropensci/comtradr

Codecov: Same here; something seems to have been archived and I am not sure what would be necessary to get it running again: https://app.codecov.io/gh/ropensci/comtradr

CRAN: obviously currently down, since we are off CRAN for now

Peer-reviewed: Although the package has been peer-reviewed, after the comprehensive re-launch it probably should not claim this badge until a re-review. Is there any policy on this @maelle?

Really looking forward to working with all of you on this!

ChrisMuir commented 1 year ago

Hello, regarding the badges in the README: none of those are actually essential. If the package stays within the rOpenSci GitHub org, I assume they will want to keep that badge on the repo. Honestly, I think this was my first time adding badges to any repo I built, so I added a bunch because I thought they looked cool :) But the CI/CD integrations do provide some value. I found those useful because I was developing locally on a Mac, so seeing that new updates I was pushing would successfully build and install on macOS/Linux/Windows was helpful.

Travis: This was set up to build and install comtradr on osx and linux. You'll need to re-authenticate the package on the Travis platform to get this working again.

AppVeyor: This was set up to build and install comtradr on Windows (I think?). Same with Travis-CI, you'll need to re-authenticate the package to get this working again.

CodeCov: This just shows the percentage of source code that's covered by unit tests. Again, comtradr needs to be reauthenticated.

CRAN: I think this can be left as-is. Since this package was on CRAN for years, I think it's beneficial to display the current CRAN status.

maelle commented 1 year ago

Regarding saving data,

maelle commented 1 year ago

Is there any policy on this @maelle?

Good question! https://github.com/ropensci/software-review-meta/issues/100

hgoers commented 1 year ago

I propose that for now, we focus on only the options for the free API.

Agree!

I think we can probably just use two simple functions proposed in the httr2 package for setting and getting the token that are relatively user friendly, e.g. like here: https://github.com/datapumpernickel/untrader/blob/main/R/utils.R

Agree!

My proposal would be to further modularize this function to make it easier to trouble-shoot, make it easy to extend specific parts of it (e.g. adding new parameters) and work on it in parallel without conflicts.

Agree! The current structure provided in untrader is great. There are some changes in syntax on the API end that need to be added. I have a live list of these. They are very easy to incorporate.

The full list of reference tables can be found here: https://comtradeapi.un.org/files/v1/app/reference/ListofReferences.json It would probably make sense to write a function that queries all of these in some smart fashion and saves them in extdata.

Agree! And I have existing functions that do this. I can add them in the dev branch and we can see how they run.

I think it would be good to make these available to the user as well and not keep them internal, at least I found myself frequently looking at them.

Also agree.

I would propose we abandon the writing out of country names, because it is very error prone.

Agree. We can add some instructions in a vignette that provide users with easy access to countries' ISO3 codes using countrycode.
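For reference, such a vignette snippet might look roughly like this (just an illustration of the countrycode package, not a committed design):

```r
## Possible vignette snippet: translate country names to ISO3 codes before
## passing them to the API functions.
library(countrycode)

countrycode(c("Germany", "France"), origin = "country.name", destination = "iso3c")
#> [1] "DEU" "FRA"
```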

datapumpernickel commented 1 year ago

Travis: This was set up to build and install comtradr on osx and linux. You'll need to re-authenticate the package on the Travis platform to get this working again.

AppVeyor: This was set up to build and install comtradr on Windows (I think?). Same with Travis-CI, you'll need to re-authenticate the package to get this working again.

CodeCov: This just shows the percentage of source code that's covered by unit tests. Again, comtradr needs to be reauthenticated.

Thanks @ChrisMuir for the input. We will look into re-authenticating them then once we have evolved to where the package should pass checks again! And yes, I do get the appeal of adding lots of badges. :P

CRAN: I think this can be left as-is. Since this package was on CRAN for years, I think it's beneficial to display the current CRAN status.

Yes, agree!

datapumpernickel commented 8 months ago

comtradr has been fully reviewed and submitted to CRAN. Let's hope this works out. I am closing this issue, as the relaunch of the functions themselves is finished. See #68 for future development goals.