Closed pachadotdev closed 5 years ago
@pachamaltese Thanks for your submission! From a quick look you checked "contains a vignette with examples of its essential functions and uses." and "has a test suite.", but I see neither a vignette nor a test suite in the GitHub repo? We can put this submission on hold while you add them, and we can provide links about e.g. unit testing if necessary.
Also, could you please expand here on the added value of the data processing offered by your package as opposed to using the original data sources?
@maelle thanks, no worries, let me check my laptop hdd vs my external drive, I probably forgot to move something
About the added value: if I run getdata_batch("chl", "chn", 2010, 2015)
I will obtain a tibble with 6 years of data for Chilean exports to China, but with these additions:
On the other hand, with raw API calls I would need to read
https://atlas.media.mit.edu/hs92/export/2010/chl/chn/show/
https://atlas.media.mit.edu/hs92/export/2011/chl/chn/show/
... etc ...
https://atlas.media.mit.edu/hs92/export/2015/chl/chn/show/
and also to read
https://atlas.media.mit.edu/hs92/export/2010/chl/all/show/
https://atlas.media.mit.edu/hs92/export/2011/chl/all/show/
... etc ...
https://atlas.media.mit.edu/hs92/export/2015/chl/all/show/
and also
https://atlas.media.mit.edu/hs92/export/2010/all/all/show/
https://atlas.media.mit.edu/hs92/export/2011/all/all/show/
... etc ...
https://atlas.media.mit.edu/hs92/export/2015/all/all/show/
After reading those 18 JSON files (3 endpoint families × 6 years), the user would still need some additional operations to obtain the same result as getdata_batch("chl", "chn", 2010, 2015):

- left_join to match API product codes (e.g. 2709) to product names (e.g. petroleum); a major drawback here is that the code-name table can't be accessed through the API
- left_join to match country codes (e.g. sachl) to country names (e.g. Chile), with the big problem that the API returns non-ISO codes (e.g. sachl vs chl), so again the user would need a code-name table and would have to crop the strings
- purrr to bind everything into a unique tibble

Hi @maelle. I did put all the pieces together: external disks, Google Drive, etc. I have pushed a new version with lots of changes as well as a test suite. Have a good weekend!
Hi @pachamaltese! Thanks.
I've been traveling so my initial checks will take a bit longer than expected.
I am not sure I understood how your package got scattered over hard drives & co, but please ask us any git/GitHub questions you might have. See also http://happygitwithr.com ☺
thanks a lot @maelle
happy travelling!!
I have 5+ external HDDs since my MacBook fried itself and I lost tons of tidy, useful data :( I'm still recovering from that accident ;)
Thanks again for your submission @pachamaltese!
Here are some points that should be solved/discussed before I start looking for reviewers. Note that now that you have everything on GitHub, you shouldn't lose pieces again, which is good news. Sorry to hear about your MacBook! 😢
This process is an ongoing discussion so ask any question/make any remark you might have about my comments!
You use assign in functions instead of returning output. Please return output; you don't need to use assign here.
Why do you use rm inside a function?
There is a separate file containing imports (confusingly called exports :wink:). It'd be easier to see where functions are imported from if you used the jsonlite::fromJSON syntax. You could then remove the separate R file with imports. Cf http://r-pkgs.had.co.nz/namespace.html#imports
Your package depends on purrr for a single function. Consider replacing it with do.call and rbind.
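For reference, here is a minimal base-R sketch of that replacement (an editor's illustration; the data frames below are toy stand-ins for the per-year API results, not the package's actual data):

```r
# Toy stand-ins for the per-year tibbles the package downloads
per_year <- list(
  data.frame(year = 2010, export_val = 100),
  data.frame(year = 2011, export_val = 120)
)

# purrr::map_df(per_year, identity) row-binds the list; base R does
# the same without the extra dependency:
combined <- do.call(rbind, per_year)
combined
```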
Regarding all the stopifnot calls: have informative error messages when the user enters a wrong parameter value.
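As a sketch of what "informative" could look like (the parameter name and the valid values below are hypothetical, not the package's actual checks):

```r
# stopifnot(origin %in% valid_origins) would just print the raw condition;
# instead, tell the user what was wrong and where to look for valid values.
check_origin <- function(origin, valid_origins = c("chl", "chn", "usa")) {
  if (!origin %in% valid_origins) {
    stop("'", origin, "' is not a valid origin; see the country codes table",
         call. = FALSE)
  }
  invisible(origin)
}

msg <- tryCatch(check_origin("xyz"), error = conditionMessage)
```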
You could have a single exported function instead of two, if I follow correctly? It'd take arguments like the batch one, which could be vectors of length 1 (the user would input, say, one or several country pairs). The other function would be a utility function that wouldn't need to be exported. It might simplify the interface?
If there is only one function call in a pipeline it might be clearer to remove the pipe.
origin_destination <- origin_destination %>%
mutate(trade_exchange_val =
!!sym("export_val") + !!sym("import_val"))
would become
origin_destination <- mutate(origin_destination,
trade_exchange_val = !!sym("export_val") + !!sym("import_val"))
For the country codes, consider the countrycode package. Where do the data files in data/ come from? Why are they pre-prepared, and are you sure they won't be updated?
You could keep their data processing in data-raw.
Please add the URL to the data source.
Are you allowed to use the logo in your README?
A flowchart of what processing your package does might be useful to explain the added value of using the package and to explain what has been preprocessed or not (what's being downloaded vs what's in data/).
In the README, when mentioning the vignette, link to the pkgdown website.
Please add @return fields to the documentation of the functions.
A vignette with an actual use case, and visualizations, would be nice (or an extension of the existing vignette).
Please explain what the data is e.g. SITC or add an external link.
Because some chunks of the vignette are unevaluated without a pre-generated output, I can't understand what the "problem" is. You could pre-generate the output.
Could you rename the repo "oec" like the package?
The "Maintainer" field in DESCRIPTION is useless because it'll be generated automatically from Authors@R.
What are the files .Rapp.history in data/ and .build.timestamp in vignettes/?
Please add a code coverage integration and badge (via e.g. usethis::use_coverage()).
✖ omit "Date" in DESCRIPTION. It is not required and it gets invalid quite
often. A build date will be added to the package when you perform `R CMD build` on
it.
✖ avoid long code lines, it is bad for readability. Also, many people
prefer editor windows that are about 80 characters wide. Try make your lines
shorter than 80 characters
R\getdata.R:56:1
R\getdata_batch.R:44:1
R\getdata_batch.R:68:1
tests\testthat\test-oec.R:3:1
tests\testthat\test-oec.R:15:1
... and 10 more lines
✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and
1:NCOL(...) expressions. They are error prone and result 1:0 if the expression on
the right hand side is zero. Use seq_len() or seq_along() instead.
R\getdata_batch.R:66:13
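The pitfall goodpractice is warning about is easy to reproduce in base R:

```r
# When the vector is empty (e.g. no years matched), 1:length(x) counts
# backwards instead of producing an empty sequence.
x <- character(0)
1:length(x)    # c(1L, 0L) -- a for loop would iterate twice over nothing!
seq_along(x)   # integer(0) -- the loop correctly runs zero times

loop_counts <- c(bad = length(1:length(x)), good = length(seq_along(x)))
```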
Reviewers: @aedobbyn @cimentadaj Due date: 2018-07-04
@pachamaltese :wave: any question or update? 😺
@maelle not yet, I've been moving a lot. Can I send it tomorrow at night?
If you think you won't have enough time to work on the package in the next weeks, we can set this submission on hold until you have more time.
tomorrow night is ok, I have to complete the vignettes and that takes 1-2 hours :)
@maelle it's ok now, I did all the steps from your feedback, and for just a few of them I explained why the suggestion does or doesn't fit this particular API.
Thanks, can you please answer with the explanations in this thread? 🙂
Thanks a lot @pachamaltese! 😺 I started looking and two small comments already:
The URL to the pkgdown site should be updated now that the repo has a new name.
You don't export the batch function any more, but it's still present in the website docs, cf http://pacha.hk/oec/reference/index.html
The link to the logo is also broken at the moment.
Thanks again @pachamaltese!
A few more points before I start looking for reviewers
The URL in the repo description should be updated.
The documentation website has to be updated (it still documents the function that was removed).
The logo doesn't appear any more in the README.
You can now remove my feedback from the README :wink: and have a more prominent link to the pkgdown website in it.
You can now add a peer-review badge to your README
[![](https://badges.ropensci.org/217_status.svg)](https://github.com/ropensci/onboarding/issues/217)
It'll indicate the package is under review, until after onboarding when it'll then indicate it was peer-reviewed.
Please add a code of conduct to your repo, e.g. via usethis::use_code_of_conduct()
Rbuildignore data-raw e.g. via usethis::use_build_ignore("data-raw")
The codecov badge has a wrong URL.
goodpractice output
It is good practice to
✖ write unit tests for all functions, and all package code in
general. 89% of code lines are covered by test cases.
R/getdata.R:44:NA
R/getdata.R:47:NA
R/getdata.R:50:NA
R/getdata.R:55:NA
R/getdata.R:59:NA
... and 13 more lines
✖ omit "Date" in DESCRIPTION. It is not required and it gets
invalid quite often. A build date will be added to the package when you
perform `R CMD build` on it.
✖ avoid long code lines, it is bad for readability. Also, many
people prefer editor windows that are about 80 characters wide. Try make
your lines shorter than 80 characters
R\getdata.R:47:1
R\getdata.R:50:1
R\getdata.R:59:1
R\getdata.R:64:1
R\getdata.R:69:1
... and 5 more lines
✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and
1:NCOL(...) expressions. They are error prone and result 1:0 if the
expression on the right hand side is zero. Use seq_len() or seq_along()
instead.
R\getdata.R:111:32
R\getdata.R:174:26
R\getdata.R:200:25
I think that you could easily increase code coverage; looking at https://codecov.io/gh/pachamaltese/oec/src/master/R/getdata.R you should
- test that the function returns errors when it should
- test more use cases (e.g. when destination == "all")
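A sketch of what such a test could look like (the testthat expectation is shown in comments since the real function's messages are assumptions; the runnable base-R part illustrates the same idea):

```r
# With testthat (hypothetical expectation; adjust to the real error message):
# test_that("getdata() validates its inputs", {
#   expect_error(getdata("not-a-country", "chn", 2015))
# })

# The underlying idea in plain base R: assert that the error path fires.
check_year <- function(year) {
  if (!is.numeric(year)) stop("year must be numeric, got ", class(year))
  invisible(year)
}

msg <- tryCatch(check_year("2015"), error = conditionMessage)
```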
Thanks for all your work!
Great, thanks! I'll do that before 14.00. And sorry, I didn't notice I broke links when renaming.
You're welcome, no hurry and no problem!
Note that only one vignette appeared on the pkgdown website last time I looked
@maelle ready, I used Jim Hester's lintr package for this 2nd round of feedback. Thanks a lot for the useful comments!!
Many thanks, I'll now start looking for reviewers!
Thanks @aedobbyn and @cimentadaj for accepting to review this package! 😸
Your reviews are due on 2018-07-04. As a reminder here are links to our bookdown in particular the packaging guidelines and reviewer's guide and to the review template.
Thanks @maelle and good to meet you @pachamaltese! This package looks very useful (and I love the hex sticker!)
I want to make sure I've got the right version of the package. I installed the development version, but running any of the getdata examples produces the same error in what looks to be jsonlite:::parse_con.
Let me know if this is fixed in another branch and I'll re-install from there. My session info is in the reprex below.
library(oec)
getdata("chl", "chn", 2015)
#> Valid input. Processing 2015 data...
#> Error in parse_con(txt, bigint_as_char): parse error: premature EOF
#>
#> (right here) ------^
devtools::session_info()
#> Session info -------------------------------------------------------------
#> setting value
#> version R version 3.4.4 (2018-03-15)
#> system x86_64, darwin15.6.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> tz America/Chicago
#> date 2018-06-18
#> Packages -----------------------------------------------------------------
#> package * version date
#> assertthat 0.2.0 2017-04-11
#> backports 1.1.2 2017-12-13
#> base * 3.4.4 2018-03-15
#> bindr 0.1.1 2018-03-13
#> bindrcpp 0.2.2 2018-03-29
#> compiler 3.4.4 2018-03-15
#> curl 3.2 2018-03-28
#> datasets * 3.4.4 2018-03-15
#> devtools 1.13.5 2018-02-18
#> digest 0.6.15 2018-01-28
#> dplyr 0.7.5 2018-05-19
#> evaluate 0.10.1 2017-06-24
#> glue 1.2.0.9000 2018-06-18
#> graphics * 3.4.4 2018-03-15
#> grDevices * 3.4.4 2018-03-15
#> htmltools 0.3.6 2017-04-28
#> jsonlite 1.5 2017-06-01
#> knitr 1.20 2018-02-20
#> magrittr 1.5 2014-11-22
#> memoise 1.1.0 2017-04-21
#> methods * 3.4.4 2018-03-15
#> oec * 2.8.1 2018-06-18
#> pillar 1.2.1 2018-02-27
#> pkgconfig 2.0.1 2017-03-21
#> purrr 0.2.5 2018-05-29
#> R6 2.2.2 2017-06-17
#> Rcpp 0.12.17 2018-05-18
#> rlang 0.2.1 2018-05-30
#> rmarkdown 1.9 2018-03-01
#> rprojroot 1.3-2 2018-01-03
#> stats * 3.4.4 2018-03-15
#> stringi 1.2.2 2018-05-02
#> stringr 1.3.1 2018-05-10
#> tibble 1.4.2 2018-01-22
#> tidyselect 0.2.4 2018-02-26
#> tools 3.4.4 2018-03-15
#> utils * 3.4.4 2018-03-15
#> withr 2.1.2 2018-03-15
#> yaml 2.1.18 2018-03-08
#> source
#> CRAN (R 3.4.0)
#> CRAN (R 3.4.3)
#> local
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> local
#> CRAN (R 3.4.4)
#> local
#> CRAN (R 3.4.3)
#> CRAN (R 3.4.3)
#> cran (@0.7.5)
#> CRAN (R 3.4.1)
#> Github (tidyverse/glue@a2c0f8b)
#> local
#> local
#> CRAN (R 3.4.0)
#> CRAN (R 3.4.0)
#> CRAN (R 3.4.3)
#> CRAN (R 3.4.0)
#> CRAN (R 3.4.0)
#> local
#> Github (pachamaltese/oec-r-package@00a367b)
#> CRAN (R 3.4.3)
#> CRAN (R 3.4.0)
#> cran (@0.2.5)
#> CRAN (R 3.4.0)
#> cran (@0.12.17)
#> cran (@0.2.1)
#> CRAN (R 3.4.3)
#> CRAN (R 3.4.3)
#> local
#> cran (@1.2.2)
#> cran (@1.3.1)
#> CRAN (R 3.4.3)
#> CRAN (R 3.4.3)
#> local
#> local
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
@aedobbyn Hi. Nice to meet you too! Thanks for letting me know. I installed R on another machine to try to reproduce this and... it looks like the OEC has a problem.
I should explore this; it could be an MIT server problem at the moment, but it is still a problem because the package only verifies the internet connection before trying to get data and does not inspect anything else.
For instance, if I visit https://atlas.media.mit.edu/sitc/export/2010/chl/arg/show/ I get this
@aedobbyn update: somebody within the OEC team screwed something up and not all origins/destinations are available at the moment... the US data is working ok, but no cigar trying to compute RCAs that need world flows :(
I shall update when this is fixed
Great, thanks for the quick update! Good opportunity to provide informative error messages to the user when the server is down or, in the case of the URL above, nothing is returned at all (I don't even get a 500 in the Chrome console). I'll look out for another update from you!
To test (in unit tests) how the package reacts to HTTP errors you can use @sckott's webmockr package: https://github.com/ropensci/webmockr
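A rough sketch of what that looks like (an editor's illustration using one of the URLs from this thread; consult the webmockr docs for the full workflow):

```r
# Stub the OEC endpoint so tests never touch the network, then make it
# answer with a 503 so the package's error handling can be exercised.
library(webmockr)
webmockr::enable()

stub <- webmockr::stub_request(
  "get", "https://atlas.media.mit.edu/hs92/export/2015/chl/chn/show/"
)
webmockr::to_return(stub, status = 503)

# ...a test can now assert that the package raises its informative error...

webmockr::disable()
```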
@maelle: Thanks again!! At this point you have contributed more to this package than the actual listed ctbs. @aedobbyn: Thanks, now it's fixed. This was a server-side problem that I should surface as an error message. I tested both R and browser access to the API and it's working again :)
Hmm, I still get the same parse error: premature EOF even though that endpoint in the browser now displays data. @cimentadaj are you able to receive data?
Hi @maelle, @aedobbyn and @pachamaltese! Glad to review this package. I'll submit my review this weekend, but in the meantime, the development version of the package (devtools::install_github("pachamaltese/oec-r-package")) seems to work for me:
library(oec)
getdata("chl", "chn", 2015)
#> Valid input. Processing 2015 data...
#> # A tibble: 670 x 31
#> year origin_id destination_id origin_name destination_name id
#> <dbl> <chr> <chr> <chr> <chr> <chr>
#> 1 2015 chl chn Chile China 5722
#> 2 2015 chl chn Chile China 5838
#> 3 2015 chl chn Chile China 6251
#> 4 2015 chl chn Chile China 6282
#> 5 2015 chl chn Chile China 6289
#> 6 2015 chl chn Chile China 6352
#> 7 2015 chl chn Chile China 6418
#> 8 2015 chl chn Chile China 6572
#> 9 2015 chl chn Chile China 6577
#> 10 2015 chl chn Chile China 6631
#> # ... with 660 more rows, and 25 more variables: id_len <int>,
#> # product_name <chr>, group_id <chr>, group_name <chr>,
#> # export_val <dbl>, export_val_growth_pct <dbl>,
#> # export_val_growth_pct_5 <dbl>, export_val_growth_val <dbl>,
#> # export_val_growth_val_5 <dbl>, export_rca <dbl>, import_val <dbl>,
#> # import_val_growth_pct <dbl>, import_val_growth_pct_5 <dbl>,
#> # import_val_growth_val <dbl>, import_val_growth_val_5 <dbl>,
#> # import_rca <dbl>, trade_exchange_val <dbl>, pci <dbl>, pci_rank <dbl>,
#> # pci_rank_delta <dbl>, top_exporter_code <chr>,
#> # top_importer_code <chr>, top_importer <chr>, top_exporter <chr>,
#> # color <chr>
devtools::session_info()
#> Session info -------------------------------------------------------------
#> setting value
#> version R version 3.4.4 (2018-03-15)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate English_United States.1252
#> tz Europe/Paris
#> date 2018-06-20
#> Packages -----------------------------------------------------------------
#> package * version date
#> assertthat 0.2.0 2017-04-11
#> backports 1.1.2 2017-12-13
#> base * 3.4.4 2018-03-15
#> bindr 0.1.1 2018-03-13
#> bindrcpp * 0.2.2 2018-03-29
#> cli 1.0.0 2017-11-05
#> compiler 3.4.4 2018-03-15
#> crayon 1.3.4 2017-09-16
#> curl 3.2 2018-03-28
#> datasets * 3.4.4 2018-03-15
#> devtools 1.13.5 2018-02-18
#> digest 0.6.15 2018-01-28
#> dplyr 0.7.5 2018-05-19
#> evaluate 0.10.1 2017-06-24
#> formatR 1.5 2017-04-25
#> glue 1.2.0.9000 2018-06-19
#> graphics * 3.4.4 2018-03-15
#> grDevices * 3.4.4 2018-03-15
#> htmltools 0.3.6 2017-04-28
#> jsonlite 1.5 2017-06-01
#> knitr 1.20 2018-02-20
#> magrittr 1.5 2014-11-22
#> memoise 1.1.0 2018-03-20
#> methods * 3.4.4 2018-03-15
#> oec * 2.8.1 2018-06-20
#> pillar 1.2.2 2018-04-26
#> pkgconfig 2.0.1 2017-03-21
#> purrr 0.2.5 2018-05-29
#> R6 2.2.2 2017-06-17
#> Rcpp 0.12.17 2018-05-18
#> rlang 0.2.0.9001 2018-06-20
#> rmarkdown 1.9 2018-03-01
#> rprojroot 1.3-2 2018-01-03
#> stats * 3.4.4 2018-03-15
#> stringi 1.1.7 2018-03-12
#> stringr 1.3.1 2018-05-10
#> tibble 1.4.2 2018-01-22
#> tidyselect 0.2.4 2018-02-26
#> tools 3.4.4 2018-03-15
#> utf8 1.1.3 2018-01-03
#> utils * 3.4.4 2018-03-15
#> withr 2.1.2 2018-03-18
#> yaml 2.1.18 2018-03-08
#> source
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.3)
#> local
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> local
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> local
#> CRAN (R 3.4.3)
#> CRAN (R 3.4.3)
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> Github (tidyverse/glue@a2c0f8b)
#> local
#> local
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> Github (hadley/memoise@06d16ec)
#> local
#> Github (pachamaltese/oec-r-package@00a367b)
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> Github (r-lib/rlang@ba4fb06)
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> local
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> local
#> CRAN (R 3.4.4)
#> local
#> Github (r-lib/withr@79d7b0d)
#> CRAN (R 3.4.4)
Hi @maelle, @aedobbyn and @cimentadaj
I added some error messages to inform the user when the connection drops or the MIT server is down or having problems. Last night I had the same problem (server-side): the browser returned an empty JSON, then 30 minutes later I could obtain the data.
I'll tell the Media Lab people about this, but in the meanwhile an error message "patches" server-side issues :)
And yes, devtools::install_github("pachamaltese/oec") installs the current version of the package that you are reviewing.
Great, working for me too now!
Just so I know the intent @pachamaltese: you quote variables from the OEC and use sym and !! to unquote them to avoid the NOTE no visible binding for global variable {variable}, correct?
A version with the tidyeval stripped out passes all tests without warnings, so I imagine the NOTE is the only reason.
@aedobbyn exactly!! Nobody before has ever discovered the main reason why I do that :D
Totally understand, those NOTEs are annoying :relaxed:
As you probably know, those NOTEs shouldn't stop you from passing a CRAN check, especially if you document them in an .Rbuildignored cran-comments.md. It might be cleaner to do that and avoid tidyeval altogether, but I would understand if you want to leave the tidyeval in.
@maelle makes the excellent point that you can document these global variables in a globalVariables.R file like this to avoid the no visible binding for global variable NOTEs. (Also see, e.g., an example of it in use in her opencage package.)
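For context, the file being suggested boils down to a single call; the column names below are the ones visible in this thread's output, used here as examples:

```r
# R/globalVariables.R -- declares NSE column names so R CMD check stops
# emitting "no visible binding for global variable" NOTEs for them.
if (getRversion() >= "2.15.1") {
  utils::globalVariables(c("export_val", "import_val", "trade_exchange_val"))
}
```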
If you devtools::check() the oec code in that amanda branch on my fork you won't get any of those NOTEs and you won't need the sym + !! syntax -- let me know if you want me to shoot you a PR with the code from that fork :)
@aedobbyn thanks! I accept that if and only if you agree to be added as ctb, because that is a contribution, a good contribution to be precise :)
Haha, that's very generous! It's @maelle who deserves the credit, as usual 😆
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS
- [ ] The package has an obvious research application according to JOSS's definition
The package contains a paper.md matching JOSS's requirements with:
- [ ] A short summary describing the high-level functionality of the software
- [ ] Authors: A list of authors with their affiliations
- [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).
The oec package is well written and definitely fulfills the objective of downloading trade data easily into R. I have comments which serve to improve the code, the user experience and the readability of the code. Some are more immediate than others, so feel free to ignore the small ones or new features if time is of the essence. However, I would consider them for future releases. They are not in any particular order of importance.
Great opportunity to use a switch statement here: https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L112-L130
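For illustration, a switch over that kind of argument might look like this (an editor's sketch; the argument and values are examples, not the package's actual code):

```r
# Map a user-facing classification name to the API path prefix; switch's
# final unnamed argument acts as the fall-through/error branch.
classification_prefix <- function(classification) {
  switch(classification,
    hs92 = "hs92",
    hs07 = "hs07",
    sitc = "sitc",
    stop("unknown classification: ", classification)
  )
}

classification_prefix("sitc")
```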
I would use match.arg here: https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L62-L64
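match.arg both validates and provides a default in one line; a minimal sketch (the argument name and values are examples):

```r
get_data_sketch <- function(classification = c("hs92", "hs07", "sitc")) {
  # Picks the first value as the default and errors informatively on
  # anything outside the allowed set.
  classification <- match.arg(classification)
  classification
}

res_default <- get_data_sketch()        # "hs92"
res_sitc    <- get_data_sketch("sitc")  # "sitc"
```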
I'm not sure whether curl retries calling the API if it doesn't respond, but if it doesn't, then you could retry 3-5 times like here: https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L142-L144. This would be very handy because you call the API several times in other places. Considering that the MIT server has been a bit unstable in the last few days, this could come in handy. I think there are a few packages that do this properly, but if you want to avoid adding more dependencies, a quick function hack using httr is this one:
get_resp <- function(url, attempts_left = 5) {
stopifnot(attempts_left > 0)
resp <- httr::GET(url)
# On a successful GET, return the response
if (httr::status_code(resp) == 200) {
resp
} else if (attempts_left == 1) { # When attempts run out, stop with an error
stop("Cannot connect to the MIT API")
} else { # Otherwise, sleep a second and try again
Sys.sleep(1)
get_resp(url, attempts_left - 1)
}
}
Feel free to adapt it to using fromJSON and remove the httr code.
Maybe I'm a bit ignorant, but in this specific case (https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L146-L149), since you already checked that the user has internet at the beginning of the function, shouldn't this only be an MIT problem? Perhaps this error check can be wrapped into a function, because it's used 2-3 times.
I see that you create several functions called read_from_api_** to build the URL of the API. It would be cleaner and safer to create a single, more general function that covers all the API URLs, with different arguments supplied depending on the specific URL.
Any particular reason why you have get_data and getdata? It seems a bit redundant, and there's no clear grammatical reason to have two (as with color and colour in ggplot2). In any case, because they share exactly the same documentation I would suggest documenting them together on a single page using #' @rdname get_data. See http://r-pkgs.had.co.nz/man.html. In fact, because they're exactly the same function, if you want to keep both, then I would do something like this:
get_data <- function(all the arguments) {
all code here....
}
getdata <- function(same arguments) {
get_data(pass the arguments)
}
To avoid repetition.
This is clearly more work, but I think it might be worth it for new users. Instead of relying on your user to search the 264 country names for a code, you could create a function that regexps the name of the country and returns the country code. For example, get_countrycode("Chile") and get_countrycode("chile") should yield the same result. This can be piped into get_data more intuitively, like get_data(get_countrycode("Chile"), get_countrycode("China"), 2010, 2015). Alternatively, the user can search the country tibble manually. But note that non-frequent R users will probably search manually and not with regexps.
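A crude sketch of that idea (the lookup table and column names below are invented for illustration; the real package's country table may differ):

```r
# Hypothetical country lookup; the real package ships a country tibble.
country_tbl <- data.frame(
  country_name = c("Chile", "China", "Chad"),
  country_code = c("chl", "chn", "tcd"),
  stringsAsFactors = FALSE
)

get_countrycode <- function(pattern) {
  hits <- country_tbl$country_code[
    grepl(pattern, country_tbl$country_name, ignore.case = TRUE)
  ]
  if (length(hits) != 1) {
    stop("'", pattern, "' matched ", length(hits),
         " countries; please be more specific", call. = FALSE)
  }
  hits
}

res <- get_countrycode("chile")  # "chl"
```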
Along the same lines, you could do something similar for all product tibbles. I would take a similar approach to the World Bank R package, wbstats. For example:
library(wbstats)
unemploy_vars <- wbsearch(pattern = "unemployment")
head(unemploy_vars)
Check https://cran.r-project.org/web/packages/wbstats/README.html for more examples.
You could have a generic get_product that searches in all product tibbles and returns an aggregated tibble with all matches. For example, if we search for Antiques then we get the same result as from oec::hs02, but also a new column indicating that it comes from hs02. This way, when there are similar matches, the user can disambiguate where each one comes from.
Of course this is a lot of work, but here's a very quick and crude approach:
get_products <- function(name) {
# Grab all the names of all hs datasets
all_datastr <-
stringr::str_subset(
data(package = "oec")$results[, "Item"],
"^hs"
)
# get the datasets, create the type_product column, bind them all together
# and do the search
all_datastr %>%
purrr::map(get) %>%
purrr::map2_dfr(all_datastr, ~ {.x$type_product <- .y; .x}) %>%
dplyr::filter(stringr::str_detect(product_name, name))
}
get_products("Gold")
get_products("Animals")
Misc:
In the oec website and GitHub repo, the installation code chunk still has the old repo (devtools::install_github("pachamaltese/oec-r-package")). In fact, try doing an overall repo search and update all URLs. The DESCRIPTION file needs to be updated in the BugReports field as well.
I like that you're very defensive about what you accept in the function, but try to help the user by pointing out where something went wrong. For example, here you could echo what the user wrote and, when appropriate (not here, because there are too many countries), list the available options the user could've chosen.
Great package!
Whoa. Thanks a lot. This helps to improve the package. get_data was provided by the other reviewers as an alternative without tidy evaluation. I'm on my cellphone right now, so I'm eager to implement the changes later!
On Sat., Jun. 23, 2018, 5:59 PM, Jorge Cimentada notifications@github.com wrote:
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
- As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).
Documentation
The package includes all the following forms of documentation:
- [x] A statement of need clearly stating problems the software is designed to solve and its target audience in README
- [x] Installation instructions: for the development version of package and any non-standard dependencies in README
- Vignette(s) demonstrating major functionality that runs successfully locally
- Function Documentation: for all exported functions in R help
- Examples for all exported functions in R Help that run successfully locally
- Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS
- The package has an obvious research application according to JOSS's definition http://joss.theoj.org/about#submission_requirements
The package contains a paper.md matching JOSS's requirements http://joss.theoj.org/about#paper_structure with:
- A short summary describing the high-level functionality of the software
- Authors: A list of authors with their affiliations
- A statement of need clearly stating problems the software is designed to solve and its target audience.
- References: with DOIs for all those that have one (e.g. papers, datasets, software).
Functionality
- Installation: Installation succeeds as documented.
- Functionality: Any functional claims of the software been confirmed.
- Performance: Any performance claims of the software been confirmed.
- Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
- Packaging guidelines: The package conforms to the rOpenSci packaging guidelines
Final approval (post-review)
- The author has responded to my review and made changes to my satisfaction. I recommend approving this package.
Estimated hours spent reviewing: 1.5 Review Comments
The oec package is well written and definitely fulfills the objective of downloading trade date easily into R. I have comments which serve to improve the code, the user experience and the readibility of the code. Some are more immediate then others so feel free to ignore the small ones or new features if time is of essence. However, I would consider them for future releases. They are not in any particular order of importance.
-
Great opportunity to use a switch statement here https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L112-L130 .
I would use match.args here https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L62-L64
I'm not sure whether curl retries calling the API if it doesn't respond, but if it doesn't, you could retry 3-5 times like here: https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L142-L144. This would be very handy because you call the API several times in other places, and the MIT server has been a bit unstable in the last few days. I think there are a few packages that do this properly, but if you want to avoid adding more dependencies, here is a quick function hack using httr:
```r
get_resp <- function(url, attempts_left = 5) {
  stopifnot(attempts_left > 0)

  resp <- httr::GET(url)

  # On a successful GET, return the response
  if (httr::status_code(resp) == 200) {
    resp
  } else if (attempts_left == 1) {
    # When attempts run out, stop with an error
    stop("Cannot connect to the MIT API")
  } else {
    # Otherwise, sleep a second and try again
    Sys.sleep(1)
    get_resp(url, attempts_left - 1)
  }
}
```

Feel free to adapt it to using fromJSON and remove the httr code.
Maybe I'm a bit ignorant, but in this specific case https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L146-L149, since you already checked that the user has internet at the beginning of the function, shouldn't this be only an MIT problem? Perhaps this error check can be wrapped into a function, because it's used 2-3 times.
I see that you create several functions called read_fromapi** to build the URL of the API. It would be cleaner and safer to create a single function that works for all API URLs, taking different arguments depending on the specific endpoint.
Any particular reason why you have get_data and getdata? It seems a bit redundant, and there is no clear grammatical reason to have two (unlike, for example, color and colour in ggplot2). In any case, because they share exactly the same documentation, I would suggest documenting them together on a single page using #' @rdname get_data. See http://r-pkgs.had.co.nz/man.html. In fact, because they are exactly the same function, if you want to keep both, I would do something like this:
```r
get_data <- function(origin, destination, years, ...) {
  # all the code here ...
}

getdata <- function(origin, destination, years, ...) {
  get_data(origin, destination, years, ...)
}
```
To avoid repetition.
This is clearly more work, but I think it might be worth it for new users. Instead of relying on your user to search among the 264 country names for a code, you could create a function that matches the name of the country with a regular expression and returns the country code. For example, get_countrycode("Chile") and get_countrycode("chile") should yield the same result. This can be piped into get_data more intuitively, like get_data(get_countrycode("Chile"), get_countrycode("China"), 2010, 2015). Alternatively, the user can search the country tibble manually. But note that infrequent R users will probably search manually rather than with regexps.
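A rough sketch of what such a helper could look like. The country and country_code column names mirror the lookup table the package ships, but the tiny codes data frame here is a made-up stand-in for illustration:

```r
# Hypothetical helper: regex-match a country name (case-insensitively)
# against a lookup table and return the matching code(s).
get_countrycode <- function(name, codes) {
  hits <- codes[grepl(name, codes$country, ignore.case = TRUE), ]
  if (nrow(hits) == 0) stop("No country matches '", name, "'")
  hits$country_code
}

codes <- data.frame(country = c("Chile", "China"),
                    country_code = c("chl", "chn"),
                    stringsAsFactors = FALSE)
get_countrycode("chile", codes)  # "chl"
get_countrycode("Chi", codes)    # "chl" "chn" (ambiguous prefix)
```

Returning all matches for an ambiguous pattern (rather than erroring) lets the user see what went wrong and refine the search.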
Along the same lines, you could do something similar for all product tibbles. I would take an approach similar to the World Bank R package, wbstats. For example:
```r
library(wbstats)
unemploy_vars <- wbsearch(pattern = "unemployment")
head(unemploy_vars)
```
Check https://cran.r-project.org/web/packages/wbstats/README.html for more examples.
You could have a generic get_product that searches all product tibbles and returns an aggregated tibble with all matches. For example, if we search for Antiques we get the same result as from oec::hs02, but also a new column indicating that it comes from hs02. This way, when there are similar matches, the user can disambiguate where each one comes from.
Of course this is a lot of work, but here's a very quick and crude approach:
```r
get_products <- function(name) {
  # Grab the names of all hs datasets
  all_datastr <- stringr::str_subset(
    data(package = "oec")$results[, "Item"],
    "^hs"
  )

  # Get the datasets, create the type_product column,
  # bind them all together and do the search
  all_datastr %>%
    purrr::map(get) %>%
    purrr::map2_dfr(all_datastr, ~ {.x$type_product <- .y; .x}) %>%
    dplyr::filter(stringr::str_detect(product_name, name))
}

get_products("Gold")
get_products("Animals")
```
Misc:
In the oec website and GitHub repo the installation code chunk still has the old repo (devtools::install_github("pachamaltese/oec-r-package")). In fact, try doing an overall repo search and update all URLs. The DESCRIPTION file needs to be updated in the BugReports field as well.
I like that you're very defensive about what you accept in the function, but try to help the user by pointing out where something went wrong. For example, here https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L47-L51 I would print what the user wrote and, when appropriate (not here, because there are too many countries), list the available options the user could have chosen.
Great package!
@cimentadaj thanks a lot!! I have limited laptop access at the moment but I saw this and there are really good suggestions. Can I add you as a contributor? I never thought about matching arguments.
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
- As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).
Documentation
The package includes all the following forms of documentation:
- A statement of need clearly stating problems the software is designed to solve and its target audience in README
- Installation instructions: for the development version of package and any non-standard dependencies in README
- Vignette(s) demonstrating major functionality that runs successfully locally
- Function Documentation: for all exported functions in R help
- Examples for all exported functions in R Help that run successfully locally
- Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS
- [ ] The package has an obvious research application according to JOSS's definition
The package contains a paper.md matching JOSS's requirements with:
- [ ] A short summary describing the high-level functionality of the software
- [ ] Authors: A list of authors with their affiliations
- [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).
Functionality
- Installation: Installation succeeds as documented.
- Functionality: Any functional claims of the software been confirmed.
- Performance: Any performance claims of the software been confirmed.
- Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
- Packaging guidelines: The package conforms to the rOpenSci packaging guidelines
Final approval (post-review)
- The author has responded to my review and made changes to my satisfaction. I recommend approving this package.
Estimated hours spent reviewing: 2
Review Comments
First off, I'm really excited for this package and the data it's going to make available to people in tidy format.
I think the modifications you've put in place so far and the ones suggested by @cimentadaj are all very useful. (Many of those I was going to suggest myself, but I won't repeat Jorge!) +1 especially to retrying the server multiple times (you might even implement exponential backoff) and making things a bit more concise with match.arg and switch.
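To make the exponential-backoff idea concrete, here is a hedged sketch. with_backoff() is a made-up name, and fetch stands for any zero-argument function (e.g. one wrapping httr::GET), so nothing below is the package's actual API:

```r
# Sketch of exponential backoff: the wait doubles after every failed
# attempt (1s, 2s, 4s, ...) instead of a fixed one-second pause.
with_backoff <- function(fetch, attempts_left = 5, wait = 1) {
  stopifnot(attempts_left > 0)
  result <- try(fetch(), silent = TRUE)
  if (!inherits(result, "try-error")) {
    result
  } else if (attempts_left == 1) {
    stop("Cannot connect to the MIT API")
  } else {
    # Back off, then retry with a doubled wait
    Sys.sleep(wait)
    with_backoff(fetch, attempts_left - 1, wait * 2)
  }
}
```

If adding httr as a dependency is acceptable, httr also ships a RETRY() helper that implements backoff for you.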
A few other notes:
An alternative to @cimentadaj's suggestion to allow users to get_countrycode using a regex match would be to allow users to supply either the country_code or the country in get_data; so get_data("chl", "chn", 2015, "sitc") would return the same data as get_data("Chile", "China", 2015, "sitc"). While this approach gives the user less leeway in the input they can supply than a regex would, I think most of the countries in country_codes$country are represented the way most people would expect, which saves some work on your end.
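One hedged way to sketch this: normalize_country() is a hypothetical name, and the country / country_code columns mirror the package's country_codes table (a tiny stand-in is used here):

```r
# Accept either a country code ("chl") or an exact country name
# ("Chile") and always return the code.
normalize_country <- function(x, codes) {
  if (x %in% codes$country_code) return(x)
  hit <- codes$country_code[tolower(codes$country) == tolower(x)]
  if (length(hit) == 1) return(hit)
  stop("'", x, "' is not a known country or country code")
}

codes <- data.frame(country = c("Chile", "China"),
                    country_code = c("chl", "chn"),
                    stringsAsFactors = FALSE)
normalize_country("chl", codes)    # "chl"
normalize_country("Chile", codes)  # "chl"
```

Calling this on both origin and destination at the top of get_data would make either input style work without changing anything downstream.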
It looks like you allow the user to supply "all" as origin and destination, but this isn't reflected in the documentation. Beyond simply mentioning that this is an option, it may even be worth setting "all" as the default value for origin and destination so that the user can get up and running with the function without looking through country codes.
I'll note that supplying "all" for both origin and destination does fail at the moment, because during the join trade flows step you end up with top_importer.x, top_importer.y, etc., so operations expecting a single top_importer error out. This case will need a bit of special treatment.

```r
library(oec)
getdata("all", "all", 2015)
#> Valid input. Processing 2015 data...
#> Error in .f(.x[[i]], ...): object 'top_importer' not found
```
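For context, the .x/.y suffixes come from the join itself: joining two tables that share a non-key column duplicates that column with suffixes. Base R's merge() reproduces the same behaviour as dplyr's joins, so a self-contained illustration:

```r
# Two tables sharing a non-key column "top_importer"
a <- data.frame(id = 1, top_importer = "chn")
b <- data.frame(id = 1, top_importer = "usa")

names(merge(a, b, by = "id"))
# "id" "top_importer.x" "top_importer.y"

# Dropping (or renaming) the duplicated column before joining avoids it:
names(merge(a, b[, "id", drop = FALSE], by = "id"))
# "id" "top_importer"
```

So the "all"/"all" case likely needs the duplicated column dropped or renamed before the join trade flows step.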
The docs say that classification defaults to "sitc", but that doesn't seem to be the case at the moment. I'm all in favor of more defaults for the function, so users can get up and running more quickly!
There are some quick wins to increase test coverage that would make the package more stable in the long term if the API data availability changes. I'd add tests for all the different classification types; that way, if something happens to one of them down the road, your test will fail and Travis will alert you. Currently there is only a test for classification = "sitc".
I'd also suggest using webmockr in the tests as well, as @maelle mentioned.
The styling generally looks good, but I'd recommend a styler::style_pkg() (https://github.com/r-lib/styler) just to catch the remaining small things.
I would consider changing oec_r_package.Rproj to oec.Rproj for consistency (and I second @cimentadaj's point that it's good to do an entire sweep of the package to make sure there are no more instances of oec_r_package or oec-r-package left).
I believe only the .Rmd needs to go in the vignettes directory. I like that you show in a vignette how you went about tidying the data that the user finds in the package.
I would add one or two examples of get_data to the readme or the pkgdown site so that people don't have to find the vignette or go to the docs to get a preview of the package. (Makes it more likely for people to download the package instead of clicking away, imo.)
You may already be in the process of doing the full switchover from getdata to get_data, but just to make sure: anywhere getdata was previously used, e.g. in the vignettes, examples, etc., should be replaced with get_data, and of course the getdata function and its test file can be deleted.
You might consider documenting what each of the fields returned from the API means. If there isn't a good link to point people to in the @return, you could include a tibble in the package that the user can access with something like variable_definitions.
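As a purely illustrative shape for such a tibble: the variable names and wording below are invented to show the structure, not actual OEC field definitions.

```r
# Hypothetical data object the package could ship; the real variable
# names and definitions would come from the OEC API documentation.
variable_definitions <- data.frame(
  variable   = c("export_val", "import_val"),
  definition = c("Example wording: value of exports",
                 "Example wording: value of imports"),
  stringsAsFactors = FALSE
)
variable_definitions
```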
I think that's it from me! Looking forward to seeing this package get some good use!
Whoa! Many thanks. I'm still on my cellphone, but THIS: "it may even be worth setting "all" as the default value for origin and destination" is where I'm starting. Thanks a lot!
Awesome, looking forward to seeing the v2!
Many thanks for your reviews @aedobbyn and @cimentadaj, they seem very thorough and useful! 😺
I learnt about the global variables "trick" myself here 😉
+1 for @aedobbyn and @maelle comments! Good job @pachamaltese, looking forward to the next version! Yes, of course, feel free to include me as ctb.
By the way, you can add reviewers as ctb but also as rev, cf https://ropensci.github.io/dev_guide/building.html#authorship
thanks a lot @maelle, may I submit a new version next week?
Of course! As per our guide for authors "we ask that you respond to reviewers’ comments within 2 weeks of the last-submitted review, but you may make updates to your package or respond at any time.". 😺
thanks!!! this week I'm swamped, just reproducible reports at work, but next Saturday I'll pick this up again :) have a great week
Summary
oec provides an easy way to obtain data from the Observatory of Economic Complexity by accessing its API.
https://github.com/pachamaltese/oec-r-package
databases, because the package connects to an API and makes 3 or more API calls to simplify things for the final user who wants imports/exports and some metrics such as the % of increase/decrease.
Non-expert users who work with international trade data. It can also target intermediate/advanced users who can benefit from the speed and short syntax this package provides.
Not at the moment (in 2 years this has been the only one).
No.
Requirements
Confirm each of the following by checking the box. This package:
Publication options
- [ ] paper.md matching JOSS's requirements, with a high-level description, in the package root or in inst/.
Detail
- [x] Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:
- [x] Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
None.
No.
I don't really know.