ropensci / software-review

rOpenSci Software Peer Review.
290 stars 104 forks source link

oec package #217

Closed pachadotdev closed 5 years ago

pachadotdev commented 6 years ago

Summary

oec provides an easy way to obtain data from the Observatory of Economic Complexity by accessing its API.

Package: oec
Type: Package
Title: Observatory of Economic Complexity API Wrapper and Utility Program
Version: 2.8.2
Date: 2018-06-09
Authors@R: c(
  person("Mauricio", "Vargas S.", role = c("aut", "cre", "cph"), email = "oec@media.mit.edu"),
  person("Manuel", "Aristaran", role = c("ctb")),
  person("Pablo", "Paladino", role = c("ctb")),
  person("Gabriela", "Perez", role = c("ctb")),
  person(family = "UN Comtrade", role = c("dtc")),
  person(family = "MIT Media Lab", role = c("dtc")),
  person(family = "Datawheel", role = c("fnd"))
  )
URL: https://CRAN.R-project.org/package=oec
BugReports: https://github.com/pachamaltese/oec-r/issues
Description: Access The Observatory of Economic Complexity API from R to download international trade data.
License: MIT + file LICENSE
LazyData: TRUE
Depends:
  R (>= 3.2)
Imports:
  rlang,
  magrittr,
  dplyr,
  stringr,
  curl,
  purrr,
  jsonlite
RoxygenNote: 6.0.1
Suggests: 
    knitr,
    rmarkdown,
    testthat,
    covr
VignetteBuilder: knitr

https://github.com/pachamaltese/oec-r-package

databases, because the package connects to an API and does 3 or more API calls to simplify things for the final user who wants imports/exports and some metrics such as % of increase/decrease.

Non-expert users that use international trade data. This can also be targeted to intermediate/advanced users who can benefit from the speed and short syntax that this package provides.

Not at the moment (in 2 yrs this is the only one)

No.

Requirements

Confirm each of the following by checking the box. This package:

Publication options

Detail

None.

No.

I don't really know.

maelle commented 6 years ago

@pachamaltese Thanks for your submission! From a quick look you checked "contains a vignette with examples of its essential functions and uses." and "has a test suite." but I do not see neither a vignette nor a test suite in the GitHub repo? We can put this submission on hold while you add them, and we can provide links about e.g. unit testing if necessary.

maelle commented 6 years ago

Also, could you please expand here on the added value of the data processing offered by your package as opposed to using the original data sources?

pachadotdev commented 6 years ago

@maelle thanks, no worries, let me check my laptop hdd vs my external drive, I probably forgot to move something

About the added value if I run getdata_batch("chl", "chn", 2010, 2015) I will obtain a tibble with 6 years of data for chilean exports to china but with these additions:

On the other hand, with API calls I would need to read https://atlas.media.mit.edu/hs92/export/2010/chl/chn/show/ https://atlas.media.mit.edu/hs92/export/2011/chl/chn/show/ ... etc ... https://atlas.media.mit.edu/hs92/export/2015/chl/chn/show/

and also to read this https://atlas.media.mit.edu/hs92/export/2010/chl/all/show/ https://atlas.media.mit.edu/hs92/export/2011/chl/all/show/ ... etc ... https://atlas.media.mit.edu/hs92/export/2015/chl/all/show/

and also this https://atlas.media.mit.edu/hs92/export/2010/all/all/show/ https://atlas.media.mit.edu/hs92/export/2011/all/all/show/ ... etc ... https://atlas.media.mit.edu/hs92/export/2015/all/all/show/

After reading those 15 json files the user would need some additional operations to see the same result as getdata_batch("chl", "chn", 2010, 2015):

pachadotdev commented 6 years ago

Hi @maelle. I did put all the pieces together: external disks, google drive, etc. I have pushed a new version with lots of changes as well as a testing unit. Have a good weekend!

maelle commented 6 years ago

Hi @pachamaltese! Thanks.

I've been traveling so my initial checks will take a bit longer than expected.

I am not sure I understood how your package got scattered over hard drives &co but please ask us any git/GitHub question you might have. See also http://happygitwithr.com

pachadotdev commented 6 years ago

thanks a lot @maelle

happy travelling!!

I have +5 external hdd since my macbook fried itself and I lost tons of tidy useful data :( I'm still recovering from that accident ;)

maelle commented 6 years ago

Editor checks:


Editor comments

Thanks again for your submission @pachamaltese!

Here are a some points that should be solved/discussed before I start looking for reviewers. Note that now that you have everything on GitHub, you shouldn't loose pieces again, which is good news. Sorry to hear about your Macbook! 😢

This process is an ongoing discussion so ask any question/make any remark you might have about my comments!

Code

  origin_destination <- origin_destination %>%
    mutate(trade_exchange_val = 
!!sym("export_val") + !!sym("import_val"))

would become

  origin_destination <-  mutate(origin_destination,
    trade_exchange_val = !!sym("export_val") + !!sym("import_val"))

Data

Documentation

Misc

goodpractice output

 ✖ omit "Date" in DESCRIPTION. It is not required and it gets invalid quite
    often. A build date will be added to the package when you perform `R CMD build` on
    it.
  ✖ avoid long code lines, it is bad for readability. Also, many people
    prefer editor windows that are about 80 characters wide. Try make your lines
    shorter than 80 characters

    R\getdata.R:56:1
    R\getdata_batch.R:44:1
    R\getdata_batch.R:68:1
    tests\testthat\test-oec.R:3:1
    tests\testthat\test-oec.R:15:1
    ... and 10 more lines

  ✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and
    1:NCOL(...) expressions. They are error prone and result 1:0 if the expression on
    the right hand side is zero. Use seq_len() or seq_along() instead.

    R\getdata_batch.R:66:13

Reviewers: @aedobbyn @cimentadaj Due date: 2018-07-04

maelle commented 6 years ago

@pachamaltese :wave: any question or update? 😺

pachadotdev commented 6 years ago

@maelle not yet, I've been moving a lot. Can I send it tomorrow at night?

maelle commented 6 years ago

If you think you won't have enough time to work on the package in the next weeks, we can set this submission on hold until you have more time.

pachadotdev commented 6 years ago

tomorrow's night is ok, I have to complete the vignettes and that takes 1-2 hours :)

pachadotdev commented 6 years ago

@maelle it's ok now, I did all the steps from your feedback and just a few are explained why yes/no to it is better for this particular API.

maelle commented 6 years ago

Thanks, can you please answer with the explanations in this thread? 🙂

pachadotdev commented 6 years ago

Maelle' s initial feedback

Code

origin_destination <- origin_destination %>%
    mutate(trade_exchange_val = 
!!sym("export_val") + !!sym("import_val"))

would become

origin_destination <-  mutate(origin_destination,
    trade_exchange_val = !!sym("export_val") + !!sym("import_val"))

Data

Documentation

Misc

maelle commented 6 years ago

Thanks a lot @pachamaltese! 😺 I started looking and two small comments already:

maelle commented 6 years ago

The link to the logo is also broken at the moment.

maelle commented 6 years ago

Thanks again @pachamaltese!

A few more points before I start looking for reviewers

[![](https://badges.ropensci.org/217_status.svg)](https://github.com/ropensci/onboarding/issues/217)

It'll indicate the package is under review, until after onboarding when it'll then indicate it was peer-reviewed.

It is good practice to

  ✖ write unit tests for all functions, and all package code in
    general. 89% of code lines are covered by test cases.

    R/getdata.R:44:NA
    R/getdata.R:47:NA
    R/getdata.R:50:NA
    R/getdata.R:55:NA
    R/getdata.R:59:NA
    ... and 13 more lines

  ✖ omit "Date" in DESCRIPTION. It is not required and it gets
    invalid quite often. A build date will be added to the package when you
    perform `R CMD build` on it.
  ✖ avoid long code lines, it is bad for readability. Also, many
    people prefer editor windows that are about 80 characters wide. Try make
    your lines shorter than 80 characters

    R\getdata.R:47:1
    R\getdata.R:50:1
    R\getdata.R:59:1
    R\getdata.R:64:1
    R\getdata.R:69:1
    ... and 5 more lines

  ✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and
    1:NCOL(...) expressions. They are error prone and result 1:0 if the
    expression on the right hand side is zero. Use seq_len() or seq_along()
    instead.

    R\getdata.R:111:32
    R\getdata.R:174:26
    R\getdata.R:200:25

I think that you could easily increase code coverage, looking at https://codecov.io/gh/pachamaltese/oec/src/master/R/getdata.R you should

Thanks for all your work!

pachadotdev commented 6 years ago

Great, thanks ! I'll do that before 14.00 And sorry I didn't notice I broke links when renaming

On Sun, Jun 10, 2018, 8:26 AM Maëlle Salmon notifications@github.com wrote:

Thanks again @pachamaltese https://github.com/pachamaltese!

A few more points before I start looking for reviewers

-

The URL in the repo description should be updated.

The documentation website has to be updated (it still documents the function that was removed)

The logo doesn't appear any more in the README.

You can now remove my feedback from the README 😉 and have a more prominent link to the pkgdown website in it.

You can now add a peer-review badge to your README

It'll indicate the package is under review, until after onboarding when it'll then indicate it was peer-reviewed.

-

Please add a code of conduct to your repo, e.g. via usethis::use_code_of_conduct()

Rbuildignore data-raw e.g. via usethis::use_build_ignore("data-raw")

The codecov badge has a wrong URL.

goodpractice output

It is good practice to

✖ write unit tests for all functions, and all package code in general. 89% of code lines are covered by test cases.

R/getdata.R:44:NA
R/getdata.R:47:NA
R/getdata.R:50:NA
R/getdata.R:55:NA
R/getdata.R:59:NA
... and 13 more lines

✖ omit "Date" in DESCRIPTION. It is not required and it gets invalid quite often. A build date will be added to the package when you perform R CMD build on it. ✖ avoid long code lines, it is bad for readability. Also, many people prefer editor windows that are about 80 characters wide. Try make your lines shorter than 80 characters

R\getdata.R:47:1
R\getdata.R:50:1
R\getdata.R:59:1
R\getdata.R:64:1
R\getdata.R:69:1
... and 5 more lines

✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and 1:NCOL(...) expressions. They are error prone and result 1:0 if the expression on the right hand side is zero. Use seq_len() or seq_along() instead.

R\getdata.R:111:32
R\getdata.R:174:26
R\getdata.R:200:25

I think that you could easily increase code coverage, looking at https://codecov.io/gh/pachamaltese/oec/src/master/R/getdata.R you should

-

test that the function returns errors when it should

test more use cases (e.g. when destination == "all")

Thanks for all your work!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/onboarding/issues/217#issuecomment-396045176, or mute the thread https://github.com/notifications/unsubscribe-auth/AJn6Oa6i5RC86vR5agDIBsMCxfApsP9-ks5t7RB6gaJpZM4T-I-i .

maelle commented 6 years ago

You're welcome, no hurry and no problem!

maelle commented 6 years ago

Note that only one vignette appeared on the pkgdown website last time I looked

pachadotdev commented 6 years ago

@maelle ready, I used Jim Hester's lintr package for this 2nd round of feedback. Thanks a lot for the useful comments !!

maelle commented 6 years ago

Many thanks, I'll now start looking for reviewers!

maelle commented 6 years ago

Thanks @aedobbyn and @cimentadaj for accepting to review this package! 😸

Your reviews are due on 2018-07-04. As a reminder here are links to our bookdown in particular the packaging guidelines and reviewer's guide and to the review template.

aedobbyn commented 6 years ago

Thanks @maelle and good to meet you @pachamaltese! This package looks very useful (and I love the hex sticker!)

I want to make sure I've got the right version of the package. I installed the development version but running any of the getdata examples produces the same error in what looks to be jsonlite:::parse_con.

Let me know if this is fixed in another branch and I'll re-install from there. My session info is in the reprex below.

library(oec)

getdata("chl", "chn", 2015)
#> Valid input. Processing 2015 data...
#> Error in parse_con(txt, bigint_as_char): parse error: premature EOF
#>                                        
#>                      (right here) ------^

devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.4.4 (2018-03-15)
#>  system   x86_64, darwin15.6.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  tz       America/Chicago             
#>  date     2018-06-18
#> Packages -----------------------------------------------------------------
#>  package    * version    date      
#>  assertthat   0.2.0      2017-04-11
#>  backports    1.1.2      2017-12-13
#>  base       * 3.4.4      2018-03-15
#>  bindr        0.1.1      2018-03-13
#>  bindrcpp     0.2.2      2018-03-29
#>  compiler     3.4.4      2018-03-15
#>  curl         3.2        2018-03-28
#>  datasets   * 3.4.4      2018-03-15
#>  devtools     1.13.5     2018-02-18
#>  digest       0.6.15     2018-01-28
#>  dplyr        0.7.5      2018-05-19
#>  evaluate     0.10.1     2017-06-24
#>  glue         1.2.0.9000 2018-06-18
#>  graphics   * 3.4.4      2018-03-15
#>  grDevices  * 3.4.4      2018-03-15
#>  htmltools    0.3.6      2017-04-28
#>  jsonlite     1.5        2017-06-01
#>  knitr        1.20       2018-02-20
#>  magrittr     1.5        2014-11-22
#>  memoise      1.1.0      2017-04-21
#>  methods    * 3.4.4      2018-03-15
#>  oec        * 2.8.1      2018-06-18
#>  pillar       1.2.1      2018-02-27
#>  pkgconfig    2.0.1      2017-03-21
#>  purrr        0.2.5      2018-05-29
#>  R6           2.2.2      2017-06-17
#>  Rcpp         0.12.17    2018-05-18
#>  rlang        0.2.1      2018-05-30
#>  rmarkdown    1.9        2018-03-01
#>  rprojroot    1.3-2      2018-01-03
#>  stats      * 3.4.4      2018-03-15
#>  stringi      1.2.2      2018-05-02
#>  stringr      1.3.1      2018-05-10
#>  tibble       1.4.2      2018-01-22
#>  tidyselect   0.2.4      2018-02-26
#>  tools        3.4.4      2018-03-15
#>  utils      * 3.4.4      2018-03-15
#>  withr        2.1.2      2018-03-15
#>  yaml         2.1.18     2018-03-08
#>  source                                     
#>  CRAN (R 3.4.0)                             
#>  CRAN (R 3.4.3)                             
#>  local                                      
#>  CRAN (R 3.4.4)                             
#>  CRAN (R 3.4.4)                             
#>  local                                      
#>  CRAN (R 3.4.4)                             
#>  local                                      
#>  CRAN (R 3.4.3)                             
#>  CRAN (R 3.4.3)                             
#>  cran (@0.7.5)                              
#>  CRAN (R 3.4.1)                             
#>  Github (tidyverse/glue@a2c0f8b)            
#>  local                                      
#>  local                                      
#>  CRAN (R 3.4.0)                             
#>  CRAN (R 3.4.0)                             
#>  CRAN (R 3.4.3)                             
#>  CRAN (R 3.4.0)                             
#>  CRAN (R 3.4.0)                             
#>  local                                      
#>  Github (pachamaltese/oec-r-package@00a367b)
#>  CRAN (R 3.4.3)                             
#>  CRAN (R 3.4.0)                             
#>  cran (@0.2.5)                              
#>  CRAN (R 3.4.0)                             
#>  cran (@0.12.17)                            
#>  cran (@0.2.1)                              
#>  CRAN (R 3.4.3)                             
#>  CRAN (R 3.4.3)                             
#>  local                                      
#>  cran (@1.2.2)                              
#>  cran (@1.3.1)                              
#>  CRAN (R 3.4.3)                             
#>  CRAN (R 3.4.3)                             
#>  local                                      
#>  local                                      
#>  CRAN (R 3.4.4)                             
#>  CRAN (R 3.4.4)
pachadotdev commented 6 years ago

@aedobbyn Hi. Nice to meet u too! Thanks for letting me know. I installed R on another machine to try to reproduce this and ... it looks as the OEC has a problem

I should explore this, the problem can me an MIT server problem at the moment, but this is problem as the package just verifies internet connection before trying to get data and does not inspect additional things.

For instance, if I visit https://atlas.media.mit.edu/sitc/export/2010/chl/arg/show/ I get this screenshot from 2018-06-18 13-44-24

pachadotdev commented 6 years ago

@aedobbyn update: somebody within the OEC team screwed something and not all origins/destinations are available at the moment... the US data is working ok but no cigar trying to compute RCAs that need world flows :(

I shall update when this is fixed

screenshot from 2018-06-18 13-50-42

aedobbyn commented 6 years ago

Great, thanks for the quick update! Good opportunity to provide informative error messages to the user when the server is down or, in the case of the URL above, nothing is returned at all (I don't even get a 500 in the Chrome console). I'll look out for another update from you!

maelle commented 6 years ago

To test (in unit tests) how the package react to http errors you can use @sckott's webmockr package https://github.com/ropensci/webmockr

pachadotdev commented 6 years ago

@maelle: Thanks a again !! at this point you have contributed more to this package than the actual listed ctbs @aedobbyn: Thanks, now it's fixed. This was a server side problem that I should include as an error message. I tested both R and browser access to the API and it's working again :) screenshot from 2018-06-19 09-08-05 screenshot from 2018-06-19 09-07-13

aedobbyn commented 6 years ago

Hmm I still get the same parse error: premature EOF even though that endpoint in the browser now displays data. @cimentadaj are you able to receive data?

cimentadaj commented 6 years ago

Hi @maelle, @aedobbyn and @pachamaltese ! Glad to review this package. I'll submit my review this weekend, but in the mean time, the development version of the package (devtools::install_github("pachamaltese/oec-r-package")) seems to work for me:

library(oec)
getdata("chl", "chn", 2015)
#> Valid input. Processing 2015 data...
#> # A tibble: 670 x 31
#>     year origin_id destination_id origin_name destination_name id   
#>    <dbl> <chr>     <chr>          <chr>       <chr>            <chr>
#>  1  2015 chl       chn            Chile       China            5722 
#>  2  2015 chl       chn            Chile       China            5838 
#>  3  2015 chl       chn            Chile       China            6251 
#>  4  2015 chl       chn            Chile       China            6282 
#>  5  2015 chl       chn            Chile       China            6289 
#>  6  2015 chl       chn            Chile       China            6352 
#>  7  2015 chl       chn            Chile       China            6418 
#>  8  2015 chl       chn            Chile       China            6572 
#>  9  2015 chl       chn            Chile       China            6577 
#> 10  2015 chl       chn            Chile       China            6631 
#> # ... with 660 more rows, and 25 more variables: id_len <int>,
#> #   product_name <chr>, group_id <chr>, group_name <chr>,
#> #   export_val <dbl>, export_val_growth_pct <dbl>,
#> #   export_val_growth_pct_5 <dbl>, export_val_growth_val <dbl>,
#> #   export_val_growth_val_5 <dbl>, export_rca <dbl>, import_val <dbl>,
#> #   import_val_growth_pct <dbl>, import_val_growth_pct_5 <dbl>,
#> #   import_val_growth_val <dbl>, import_val_growth_val_5 <dbl>,
#> #   import_rca <dbl>, trade_exchange_val <dbl>, pci <dbl>, pci_rank <dbl>,
#> #   pci_rank_delta <dbl>, top_exporter_code <chr>,
#> #   top_importer_code <chr>, top_importer <chr>, top_exporter <chr>,
#> #   color <chr>

devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.4.4 (2018-03-15)
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  English_United States.1252  
#>  tz       Europe/Paris                
#>  date     2018-06-20
#> Packages -----------------------------------------------------------------
#>  package    * version    date      
#>  assertthat   0.2.0      2017-04-11
#>  backports    1.1.2      2017-12-13
#>  base       * 3.4.4      2018-03-15
#>  bindr        0.1.1      2018-03-13
#>  bindrcpp   * 0.2.2      2018-03-29
#>  cli          1.0.0      2017-11-05
#>  compiler     3.4.4      2018-03-15
#>  crayon       1.3.4      2017-09-16
#>  curl         3.2        2018-03-28
#>  datasets   * 3.4.4      2018-03-15
#>  devtools     1.13.5     2018-02-18
#>  digest       0.6.15     2018-01-28
#>  dplyr        0.7.5      2018-05-19
#>  evaluate     0.10.1     2017-06-24
#>  formatR      1.5        2017-04-25
#>  glue         1.2.0.9000 2018-06-19
#>  graphics   * 3.4.4      2018-03-15
#>  grDevices  * 3.4.4      2018-03-15
#>  htmltools    0.3.6      2017-04-28
#>  jsonlite     1.5        2017-06-01
#>  knitr        1.20       2018-02-20
#>  magrittr     1.5        2014-11-22
#>  memoise      1.1.0      2018-03-20
#>  methods    * 3.4.4      2018-03-15
#>  oec        * 2.8.1      2018-06-20
#>  pillar       1.2.2      2018-04-26
#>  pkgconfig    2.0.1      2017-03-21
#>  purrr        0.2.5      2018-05-29
#>  R6           2.2.2      2017-06-17
#>  Rcpp         0.12.17    2018-05-18
#>  rlang        0.2.0.9001 2018-06-20
#>  rmarkdown    1.9        2018-03-01
#>  rprojroot    1.3-2      2018-01-03
#>  stats      * 3.4.4      2018-03-15
#>  stringi      1.1.7      2018-03-12
#>  stringr      1.3.1      2018-05-10
#>  tibble       1.4.2      2018-01-22
#>  tidyselect   0.2.4      2018-02-26
#>  tools        3.4.4      2018-03-15
#>  utf8         1.1.3      2018-01-03
#>  utils      * 3.4.4      2018-03-15
#>  withr        2.1.2      2018-03-18
#>  yaml         2.1.18     2018-03-08
#>  source                                     
#>  CRAN (R 3.4.4)                             
#>  CRAN (R 3.4.3)                             
#>  local                                      
#>  CRAN (R 3.4.4)                             
#>  CRAN (R 3.4.4)                             
#>  CRAN (R 3.4.4)                             
#>  local                                      
#>  CRAN (R 3.4.4)                             
#>  CRAN (R 3.4.4)                             
#>  local                                      
#>  CRAN (R 3.4.3)                             
#>  CRAN (R 3.4.3)                             
#>  CRAN (R 3.4.4)                             
#>  CRAN (R 3.4.4)                             
#>  CRAN (R 3.4.4)                             
#>  Github (tidyverse/glue@a2c0f8b)            
#>  local                                      
#>  local                                      
#>  CRAN (R 3.4.4)                             
#>  CRAN (R 3.4.4)                             
#>  CRAN (R 3.4.4)                             
#>  CRAN (R 3.4.4)                             
#>  Github (hadley/memoise@06d16ec)            
#>  local                                      
#>  Github (pachamaltese/oec-r-package@00a367b)
#>  CRAN (R 3.4.4)                             
#>  CRAN (R 3.4.4)                             
#>  CRAN (R 3.4.4)                             
#>  CRAN (R 3.4.4)                             
#>  CRAN (R 3.4.4)                             
#>  Github (r-lib/rlang@ba4fb06)               
#>  CRAN (R 3.4.4)                             
#>  CRAN (R 3.4.4)                             
#>  local                                      
#>  CRAN (R 3.4.4)                             
#>  CRAN (R 3.4.4)                             
#>  CRAN (R 3.4.4)                             
#>  CRAN (R 3.4.4)                             
#>  local                                      
#>  CRAN (R 3.4.4)                             
#>  local                                      
#>  Github (r-lib/withr@79d7b0d)               
#>  CRAN (R 3.4.4)
pachadotdev commented 6 years ago

Hi @maelle, @aedobbyn and @cimentadaj I added some error messages to inform the user when the wifi drops or MIT server is down/with problems. Last night I had the same problem (server side) and the browser returned an empty json, then 30 mins later I could obtain the data. I'll tell Media Lab people about this, but in the meanwhile an error message "patches" server side issues :) And yes, devtools::install_github("pachamaltese/oec") install the current version of the package that you are reviewing.

aedobbyn commented 6 years ago

Great, working for me too now!

Just so I know the intent @pachamaltese, you quote variables from the OEC and use sym and !! to unquote them to avoid the NOTE no visible binding for global variable {variable} correct?

A version with the tidyeval stripped out passes all tests without warnings so I imagine the NOTE is the only reason.

pachadotdev commented 6 years ago

@aedobbyn exactly!! nobody ever before has discovered the main reason why I do that :D

aedobbyn commented 6 years ago

Totally understand, those NOTEs are annoying :relaxed:

As you probably know, those NOTEs shouldn't stop you from passing a CRAN check, especially if you document them in an . Rbuildignored cran-comments.md. Might be cleaner to do that and avoid tidyeval altogether, but I would understand if you want to leave the tidyeval in.

aedobbyn commented 6 years ago

@maelle makes the excellent point that you can let document these global variables in a globalVariables.R file like this to avoid the no visible binding for global variable NOTEs. (Also see, e.g., an example of it in use in her opencage package.)

If you devtools::check() the oec code in that amanda branch on my fork you won't get any of those NOTEs and you won't need to use the sym + !! syntax -- let me know if you want me to shoot you a PR with the code from that fork :)

pachadotdev commented 6 years ago

@aedobbyn thanks! I accept that if and only if you agree to be added as ctb because that is a contribution, a good contribution to be precise :)

aedobbyn commented 6 years ago

Haha that's very generous! Its' @maelle who deserves the credit, as usual 😆

cimentadaj commented 6 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 1.3

Review Comments

The oec package is well written and definitely fulfills the objective of downloading trade date easily into R. I have comments which serve to improve the code, the user experience and the readibility of the code. Some are more immediate then others so feel free to ignore the small ones or new features if time is of essence. However, I would consider them for future releases. They are not in any particular order of importance.

get_resp <- function(url, attempts_left = 5) {

  stopifnot(attempts_left > 0)

  resp <- httr::GET(url)

  # On a successful GET, return the response
  if (httr::status_code(resp) == 200) {
    resp
  } else if (attempts_left == 1) { # When attempts run out, stop with an error
    stop("Cannot connect to the MIT API")
  } else { # Otherwise, sleep a second and try again
    Sys.sleep(1)
    get_resp(url, attempts_left - 1)
  }
}

Feel free to adapt it to using fromJSON and remove the httr code.

get_data <- function(all the arguments) {
 all code here....
}

getdata <- function(same arguments) {
get_data(pass the arguments)
}

To avoid repetition.

library(wbstats)

unemploy_vars <- wbsearch(pattern = "unemployment")
head(unemploy_vars)

Check https://cran.r-project.org/web/packages/wbstats/README.html for more examples.

You could have a generic get_product that searches in all product tibbles and returns an aggregated tibble with all matches. For example if we search for Antiques then we get the same result from oec::hs02 but also a new column with indicating that it comes from hs02. This way when you have similar matches then the user can disambiguate where it comes from.

Of course this is a lot of work, but here's a very quick and crude approach:

get_products <- function(name) {
  # Grab all the names of all hs datasets
  all_datastr <-
    stringr::str_subset(
      data(package = "oec")$results[, "Item"],
      "^hs"
    )

  # get the datasets, create the type_product column, bind them all together
  # and do the search
  all_datastr %>% 
    purrr::map(get) %>% 
    purrr::map2_dfr(all_datastr, ~ {.x$type_product <- .y; .x}) %>% 
    dplyr::filter(stringr::str_detect(product_name, name))
}

get_products("Gold")
get_products("Animals")

Misc:

In the oec website and Github repo the installation code chunk still has the old repo (devtools::install_github("pachamaltese/oec-r-package"). In fact, try doing an overall repo search and update all URL's. The DESCRIPTION file needs to be updated in the BugReports field as well.

I like that you're very defensive in what you accept in the function but try to help the user in pointing out where something went wront. For example, here I will paste what the user wrote and when appropriate (not here because there's too many countries), list the available options the user could've chosen.

Great package!

pachadotdev commented 6 years ago

Whoa. Thanks a lot. This helps to improve the package. "Get_data" was provided by the other reviewers as an alternative without tidy evaluation. I'm from cellphone now so I'm eager to implement changes later!

El sáb., jun. 23, 2018 5:59 PM, Jorge Cimentada notifications@github.com escribió:

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • [x ] A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • [x ] Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements http://joss.theoj.org/about#paper_structure with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 1.5 Review Comments

The oec package is well written and definitely fulfills the objective of downloading trade date easily into R. I have comments which serve to improve the code, the user experience and the readibility of the code. Some are more immediate then others so feel free to ignore the small ones or new features if time is of essence. However, I would consider them for future releases. They are not in any particular order of importance.

-

Great opportunity to use a switch statement here https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L112-L130 .

I would use match.args here https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L62-L64

I'm not sure whether curl retries calling the API if it doesn't respond, but if it doesn't, then you could retry 3-5 times like here https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L142-L144. This would be very handy because you call the API several times in other links. Considering that the MIT server has been a bit unstable in the last few days then this could come in handy. I think there's a few packages to do this properly but if you want to avoid adding more dependencies, a function hack is this one using httr:

get_resp <- function(url, attempts_left = 5) {

stopifnot(attempts_left > 0)

resp <- httr::GET(url)

On a successful GET, return the response

if (httr::status_code(resp) == 200) { resp } else if (attempts_left == 1) { # When attempts run out, stop with an error stop("Cannot connect to the MIT API") } else { # Otherwise, sleep a second and try again Sys.sleep(1) get_resp(url, attempts_left - 1) } }

Feel free to adapt it to using fromJSON and remove the httr code.

-

Maybe I'm a bit ignorant but in this specific case https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L146-L149, since you already checked that the user has internet at the beginning of the function, shouldn't this be only an MIT problem? Perhaps this error check can be wrapped into a function because it's used 2-3 times.

I see that you create several functions called read_fromapi** to create the URL of the API. It would be cleaner and safer to create a single function that is modular to all API urls and you provide different arguments depending on the specific url.

Any particular reason why you have get_data and getdata? Seems a bit redundant and not a clear grammatical reason to have two (for example, color and colour in ggplot2). In any case, because they share exactly the same documentation I would suggest documenting them together into a single page using #' @rdname get_data. See http://r-pkgs.had.co.nz/man.html. In fact, because they're the same exact function, if you want to keep both, then I would do something like this:

get_data <- function(all the arguments) { all code here.... } getdata <- function(same arguments) { get_data(pass the arguments) }

To avoid repetition.

-

This is clearly more work, but I think it might be worth it for new users. Instead of relying on your user to search for a country in the 264 country names to look for a code, you could create a function that regexps the name of the country and returns the country code. For example, get_countrycode("Chile") or get_countrycode("chile") should yield the same. This can be piped into get_data more intuitively like get_data(get_countrycode("Chile"), get_countrycode("China"), 2010, 2015). Alternatively, the user can search the country tibble to search manually. But note that non-frequent R users will probably search manually and not with regexps.

On the same line, you could do something similar for all product tibbles. I would take a similar approach to the way the World Bank R package does it, wbstats. For example

library(wbstats) unemploy_vars <- wbsearch(pattern = "unemployment") head(unemploy_vars)

Check https://cran.r-project.org/web/packages/wbstats/README.html for more examples.

You could have a generic get_product that searches in all product tibbles and returns an aggregated tibble with all matches. For example if we search for Antiques then we get the same result from oec::hs02 but also a new column with indicating that it comes from hs02. This way when you have similar matches then the user can disambiguate where it comes from.

Of course this is a lot of work, but here's a very quick and crude approach:

get_products <- function(name) {

Grab all the names of all hs datasets

all_datastr <- stringr::str_subset( data(package = "oec")$results[, "Item"], "^hs" )

get the datasets, create the type_product column, bind them all together

and do the search

all_datastr %>% purrr::map(get) %>% purrr::map2_dfr(all_datastr, ~ {.x$type_product <- .y; .x}) %>% dplyr::filter(stringr::str_detect(product_name, name)) }

get_products("Gold") get_products("Animals")

Misc:

In the oec website and Github repo the installation code chunk still has the old repo (devtools::install_github("pachamaltese/oec-r-package"). In fact, try doing an overall repo search and update all URL's. The DESCRIPTION file needs to be updated in the BugReports field as well.

I like that you're very defensive in what you accept in the function but try to help the user in pointing out where something went wront. For example, here https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L47-L51 I will paste what the user wrote and when appropriate (not here because there's too many countries), list the available options the user could've chosen.

Great package!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/onboarding/issues/217#issuecomment-399710318, or mute the thread https://github.com/notifications/unsubscribe-auth/AJn6OedP8zut_pby9oJ6ToxzUKfxXoTEks5t_qwigaJpZM4T-I-i .

pachadotdev commented 6 years ago

@cimentadaj thanks a lot!! I have limited laptop access at the moment but I saw this and there are really good suggestions. Can I add you as a contributor? I never thought about matching arguments.

aedobbyn commented 6 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Final approval (post-review)

Estimated hours spent reviewing:

2

Review Comments

First off, I'm really excited for this package and the data it's going to make available to people in tidy format.

I think the modifications you've put in place so far and the ones suggested by @cimentadaj are all very useful. (Many of those I was going to suggest myself, but I won't repeat Jorge!) +1 especially to trying the server multiple times (you might even implement exponential backoff) and making things a bit more concise with match.args and switch.

A few other notes:

I'll note that supplying "all" for both origin and destination does fail at the moment because during the join trade flows step you end up with top_importer.x, top_importer.y, etc. so operations expecting a single top_importer error out. So this case will need a bit of special treatment.

library(oec)
getdata("all", "all", 2015)
#> Valid input. Processing 2015 data...
#> Error in .f(.x[[i]], ...): object 'top_importer' not found

I think that's it from me! Looking forward to seeing this package get some good use!

pachadotdev commented 6 years ago

Whoa! Many thanks. I'm still from cellphone but THIS "it may even be worth setting "all" as the default value for origin and destination" is where I'm starting. Thanks a lot!

On Sun, Jun 24, 2018, 8:33 PM Amanda Dobbyn notifications@github.com wrote:

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements http://joss.theoj.org/about#paper_structure with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing:

2 Review Comments

First off, I'm really excited for this package and the data it's going to make available to people in tidy format.

I think the modifications you've put in place so far and the ones suggested by @cimentadaj https://github.com/cimentadaj are all very useful. (Many of those I was going to suggest myself, but I won't repeat Jorge!) +1 especially to trying the server multiple times (you might even implement exponential backoff) and making things a bit more concise with match.args and switch.

A few other notes:

-

An alternative to @cimentadaj https://github.com/cimentadaj's suggestion to allow users to get_countrycode using a regex match would be to allow users to supply either the country_code or the country in get_data; so get_data("chl", "chn", 2015, "sitc") would return the same data as get_data("Chile", "China", 2015, "sitc"). While this approach gives the user less leeway in the input they can supply than a regex would, I think most of the countries in country_codes$country are represented the way most people would expect, which saves some work on your end.

It looks like you allow the user to supply "all" as origin and destination https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L179, but this isn't reflected in the documentation. Beyond simply mentioning that this is an option, it may even be worth setting "all" as the default value for origin and destination so that the user can get up and running with the function without looking through country codes.

I'll note that supplying "all" for both origin and destination does fail at the moment because during the join trade flows step https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L278 you end up with top_importer.x, top_importer.y, etc. so operations expecting a single top_importer error out. So this case will need a bit of special treatment.

library(oec) getdata("all", "all", 2015)

> Valid input. Processing 2015 data...

> Error in .f(.x[[i]], ...): object 'top_importer' not found

-

The docs say https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L10 that classification defaults to "sitc", but that doesn't seem to be the case https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L41 at the moment. I'm all in favor of more defaults for the function, so users can get up and running more quickly!

There are some quick wins to increase test coverage https://codecov.io/gh/pachamaltese/oec/src/6da195837d28fb9638f89e466fd9a33816dca6f8/R/get_data.R that would make the package more stable in the long term if the API data availability changes. I'd add tests for all the different classification types; that way if something happens to one of them down the road, your test will fail and Travis will alert you. Currently there is only a test for classification = "sitc".

I'd also suggest using webmockr to in the tests as well, as @maelle mentioned https://github.com/ropensci/onboarding/issues/217#issuecomment-398150814

The styling generally looks good but I'd recommend a styler::style_pkg() just to catch the remaining small things.

I would consider changing oec_r_package.Rproj to oec.Rproj for consistency (and I second @cimentadaj https://github.com/cimentadaj's point that it's good to do an entire sweep of the package to make sure that there are no more instances of oec_r_package or oec-r-package left).

I believe only the .Rmd needs to go in the vignettes directory. I like that you show in a vignette how you went about tidying the data that the user finds in the package.

I would add one or two examples of get_data to the readme or the pkgdown site so that people don't have to find the vignette or go to the docs to get a preview of the package. (Makes it more likely for people to download the package instead of clicking away, imo).

You may already be in the process of doing the full switchover from getdata to get_data but just to make sure -- anywhere where getdata was previously used e.g. in the vignettes, examples etc. should be replaced with get_data and of course the getdata function and its test file can be deleted.

You might consider documenting what each of these fields returned from the API mean. If there isn't a good link to point people to in the @return, you could include a tibble in the package that the user can access with something like variable_definitions.

I think that's it from me! Looking forward to seeing this package get some good use!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/onboarding/issues/217#issuecomment-399801749, or mute the thread https://github.com/notifications/unsubscribe-auth/AJn6OQ64kkf5lWbay5-TR1i1FX2nIdD0ks5uAC_fgaJpZM4T-I-i .

aedobbyn commented 6 years ago

Awesome, looking forward to seeing the v2!

maelle commented 6 years ago

Many thanks for your reviews @aedobbyn and @cimentadaj, they seem very thorough and useful! 😺

I learnt about the global variables "trick" myself here 😉

cimentadaj commented 6 years ago

+1 for @aedobbyn and @maelle comments! Good job @pachamaltese, looking forward to the next version! Yes, of course, feel free to include me as ctb.

maelle commented 6 years ago

By the way you can add reviewers as ctb but also as rev cf https://ropensci.github.io/dev_guide/building.html#authorship

pachadotdev commented 6 years ago

thanks a lot @maelle, may I submit a new version next week?

maelle commented 6 years ago

Of course! As per our guide for authors "we ask that you respond to reviewers’ comments within 2 weeks of the last-submitted review, but you may make updates to your package or respond at any time.". 😺

pachadotdev commented 6 years ago

thanks!!! this week I'm useless, just reproducible reports at work, but next sat I'll retake this :) have a great week