Closed pachadotdev closed 5 years ago
@pachamaltese Thanks for your submission! From a quick look you checked "contains a vignette with examples of its essential functions and uses." and "has a test suite.", but I see neither a vignette nor a test suite in the GitHub repo? We can put this submission on hold while you add them, and we can provide links about e.g. unit testing if necessary.
Also, could you please expand here on the added value of the data processing offered by your package as opposed to using the original data sources?
@maelle thanks, no worries, let me check my laptop hdd vs my external drive, I probably forgot to move something
About the added value: if I run getdata_batch("chl", "chn", 2010, 2015)
I will obtain a tibble with 6 years of data for Chilean exports to China, but with these additions:
On the other hand, with raw API calls I would need to read
https://atlas.media.mit.edu/hs92/export/2010/chl/chn/show/
https://atlas.media.mit.edu/hs92/export/2011/chl/chn/show/
... etc ...
https://atlas.media.mit.edu/hs92/export/2015/chl/chn/show/
and also to read
https://atlas.media.mit.edu/hs92/export/2010/chl/all/show/
https://atlas.media.mit.edu/hs92/export/2011/chl/all/show/
... etc ...
https://atlas.media.mit.edu/hs92/export/2015/chl/all/show/
and also
https://atlas.media.mit.edu/hs92/export/2010/all/all/show/
https://atlas.media.mit.edu/hs92/export/2011/all/all/show/
... etc ...
https://atlas.media.mit.edu/hs92/export/2015/all/all/show/
After reading those 18 JSON files (3 endpoint families × 6 years), the user would still need some additional operations to obtain the same result as getdata_batch("chl", "chn", 2010, 2015):

- left_join to match API product codes (e.g. 2709) to product names (e.g. petroleum); a major drawback here is that the code-name table can't be accessed through the API
- left_join to match country codes (e.g. sachl) to country names (e.g. Chile), with the big problem that the API returns non-ISO codes (e.g. sachl vs chl), so again the user would need a code-name table and would have to crop the strings
- purrr to bind everything into a unique tibble

Hi @maelle. I did put all the pieces together: external disks, Google Drive, etc. I have pushed a new version with lots of changes as well as a test suite. Have a good weekend!
Hi @pachamaltese! Thanks.
I've been traveling so my initial checks will take a bit longer than expected.
I am not sure I understood how your package got scattered over hard drives & co, but please ask us any git/GitHub questions you might have. See also http://happygitwithr.com ☺
thanks a lot @maelle
happy travelling!!
I have 5+ external HDDs since my MacBook fried itself and I lost tons of tidy, useful data :( I'm still recovering from that accident ;)
Thanks again for your submission @pachamaltese!
Here are some points that should be solved/discussed before I start looking for reviewers. Note that now that you have everything on GitHub, you shouldn't lose pieces again, which is good news. Sorry to hear about your MacBook! 😢
This process is an ongoing discussion so ask any question/make any remark you might have about my comments!
You use assign in functions instead of returning output. Please return output; you don't need to use assign here.
Why do you use rm inside a function?
There is a separate file containing imports (confusingly called exports :wink:). It'd be easier to see where functions are imported from if you used the jsonlite::fromJSON syntax. You could then remove the separate R file with imports. Cf http://r-pkgs.had.co.nz/namespace.html#imports
Your package depends on purrr for a single function. Consider replacing it with do.call and rbind.
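For reference, here is a minimal base-R sketch of that replacement (an editor's illustration; the data frames below are toy stand-ins for the per-year API results, not the package's actual data):

```r
# Toy stand-ins for the per-year tibbles the package downloads
per_year <- list(
  data.frame(year = 2010, export_val = 100),
  data.frame(year = 2011, export_val = 120)
)

# purrr::map_df(per_year, identity) row-binds the list; base R does
# the same without the extra dependency:
combined <- do.call(rbind, per_year)
combined
```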
Regarding all the stopifnot calls: have informative error messages when the user enters a wrong parameter value.
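As a sketch of what "informative" could look like (the parameter name and the valid values below are hypothetical, not the package's actual checks):

```r
# stopifnot(origin %in% valid_origins) would just print the raw condition;
# instead, tell the user what was wrong and where to look for valid values.
check_origin <- function(origin, valid_origins = c("chl", "chn", "usa")) {
  if (!origin %in% valid_origins) {
    stop("'", origin, "' is not a valid origin; see the country codes table",
         call. = FALSE)
  }
  invisible(origin)
}

msg <- tryCatch(check_origin("xyz"), error = conditionMessage)
```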
You could have a single exported function instead of two, if I follow correctly? It'd take arguments like the batch one, which could be vectors of length 1 (the user would input, say, one or several country pairs). The other function would be a utility function that wouldn't need to be exported. It might simplify the interface?
If there is only one function call in a pipeline it might be clearer to remove the pipe.
origin_destination <- origin_destination %>%
mutate(trade_exchange_val =
!!sym("export_val") + !!sym("import_val"))
would become
origin_destination <- mutate(origin_destination,
trade_exchange_val = !!sym("export_val") + !!sym("import_val"))
For the country codes, consider the countrycode package. Where do the data files in data/ come from? Why are they pre-prepared, and are you sure they won't be updated?
You could keep their data processing in data-raw.
Please add the URL to the data source.
Are you allowed to use the logo in your README?
A flowchart of what processing your package does might be useful to explain the added value of using the package and to explain what has been preprocessed or not (what's being downloaded vs what's in data/).
In the README, when mentioning the vignette, link to the pkgdown website.
Please add @return fields to the documentation of the functions.
A vignette with an actual use case, and visualizations, would be nice (or an extension of the existing vignette).
Please explain what the data is e.g. SITC or add an external link.
Because some chunks of the vignette are unevaluated without a pre-generated output, I can't understand what the "problem" is. You could pre-generate the output.
Could you rename the repo "oec" like the package?
The "Maintainer" field in DESCRIPTION is useless because it'll be generated automatically from Authors@R.
What are the files .Rapp.history in data/ and .build.timestamp in vignettes/?
Please add a code coverage integration and badge (via e.g. usethis::use_coverage()).
✖ omit "Date" in DESCRIPTION. It is not required and it gets invalid quite
often. A build date will be added to the package when you perform `R CMD build` on
it.
✖ avoid long code lines, it is bad for readability. Also, many people
prefer editor windows that are about 80 characters wide. Try make your lines
shorter than 80 characters
R\getdata.R:56:1
R\getdata_batch.R:44:1
R\getdata_batch.R:68:1
tests\testthat\test-oec.R:3:1
tests\testthat\test-oec.R:15:1
... and 10 more lines
✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and
1:NCOL(...) expressions. They are error prone and result 1:0 if the expression on
the right hand side is zero. Use seq_len() or seq_along() instead.
R\getdata_batch.R:66:13
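The pitfall goodpractice is warning about is easy to reproduce in base R:

```r
# When the vector is empty (e.g. no years matched), 1:length(x) counts
# backwards instead of producing an empty sequence.
x <- character(0)
1:length(x)    # c(1L, 0L) -- a for loop would iterate twice over nothing!
seq_along(x)   # integer(0) -- the loop correctly runs zero times

loop_counts <- c(bad = length(1:length(x)), good = length(seq_along(x)))
```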
Reviewers: @aedobbyn @cimentadaj Due date: 2018-07-04
@pachamaltese :wave: any question or update? 😺
@maelle not yet, I've been moving a lot. Can I send it tomorrow at night?
If you think you won't have enough time to work on the package in the next weeks, we can set this submission on hold until you have more time.
tomorrow night is ok, I have to complete the vignettes and that takes 1-2 hours :)
@maelle it's ok now, I did all the steps from your feedback, and for just a few of them I explained why the suggestion does or doesn't fit this particular API.
Thanks, can you please answer with the explanations in this thread? 🙂
Thanks a lot @pachamaltese! 😺 I started looking and two small comments already:
The URL to the pkgdown site should be updated now that the repo has a new name.
You don't export the batch function any more, but it's still present in the website docs, cf http://pacha.hk/oec/reference/index.html
The link to the logo is also broken at the moment.
Thanks again @pachamaltese!
A few more points before I start looking for reviewers
The URL in the repo description should be updated.
The documentation website has to be updated (it still documents the function that was removed).
The logo doesn't appear any more in the README.
You can now remove my feedback from the README :wink: and have a more prominent link to the pkgdown website in it.
You can now add a peer-review badge to your README
[![](https://badges.ropensci.org/217_status.svg)](https://github.com/ropensci/onboarding/issues/217)
It'll indicate the package is under review, until after onboarding when it'll then indicate it was peer-reviewed.
Please add a code of conduct to your repo, e.g. via usethis::use_code_of_conduct()
Rbuildignore data-raw e.g. via usethis::use_build_ignore("data-raw")
The codecov badge has a wrong URL.
goodpractice output
It is good practice to
✖ write unit tests for all functions, and all package code in
general. 89% of code lines are covered by test cases.
R/getdata.R:44:NA
R/getdata.R:47:NA
R/getdata.R:50:NA
R/getdata.R:55:NA
R/getdata.R:59:NA
... and 13 more lines
✖ omit "Date" in DESCRIPTION. It is not required and it gets
invalid quite often. A build date will be added to the package when you
perform `R CMD build` on it.
✖ avoid long code lines, it is bad for readability. Also, many
people prefer editor windows that are about 80 characters wide. Try make
your lines shorter than 80 characters
R\getdata.R:47:1
R\getdata.R:50:1
R\getdata.R:59:1
R\getdata.R:64:1
R\getdata.R:69:1
... and 5 more lines
✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and
1:NCOL(...) expressions. They are error prone and result 1:0 if the
expression on the right hand side is zero. Use seq_len() or seq_along()
instead.
R\getdata.R:111:32
R\getdata.R:174:26
R\getdata.R:200:25
I think that you could easily increase code coverage; looking at https://codecov.io/gh/pachamaltese/oec/src/master/R/getdata.R you should
- test that the function returns errors when it should
- test more use cases (e.g. when destination == "all")
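A sketch of what such a test could look like (the testthat expectation is shown in comments since the real function's messages are assumptions; the runnable base-R part illustrates the same idea):

```r
# With testthat (hypothetical expectation; adjust to the real error message):
# test_that("getdata() validates its inputs", {
#   expect_error(getdata("not-a-country", "chn", 2015))
# })

# The underlying idea in plain base R: assert that the error path fires.
check_year <- function(year) {
  if (!is.numeric(year)) stop("year must be numeric, got ", class(year))
  invisible(year)
}

msg <- tryCatch(check_year("2015"), error = conditionMessage)
```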
Thanks for all your work!
Great, thanks! I'll do that before 14.00. And sorry, I didn't notice I broke links when renaming.
You're welcome, no hurry and no problem!
Note that only one vignette appeared on the pkgdown website last time I looked
@maelle ready, I used Jim Hester's lintr package for this 2nd round of feedback. Thanks a lot for the useful comments!!
Many thanks, I'll now start looking for reviewers!
Thanks @aedobbyn and @cimentadaj for accepting to review this package! 😸
Your reviews are due on 2018-07-04. As a reminder here are links to our bookdown in particular the packaging guidelines and reviewer's guide and to the review template.
Thanks @maelle and good to meet you @pachamaltese! This package looks very useful (and I love the hex sticker!)
I want to make sure I've got the right version of the package. I installed the development version, but running any of the getdata examples produces the same error in what looks to be jsonlite:::parse_con.
Let me know if this is fixed in another branch and I'll re-install from there. My session info is in the reprex below.
library(oec)
getdata("chl", "chn", 2015)
#> Valid input. Processing 2015 data...
#> Error in parse_con(txt, bigint_as_char): parse error: premature EOF
#>
#> (right here) ------^
devtools::session_info()
#> Session info -------------------------------------------------------------
#> setting value
#> version R version 3.4.4 (2018-03-15)
#> system x86_64, darwin15.6.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> tz America/Chicago
#> date 2018-06-18
#> Packages -----------------------------------------------------------------
#> package * version date
#> assertthat 0.2.0 2017-04-11
#> backports 1.1.2 2017-12-13
#> base * 3.4.4 2018-03-15
#> bindr 0.1.1 2018-03-13
#> bindrcpp 0.2.2 2018-03-29
#> compiler 3.4.4 2018-03-15
#> curl 3.2 2018-03-28
#> datasets * 3.4.4 2018-03-15
#> devtools 1.13.5 2018-02-18
#> digest 0.6.15 2018-01-28
#> dplyr 0.7.5 2018-05-19
#> evaluate 0.10.1 2017-06-24
#> glue 1.2.0.9000 2018-06-18
#> graphics * 3.4.4 2018-03-15
#> grDevices * 3.4.4 2018-03-15
#> htmltools 0.3.6 2017-04-28
#> jsonlite 1.5 2017-06-01
#> knitr 1.20 2018-02-20
#> magrittr 1.5 2014-11-22
#> memoise 1.1.0 2017-04-21
#> methods * 3.4.4 2018-03-15
#> oec * 2.8.1 2018-06-18
#> pillar 1.2.1 2018-02-27
#> pkgconfig 2.0.1 2017-03-21
#> purrr 0.2.5 2018-05-29
#> R6 2.2.2 2017-06-17
#> Rcpp 0.12.17 2018-05-18
#> rlang 0.2.1 2018-05-30
#> rmarkdown 1.9 2018-03-01
#> rprojroot 1.3-2 2018-01-03
#> stats * 3.4.4 2018-03-15
#> stringi 1.2.2 2018-05-02
#> stringr 1.3.1 2018-05-10
#> tibble 1.4.2 2018-01-22
#> tidyselect 0.2.4 2018-02-26
#> tools 3.4.4 2018-03-15
#> utils * 3.4.4 2018-03-15
#> withr 2.1.2 2018-03-15
#> yaml 2.1.18 2018-03-08
#> source
#> CRAN (R 3.4.0)
#> CRAN (R 3.4.3)
#> local
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> local
#> CRAN (R 3.4.4)
#> local
#> CRAN (R 3.4.3)
#> CRAN (R 3.4.3)
#> cran (@0.7.5)
#> CRAN (R 3.4.1)
#> Github (tidyverse/glue@a2c0f8b)
#> local
#> local
#> CRAN (R 3.4.0)
#> CRAN (R 3.4.0)
#> CRAN (R 3.4.3)
#> CRAN (R 3.4.0)
#> CRAN (R 3.4.0)
#> local
#> Github (pachamaltese/oec-r-package@00a367b)
#> CRAN (R 3.4.3)
#> CRAN (R 3.4.0)
#> cran (@0.2.5)
#> CRAN (R 3.4.0)
#> cran (@0.12.17)
#> cran (@0.2.1)
#> CRAN (R 3.4.3)
#> CRAN (R 3.4.3)
#> local
#> cran (@1.2.2)
#> cran (@1.3.1)
#> CRAN (R 3.4.3)
#> CRAN (R 3.4.3)
#> local
#> local
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
@aedobbyn Hi. Nice to meet you too! Thanks for letting me know. I installed R on another machine to try to reproduce this and... it looks like the OEC has a problem.
I should explore this; it could be an MIT server problem at the moment, but it is still a problem because the package only verifies the internet connection before trying to get data and does not inspect anything else.
For instance, if I visit https://atlas.media.mit.edu/sitc/export/2010/chl/arg/show/ I get this
@aedobbyn update: somebody within the OEC team screwed something up and not all origins/destinations are available at the moment... the US data is working ok, but no cigar trying to compute RCAs that need world flows :(
I shall update when this is fixed
Great, thanks for the quick update! Good opportunity to provide informative error messages to the user when the server is down or, in the case of the URL above, nothing is returned at all (I don't even get a 500 in the Chrome console). I'll look out for another update from you!
To test (in unit tests) how the package reacts to HTTP errors you can use @sckott's webmockr package: https://github.com/ropensci/webmockr
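A rough sketch of what that looks like (an editor's illustration using one of the URLs from this thread; consult the webmockr docs for the full workflow):

```r
# Stub the OEC endpoint so tests never touch the network, then make it
# answer with a 503 so the package's error handling can be exercised.
library(webmockr)
webmockr::enable()

stub <- webmockr::stub_request(
  "get", "https://atlas.media.mit.edu/hs92/export/2015/chl/chn/show/"
)
webmockr::to_return(stub, status = 503)

# ...a test can now assert that the package raises its informative error...

webmockr::disable()
```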
@maelle: Thanks again!! At this point you have contributed more to this package than the actual listed ctbs. @aedobbyn: Thanks, now it's fixed. This was a server-side problem that I should surface as an error message. I tested both R and browser access to the API and it's working again :)
Hmm, I still get the same parse error: premature EOF even though that endpoint in the browser now displays data. @cimentadaj are you able to receive data?
Hi @maelle, @aedobbyn and @pachamaltese! Glad to review this package. I'll submit my review this weekend, but in the meantime, the development version of the package (devtools::install_github("pachamaltese/oec-r-package")) seems to work for me:
library(oec)
getdata("chl", "chn", 2015)
#> Valid input. Processing 2015 data...
#> # A tibble: 670 x 31
#> year origin_id destination_id origin_name destination_name id
#> <dbl> <chr> <chr> <chr> <chr> <chr>
#> 1 2015 chl chn Chile China 5722
#> 2 2015 chl chn Chile China 5838
#> 3 2015 chl chn Chile China 6251
#> 4 2015 chl chn Chile China 6282
#> 5 2015 chl chn Chile China 6289
#> 6 2015 chl chn Chile China 6352
#> 7 2015 chl chn Chile China 6418
#> 8 2015 chl chn Chile China 6572
#> 9 2015 chl chn Chile China 6577
#> 10 2015 chl chn Chile China 6631
#> # ... with 660 more rows, and 25 more variables: id_len <int>,
#> # product_name <chr>, group_id <chr>, group_name <chr>,
#> # export_val <dbl>, export_val_growth_pct <dbl>,
#> # export_val_growth_pct_5 <dbl>, export_val_growth_val <dbl>,
#> # export_val_growth_val_5 <dbl>, export_rca <dbl>, import_val <dbl>,
#> # import_val_growth_pct <dbl>, import_val_growth_pct_5 <dbl>,
#> # import_val_growth_val <dbl>, import_val_growth_val_5 <dbl>,
#> # import_rca <dbl>, trade_exchange_val <dbl>, pci <dbl>, pci_rank <dbl>,
#> # pci_rank_delta <dbl>, top_exporter_code <chr>,
#> # top_importer_code <chr>, top_importer <chr>, top_exporter <chr>,
#> # color <chr>
devtools::session_info()
#> Session info -------------------------------------------------------------
#> setting value
#> version R version 3.4.4 (2018-03-15)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate English_United States.1252
#> tz Europe/Paris
#> date 2018-06-20
#> Packages -----------------------------------------------------------------
#> package * version date
#> assertthat 0.2.0 2017-04-11
#> backports 1.1.2 2017-12-13
#> base * 3.4.4 2018-03-15
#> bindr 0.1.1 2018-03-13
#> bindrcpp * 0.2.2 2018-03-29
#> cli 1.0.0 2017-11-05
#> compiler 3.4.4 2018-03-15
#> crayon 1.3.4 2017-09-16
#> curl 3.2 2018-03-28
#> datasets * 3.4.4 2018-03-15
#> devtools 1.13.5 2018-02-18
#> digest 0.6.15 2018-01-28
#> dplyr 0.7.5 2018-05-19
#> evaluate 0.10.1 2017-06-24
#> formatR 1.5 2017-04-25
#> glue 1.2.0.9000 2018-06-19
#> graphics * 3.4.4 2018-03-15
#> grDevices * 3.4.4 2018-03-15
#> htmltools 0.3.6 2017-04-28
#> jsonlite 1.5 2017-06-01
#> knitr 1.20 2018-02-20
#> magrittr 1.5 2014-11-22
#> memoise 1.1.0 2018-03-20
#> methods * 3.4.4 2018-03-15
#> oec * 2.8.1 2018-06-20
#> pillar 1.2.2 2018-04-26
#> pkgconfig 2.0.1 2017-03-21
#> purrr 0.2.5 2018-05-29
#> R6 2.2.2 2017-06-17
#> Rcpp 0.12.17 2018-05-18
#> rlang 0.2.0.9001 2018-06-20
#> rmarkdown 1.9 2018-03-01
#> rprojroot 1.3-2 2018-01-03
#> stats * 3.4.4 2018-03-15
#> stringi 1.1.7 2018-03-12
#> stringr 1.3.1 2018-05-10
#> tibble 1.4.2 2018-01-22
#> tidyselect 0.2.4 2018-02-26
#> tools 3.4.4 2018-03-15
#> utf8 1.1.3 2018-01-03
#> utils * 3.4.4 2018-03-15
#> withr 2.1.2 2018-03-18
#> yaml 2.1.18 2018-03-08
#> source
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.3)
#> local
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> local
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> local
#> CRAN (R 3.4.3)
#> CRAN (R 3.4.3)
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> Github (tidyverse/glue@a2c0f8b)
#> local
#> local
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> Github (hadley/memoise@06d16ec)
#> local
#> Github (pachamaltese/oec-r-package@00a367b)
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> Github (r-lib/rlang@ba4fb06)
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> local
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> CRAN (R 3.4.4)
#> local
#> CRAN (R 3.4.4)
#> local
#> Github (r-lib/withr@79d7b0d)
#> CRAN (R 3.4.4)
Hi @maelle, @aedobbyn and @cimentadaj
I added some error messages to inform the user when the connection drops or the MIT server is down or having problems. Last night I had the same problem (server-side): the browser returned an empty JSON, then 30 minutes later I could obtain the data.
I'll tell the Media Lab people about this, but in the meanwhile an error message "patches" server-side issues :)
And yes, devtools::install_github("pachamaltese/oec") installs the current version of the package that you are reviewing.
Great, working for me too now!
Just so I know the intent @pachamaltese: you quote variables from the OEC and use sym and !! to unquote them to avoid the NOTE no visible binding for global variable {variable}, correct?
A version with the tidyeval stripped out passes all tests without warnings, so I imagine the NOTE is the only reason.
@aedobbyn exactly!! Nobody before has ever discovered the main reason why I do that :D
Totally understand, those NOTEs are annoying :relaxed:
As you probably know, those NOTEs shouldn't stop you from passing a CRAN check, especially if you document them in an .Rbuildignored cran-comments.md. It might be cleaner to do that and avoid tidyeval altogether, but I would understand if you want to leave the tidyeval in.
@maelle makes the excellent point that you can document these global variables in a globalVariables.R file like this to avoid the no visible binding for global variable NOTEs. (Also see, e.g., an example of it in use in her opencage package.)
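For context, the file being suggested boils down to a single call; the column names below are the ones visible in this thread's output, used here as examples:

```r
# R/globalVariables.R -- declares NSE column names so R CMD check stops
# emitting "no visible binding for global variable" NOTEs for them.
if (getRversion() >= "2.15.1") {
  utils::globalVariables(c("export_val", "import_val", "trade_exchange_val"))
}
```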
If you devtools::check() the oec code in that amanda branch on my fork you won't get any of those NOTEs and you won't need the sym + !! syntax -- let me know if you want me to shoot you a PR with the code from that fork :)
@aedobbyn thanks! I accept that if and only if you agree to be added as ctb, because that is a contribution, a good contribution to be precise :)
Haha, that's very generous! It's @maelle who deserves the credit, as usual 😆
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS
- [ ] The package has an obvious research application according to JOSS's definition
The package contains a paper.md matching JOSS's requirements with:
- [ ] A short summary describing the high-level functionality of the software
- [ ] Authors: A list of authors with their affiliations
- [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).
The oec package is well written and definitely fulfills the objective of downloading trade data easily into R. I have comments which serve to improve the code, the user experience and the readability of the code. Some are more immediate than others, so feel free to ignore the small ones or new features if time is of the essence. However, I would consider them for future releases. They are not in any particular order of importance.
Great opportunity to use a switch statement here: https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L112-L130
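For illustration, a switch over that kind of argument might look like this (an editor's sketch; the argument and values are examples, not the package's actual code):

```r
# Map a user-facing classification name to the API path prefix; switch's
# final unnamed argument acts as the fall-through/error branch.
classification_prefix <- function(classification) {
  switch(classification,
    hs92 = "hs92",
    hs07 = "hs07",
    sitc = "sitc",
    stop("unknown classification: ", classification)
  )
}

classification_prefix("sitc")
```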
I would use match.arg here: https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L62-L64
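match.arg both validates and provides a default in one line; a minimal sketch (the argument name and values are examples):

```r
get_data_sketch <- function(classification = c("hs92", "hs07", "sitc")) {
  # Picks the first value as the default and errors informatively on
  # anything outside the allowed set.
  classification <- match.arg(classification)
  classification
}

res_default <- get_data_sketch()        # "hs92"
res_sitc    <- get_data_sketch("sitc")  # "sitc"
```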
I'm not sure whether curl retries calling the API if it doesn't respond, but if it doesn't, then you could retry 3-5 times like here: https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L142-L144. This would be very handy because you call the API several times in other places. Considering that the MIT server has been a bit unstable in the last few days, this could come in handy. I think there are a few packages that do this properly, but if you want to avoid adding more dependencies, a quick function hack using httr is this one:
get_resp <- function(url, attempts_left = 5) {
stopifnot(attempts_left > 0)
resp <- httr::GET(url)
# On a successful GET, return the response
if (httr::status_code(resp) == 200) {
resp
} else if (attempts_left == 1) { # When attempts run out, stop with an error
stop("Cannot connect to the MIT API")
} else { # Otherwise, sleep a second and try again
Sys.sleep(1)
get_resp(url, attempts_left - 1)
}
}
Feel free to adapt it to using fromJSON and remove the httr code.
Maybe I'm a bit ignorant, but in this specific case (https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L146-L149), since you already checked that the user has internet at the beginning of the function, shouldn't this only be an MIT problem? Perhaps this error check can be wrapped into a function, because it's used 2-3 times.
I see that you create several functions called read_from_api_** to build the URL of the API. It would be cleaner and safer to create a single, more general function that covers all the API URLs, with different arguments supplied depending on the specific URL.
Any particular reason why you have get_data and getdata? It seems a bit redundant, and there's no clear grammatical reason to have two (as with color and colour in ggplot2). In any case, because they share exactly the same documentation I would suggest documenting them together on a single page using #' @rdname get_data. See http://r-pkgs.had.co.nz/man.html. In fact, because they're exactly the same function, if you want to keep both, then I would do something like this:
get_data <- function(all the arguments) {
all code here....
}
getdata <- function(same arguments) {
get_data(pass the arguments)
}
To avoid repetition.
This is clearly more work, but I think it might be worth it for new users. Instead of relying on your user to search the 264 country names for a code, you could create a function that regexps the name of the country and returns the country code. For example, get_countrycode("Chile") and get_countrycode("chile") should yield the same result. This can be piped into get_data more intuitively, like get_data(get_countrycode("Chile"), get_countrycode("China"), 2010, 2015). Alternatively, the user can search the country tibble manually. But note that non-frequent R users will probably search manually and not with regexps.
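A crude sketch of that idea (the lookup table and column names below are invented for illustration; the real package's country table may differ):

```r
# Hypothetical country lookup; the real package ships a country tibble.
country_tbl <- data.frame(
  country_name = c("Chile", "China", "Chad"),
  country_code = c("chl", "chn", "tcd"),
  stringsAsFactors = FALSE
)

get_countrycode <- function(pattern) {
  hits <- country_tbl$country_code[
    grepl(pattern, country_tbl$country_name, ignore.case = TRUE)
  ]
  if (length(hits) != 1) {
    stop("'", pattern, "' matched ", length(hits),
         " countries; please be more specific", call. = FALSE)
  }
  hits
}

res <- get_countrycode("chile")  # "chl"
```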
Along the same lines, you could do something similar for all product tibbles. I would take a similar approach to the World Bank R package, wbstats. For example:
library(wbstats)
unemploy_vars <- wbsearch(pattern = "unemployment")
head(unemploy_vars)
Check https://cran.r-project.org/web/packages/wbstats/README.html for more examples.
You could have a generic get_product that searches in all product tibbles and returns an aggregated tibble with all matches. For example, if we search for Antiques then we get the same result as from oec::hs02, but also a new column indicating that it comes from hs02. This way, when there are similar matches, the user can disambiguate where each one comes from.
Of course this is a lot of work, but here's a very quick and crude approach:
get_products <- function(name) {
# Grab all the names of all hs datasets
all_datastr <-
stringr::str_subset(
data(package = "oec")$results[, "Item"],
"^hs"
)
# get the datasets, create the type_product column, bind them all together
# and do the search
all_datastr %>%
purrr::map(get) %>%
purrr::map2_dfr(all_datastr, ~ {.x$type_product <- .y; .x}) %>%
dplyr::filter(stringr::str_detect(product_name, name))
}
get_products("Gold")
get_products("Animals")
Misc:
In the oec website and GitHub repo, the installation code chunk still has the old repo (devtools::install_github("pachamaltese/oec-r-package")). In fact, try doing an overall repo search and update all URLs. The DESCRIPTION file needs to be updated in the BugReports field as well.
I like that you're very defensive about what you accept in the function, but try to help the user by pointing out where something went wrong. For example, here you could echo what the user wrote and, when appropriate (not here, because there are too many countries), list the available options the user could've chosen.
Great package!
Whoa. Thanks a lot. This helps to improve the package. get_data was provided by the other reviewers as an alternative without tidy evaluation. I'm on my cellphone right now, so I'm eager to implement the changes later!
On Sat., Jun. 23, 2018, 5:59 PM, Jorge Cimentada notifications@github.com wrote:
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
- As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).
Documentation
The package includes all the following forms of documentation:
- [x] A statement of need clearly stating problems the software is designed to solve and its target audience in README
- [x] Installation instructions: for the development version of package and any non-standard dependencies in README
- Vignette(s) demonstrating major functionality that runs successfully locally
- Function Documentation: for all exported functions in R help
- Examples for all exported functions in R Help that run successfully locally
- Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS
- The package has an obvious research application according to JOSS's definition http://joss.theoj.org/about#submission_requirements
The package contains a paper.md matching JOSS's requirements http://joss.theoj.org/about#paper_structure with:
- A short summary describing the high-level functionality of the software
- Authors: A list of authors with their affiliations
- A statement of need clearly stating problems the software is designed to solve and its target audience.
- References: with DOIs for all those that have one (e.g. papers, datasets, software).
Functionality
- Installation: Installation succeeds as documented.
- Functionality: Any functional claims of the software been confirmed.
- Performance: Any performance claims of the software been confirmed.
- Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
- Packaging guidelines: The package conforms to the rOpenSci packaging guidelines
Final approval (post-review)
- The author has responded to my review and made changes to my satisfaction. I recommend approving this package.
Estimated hours spent reviewing: 1.5 Review Comments
The oec package is well written and definitely fulfills the objective of downloading trade date easily into R. I have comments which serve to improve the code, the user experience and the readibility of the code. Some are more immediate then others so feel free to ignore the small ones or new features if time is of essence. However, I would consider them for future releases. They are not in any particular order of importance.
-
Great opportunity to use a switch statement here https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L112-L130 .
I would use match.args here https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L62-L64
I'm not sure whether curl retries calling the API if it doesn't respond, but if it doesn't, you could retry 3-5 times like here: https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L142-L144. This would be very handy because you call the API several times in other places, and the MIT server has been a bit unstable in the last few days. I think there are a few packages that do this properly, but if you want to avoid adding more dependencies, here is a quick function hack using httr:
```r
get_resp <- function(url, attempts_left = 5) {
  stopifnot(attempts_left > 0)

  resp <- httr::GET(url)

  # On a successful GET, return the response
  if (httr::status_code(resp) == 200) {
    resp
  } else if (attempts_left == 1) {
    # When attempts run out, stop with an error
    stop("Cannot connect to the MIT API")
  } else {
    # Otherwise, sleep a second and try again
    Sys.sleep(1)
    get_resp(url, attempts_left - 1)
  }
}
```

Feel free to adapt it to using fromJSON and remove the httr code.
Maybe I'm a bit ignorant, but in this specific case https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L146-L149, since you already checked that the user has internet at the beginning of the function, shouldn't this be only an MIT problem? Perhaps this error check can be wrapped into a function, because it's used 2-3 times.
I see that you create several functions called read_fromapi** to build the URL of the API. It would be cleaner and safer to create a single function that works for all API URLs, taking different arguments depending on the specific endpoint.
Any particular reason why you have get_data and getdata? It seems a bit redundant, and there is no clear grammatical reason to have two (unlike, for example, color and colour in ggplot2). In any case, because they share exactly the same documentation, I would suggest documenting them together on a single page using #' @rdname get_data. See http://r-pkgs.had.co.nz/man.html. In fact, because they are exactly the same function, if you want to keep both, I would do something like this:
```r
get_data <- function(origin, destination, years, ...) {
  # all the code here ...
}

getdata <- function(origin, destination, years, ...) {
  get_data(origin, destination, years, ...)
}
```
To avoid repetition.
This is clearly more work, but I think it might be worth it for new users. Instead of relying on your user to search among the 264 country names for a code, you could create a function that matches the name of the country with a regular expression and returns the country code. For example, get_countrycode("Chile") and get_countrycode("chile") should yield the same result. This can be piped into get_data more intuitively, like get_data(get_countrycode("Chile"), get_countrycode("China"), 2010, 2015). Alternatively, the user can search the country tibble manually. But note that infrequent R users will probably search manually rather than with regexps.
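A rough sketch of what such a helper could look like. The country and country_code column names mirror the lookup table the package ships, but the tiny codes data frame here is a made-up stand-in for illustration:

```r
# Hypothetical helper: regex-match a country name (case-insensitively)
# against a lookup table and return the matching code(s).
get_countrycode <- function(name, codes) {
  hits <- codes[grepl(name, codes$country, ignore.case = TRUE), ]
  if (nrow(hits) == 0) stop("No country matches '", name, "'")
  hits$country_code
}

codes <- data.frame(country = c("Chile", "China"),
                    country_code = c("chl", "chn"),
                    stringsAsFactors = FALSE)
get_countrycode("chile", codes)  # "chl"
get_countrycode("Chi", codes)    # "chl" "chn" (ambiguous prefix)
```

Returning all matches for an ambiguous pattern (rather than erroring) lets the user see what went wrong and refine the search.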
Along the same lines, you could do something similar for all product tibbles. I would take an approach similar to the World Bank R package, wbstats. For example:
```r
library(wbstats)
unemploy_vars <- wbsearch(pattern = "unemployment")
head(unemploy_vars)
```
Check https://cran.r-project.org/web/packages/wbstats/README.html for more examples.
You could have a generic get_product that searches all product tibbles and returns an aggregated tibble with all matches. For example, if we search for Antiques we get the same result as from oec::hs02, but also a new column indicating that it comes from hs02. This way, when there are similar matches, the user can disambiguate where each one comes from.
Of course this is a lot of work, but here's a very quick and crude approach:
```r
get_products <- function(name) {
  # Grab the names of all hs datasets
  all_datastr <- stringr::str_subset(
    data(package = "oec")$results[, "Item"],
    "^hs"
  )

  # Get the datasets, create the type_product column,
  # bind them all together and do the search
  all_datastr %>%
    purrr::map(get) %>%
    purrr::map2_dfr(all_datastr, ~ {.x$type_product <- .y; .x}) %>%
    dplyr::filter(stringr::str_detect(product_name, name))
}

get_products("Gold")
get_products("Animals")
```
Misc:
In the oec website and GitHub repo the installation code chunk still has the old repo (devtools::install_github("pachamaltese/oec-r-package")). In fact, try doing an overall repo search and update all URLs. The DESCRIPTION file needs to be updated in the BugReports field as well.
I like that you're very defensive about what you accept in the function, but try to help the user by pointing out where something went wrong. For example, here https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L47-L51 I would print what the user wrote and, when appropriate (not here, because there are too many countries), list the available options the user could have chosen.
Great package!
@cimentadaj thanks a lot!! I have limited laptop access at the moment but I saw this and there are really good suggestions. Can I add you as a contributor? I never thought about matching arguments.
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
- As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).
Documentation
The package includes all the following forms of documentation:
- A statement of need clearly stating problems the software is designed to solve and its target audience in README
- Installation instructions: for the development version of package and any non-standard dependencies in README
- Vignette(s) demonstrating major functionality that runs successfully locally
- Function Documentation: for all exported functions in R help
- Examples for all exported functions in R Help that run successfully locally
- Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS
- [ ] The package has an obvious research application according to JOSS's definition
The package contains a paper.md matching JOSS's requirements with:
- [ ] A short summary describing the high-level functionality of the software
- [ ] Authors: A list of authors with their affiliations
- [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).
Functionality
- Installation: Installation succeeds as documented.
- Functionality: Any functional claims of the software been confirmed.
- Performance: Any performance claims of the software been confirmed.
- Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
- Packaging guidelines: The package conforms to the rOpenSci packaging guidelines
Final approval (post-review)
- The author has responded to my review and made changes to my satisfaction. I recommend approving this package.
Estimated hours spent reviewing: 2
Review Comments
First off, I'm really excited for this package and the data it's going to make available to people in tidy format.
I think the modifications you've put in place so far and the ones suggested by @cimentadaj are all very useful. (Many of those I was going to suggest myself, but I won't repeat Jorge!) +1 especially to retrying the server multiple times (you might even implement exponential backoff) and making things a bit more concise with match.arg and switch.
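To make the exponential-backoff idea concrete, here is a hedged sketch. with_backoff() is a made-up name, and fetch stands for any zero-argument function (e.g. one wrapping httr::GET), so nothing below is the package's actual API:

```r
# Sketch of exponential backoff: the wait doubles after every failed
# attempt (1s, 2s, 4s, ...) instead of a fixed one-second pause.
with_backoff <- function(fetch, attempts_left = 5, wait = 1) {
  stopifnot(attempts_left > 0)
  result <- try(fetch(), silent = TRUE)
  if (!inherits(result, "try-error")) {
    result
  } else if (attempts_left == 1) {
    stop("Cannot connect to the MIT API")
  } else {
    # Back off, then retry with a doubled wait
    Sys.sleep(wait)
    with_backoff(fetch, attempts_left - 1, wait * 2)
  }
}
```

If adding httr as a dependency is acceptable, httr also ships a RETRY() helper that implements backoff for you.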
A few other notes:
An alternative to @cimentadaj's suggestion to allow users to get_countrycode using a regex match would be to allow users to supply either the country_code or the country in get_data; so get_data("chl", "chn", 2015, "sitc") would return the same data as get_data("Chile", "China", 2015, "sitc"). While this approach gives the user less leeway in the input they can supply than a regex would, I think most of the countries in country_codes$country are represented the way most people would expect, which saves some work on your end.
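One hedged way to sketch this: normalize_country() is a hypothetical name, and the country / country_code columns mirror the package's country_codes table (a tiny stand-in is used here):

```r
# Accept either a country code ("chl") or an exact country name
# ("Chile") and always return the code.
normalize_country <- function(x, codes) {
  if (x %in% codes$country_code) return(x)
  hit <- codes$country_code[tolower(codes$country) == tolower(x)]
  if (length(hit) == 1) return(hit)
  stop("'", x, "' is not a known country or country code")
}

codes <- data.frame(country = c("Chile", "China"),
                    country_code = c("chl", "chn"),
                    stringsAsFactors = FALSE)
normalize_country("chl", codes)    # "chl"
normalize_country("Chile", codes)  # "chl"
```

Calling this on both origin and destination at the top of get_data would make either input style work without changing anything downstream.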
It looks like you allow the user to supply "all" as origin and destination, but this isn't reflected in the documentation. Beyond simply mentioning that this is an option, it may even be worth setting "all" as the default value for origin and destination so that the user can get up and running with the function without looking through country codes.
I'll note that supplying "all" for both origin and destination does fail at the moment, because during the join trade flows step you end up with top_importer.x, top_importer.y, etc., so operations expecting a single top_importer error out. This case will need a bit of special treatment.

```r
library(oec)
getdata("all", "all", 2015)
#> Valid input. Processing 2015 data...
#> Error in .f(.x[[i]], ...): object 'top_importer' not found
```
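For context, the .x/.y suffixes come from the join itself: joining two tables that share a non-key column duplicates that column with suffixes. Base R's merge() reproduces the same behaviour as dplyr's joins, so a self-contained illustration:

```r
# Two tables sharing a non-key column "top_importer"
a <- data.frame(id = 1, top_importer = "chn")
b <- data.frame(id = 1, top_importer = "usa")

names(merge(a, b, by = "id"))
# "id" "top_importer.x" "top_importer.y"

# Dropping (or renaming) the duplicated column before joining avoids it:
names(merge(a, b[, "id", drop = FALSE], by = "id"))
# "id" "top_importer"
```

So the "all"/"all" case likely needs the duplicated column dropped or renamed before the join trade flows step.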
The docs say that classification defaults to "sitc", but that doesn't seem to be the case at the moment. I'm all in favor of more defaults for the function, so users can get up and running more quickly!
There are some quick wins to increase test coverage that would make the package more stable in the long term if the API data availability changes. I'd add tests for all the different classification types; that way, if something happens to one of them down the road, your test will fail and Travis will alert you. Currently there is only a test for classification = "sitc".
I'd also suggest using webmockr in the tests as well, as @maelle mentioned.
The styling generally looks good, but I'd recommend a styler::style_pkg() (https://github.com/r-lib/styler) just to catch the remaining small things.
I would consider changing oec_r_package.Rproj to oec.Rproj for consistency (and I second @cimentadaj's point that it's good to do an entire sweep of the package to make sure there are no more instances of oec_r_package or oec-r-package left).
I believe only the .Rmd needs to go in the vignettes directory. I like that you show in a vignette how you went about tidying the data that the user finds in the package.
I would add one or two examples of get_data to the readme or the pkgdown site so that people don't have to find the vignette or go to the docs to get a preview of the package. (Makes it more likely for people to download the package instead of clicking away, imo.)
You may already be in the process of doing the full switchover from getdata to get_data, but just to make sure: anywhere getdata was previously used, e.g. in the vignettes, examples, etc., should be replaced with get_data, and of course the getdata function and its test file can be deleted.
You might consider documenting what each of the fields returned from the API means. If there isn't a good link to point people to in the @return, you could include a tibble in the package that the user can access with something like variable_definitions.
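As a purely illustrative shape for such a tibble: the variable names and wording below are invented to show the structure, not actual OEC field definitions.

```r
# Hypothetical data object the package could ship; the real variable
# names and definitions would come from the OEC API documentation.
variable_definitions <- data.frame(
  variable   = c("export_val", "import_val"),
  definition = c("Example wording: value of exports",
                 "Example wording: value of imports"),
  stringsAsFactors = FALSE
)
variable_definitions
```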
I think that's it from me! Looking forward to seeing this package get some good use!
Whoa! Many thanks. I'm still on my cellphone, but THIS: "it may even be worth setting "all" as the default value for origin and destination" is where I'm starting. Thanks a lot!
Awesome, looking forward to seeing the v2!
Many thanks for your reviews @aedobbyn and @cimentadaj, they seem very thorough and useful! 😺
I learnt about the global variables "trick" myself here 😉
+1 for @aedobbyn and @maelle comments! Good job @pachamaltese, looking forward to the next version! Yes, of course, feel free to include me as ctb.
By the way, you can add reviewers as ctb but also as rev, cf https://ropensci.github.io/dev_guide/building.html#authorship
thanks a lot @maelle, may I submit a new version next week?
Of course! As per our guide for authors "we ask that you respond to reviewers’ comments within 2 weeks of the last-submitted review, but you may make updates to your package or respond at any time.". 😺
thanks!!! this week I'm swamped, just reproducible reports at work, but next Saturday I'll pick this up again :) have a great week
Summary
oec provides an easy way to obtain data from the Observatory of Economic Complexity by accessing its API.
https://github.com/pachamaltese/oec-r-package
databases, because the package connects to an API and makes 3 or more API calls to simplify things for the final user who wants imports/exports and some metrics such as the % of increase/decrease.
Non-expert users who work with international trade data. It can also target intermediate/advanced users who can benefit from the speed and short syntax this package provides.
Not at the moment (in 2 years this has been the only one).
No.
Requirements
Confirm each of the following by checking the box. This package:
Publication options
- [ ] paper.md matching JOSS's requirements, with a high-level description, in the package root or in inst/.
Detail
- [x] Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:
- [x] Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
None.
No.
I don't really know.