Closed pachadotdev closed 5 years ago
@maelle can I have more time to implement all the suggestions? I'm working on them and the coolness shown on the feedback requires time to implement :)
Thanks for the update @pachamaltese, yes that's fine, please keep informing us and ask any question you might have. 🙂
@maelle @aedobbyn @cimentadaj
Thanks a lot for the feedback. Here's a list of todos after the feedback, I shall be working on this after the commit I just made as a few parts require dedication. Now I finally have time to organize this and produce a good result. Have a good weekend!!
[x] Re-try API calls here by adapting this function:
    get_resp <- function(url, attempts_left = 5) {
      stopifnot(attempts_left > 0)
      resp <- httr::GET(url)
      # On a successful GET, return the response
      if (httr::status_code(resp) == 200) {
        resp
      } else if (attempts_left == 1) { # When attempts run out, stop with an error
        stop("Cannot connect to the MIT API")
      } else { # Otherwise, sleep a second and try again
        Sys.sleep(1)
        get_resp(url, attempts_left - 1)
      }
    }
`read_from_api` function
`get_data` suggestions in getdata
`get_countrycode("Chile")` or `get_countrycode("chile")` should yield the same. This can be piped into `get_data` more intuitively, like `get_data(get_countrycode("Chile"))`
library(wbstats)
unemploy_vars <- wbsearch(pattern = "unemployment")
head(unemploy_vars)
[x] Create generic `get_product` that searches in all product tibbles and returns an aggregated tibble with all matches. For example, if we search for Antiques then we get the same result from `oec::hs02` but also a new column indicating that it comes from `hs02`. Very quick and crude approach:
    get_products <- function(name) {
      # Grab all the names of all hs datasets
      all_datastr <-
        stringr::str_subset(
          data(package = "oec")$results[, "Item"],
          "^hs"
        )
      # get the datasets, create the type_product column, bind them all together
      # and do the search
      all_datastr %>%
        purrr::map(get) %>%
        purrr::map2_dfr(all_datastr, ~ {.x$type_product <- .y; .x}) %>%
        dplyr::filter(stringr::str_detect(product_name, name))
    }
    get_products("Gold")
    get_products("Animals")
- [x] An alternative to @cimentadaj's suggestion to allow users to get_countrycode using a regex match would be to allow users to supply either the `country_code` or the country in `get_data`; so `get_data("chl", "chn", 2015, "sitc")` would return the same data as `get_data("Chile", "China", 2015, "sitc")`. While this approach gives the user less leeway in the input they can supply than a regex would, I think most of the countries in country_codes$country are represented the way most people would expect, which saves some work on your end.
- [x] Include "all" as origin and destination in documentation and as default value for origin and destination so that the user can get up and running with the function without looking through country codes.
- [x] Fix join trade flows step to deal with `top_importer.x, top_importer.y` when using "all" as argument
- [x] Move `sitc` default to arguments instead of "if missing"
- [x] Use `webmockr` in tests
- [x] Use Styler (https://github.com/r-lib/styler) instead of Lint
- [x] Rename `oec_r_package.Rproj` to `oec.Rproj`
- [x] Check for non-Rmd files in `vignettes/`
- [x] Add one or two examples of `get_data` to the readme or the pkgdown site so that people don't have to find the vignette or go to the docs to get a preview of the package.
- [x] Switch `getdata` to `get_data`
- [x] Document results from API and explain a bit about RCA, monetary units, etc.
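The "country name or country code, in any case" item above can be sketched roughly as follows. This is a minimal illustration only: `resolve_country` and the three-row `example_codes` table are hypothetical stand-ins, not the package's actual internals or its `country_codes` data.

```r
# Hypothetical lookup table, for illustration only; the package ships its own.
example_codes <- data.frame(
  country = c("Chile", "China", "Argentina"),
  country_code = c("chl", "chn", "arg"),
  stringsAsFactors = FALSE
)

# Return the country code for `x`, whether `x` is already a code or a
# country name, ignoring case. "all" is passed through unchanged.
resolve_country <- function(x, codes = example_codes) {
  stopifnot(is.character(x), length(x) == 1)
  x_lower <- tolower(trimws(x))
  if (x_lower == "all") {
    return("all")
  }
  # First try an exact match against the codes, then against the names
  if (x_lower %in% codes$country_code) {
    return(x_lower)
  }
  hit <- which(tolower(codes$country) == x_lower)
  if (length(hit) == 1) {
    return(codes$country_code[hit])
  }
  stop(sprintf("No country or country code matches '%s'.", x), call. = FALSE)
}

resolve_country("CHILE")
resolve_country("chl")
```

With a helper like this, both origin and destination could default to `"all"` and still accept either form of input.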
@maelle I'm almost ready!! I implemented all of the suggestions except webmockr and styler; I'm doing that tonight. Thanks a lot @aedobbyn @cimentadaj, thanks to your comments the package improved a ton and now you can just write `get_data("CHILE", "ArGeNtiNa")` and the internals will make it work :)
@maelle now I only have left the webmockr implementation in unit tests and explaining a bit about the results from the API :D
@pachamaltese thanks for the update! Note that @sckott's `webmockr` doesn't support `httr` yet, only `crul`, so you can either switch to `crul`, or keep using `httr` and use https://github.com/nealrichardson/httptest
thanks @maelle - httr integration is operational in a branch on github. Install like `remotes::install_github("ropensci/vcr@httr-integration")` and use it just like the crul integration. Would love any feedback. It should go to CRAN soon as we don't require any changes in httr itself
sorry, meant to ping you in my above comment @pachamaltese ^^
Thanks a ton for the clarification @sckott, and good news! 😃
thanks @maelle @sckott ! @maelle now I completed all the changes except for the webmockr part which I am not sure how to implement
Ok thanks for the update @pachamaltese!
@sckott is there any guide, or any example using `webmockr` with `httr` in the wild (a toy example?)? I only know of https://ropensci.github.io/http-testing-book/
👋 @pachamaltese So I know exactly what you want to do, do you want to do mocking with `webmockr` where you simply match requests, but don't cache the responses? Or do you want more what `vcr` does where it matches requests and caches responses? `vcr` is a good fit for using in your test suite, whereas `webmockr` can be used for tests, but doesn't have to be, and you do have to cache your own responses with `webmockr` if you want them cached. Let me know then I can give examples
Hi @sckott. I think it would be better to use `vcr`, matching requests and caching responses, especially since the server sometimes acts weird. Thanks a lot!!
Okay. `httr` integration is not yet on CRAN. You need github versions of `vcr` and `webmockr`. Install like `remotes::install_github("ropensci/vcr@httr-integration")`
The usage is no different from if you were using `crul`. So any examples, etc. in the docs should suffice. I just made a tiny package to demo using `httr` and `vcr` together https://github.com/sckott/catfact so you can see what the setup is like.
more in depth docs for vcr/webmockr here https://ropensci.github.io/http-testing-book/
@sckott thanks a lot !!! I shall be reading that soon
@maelle finally I implemented webmockr and I also used vcr and crul, so the functions changed a bit :)
@pachamaltese cool, congrats! So you're done with all comments?
Yes! Thanks a lot for the useful comments. I just kept tidyeval for consistency.
Best,
@aedobbyn @cimentadaj can you please have a last look at the package? Thanks!
Will do this weekend!
Looking good @pachamaltese!
Here are a few notes. I hope to be able to look through more thoroughly this weekend, but in case I'm not able to, I wanted to give you something. By topic, in no particular order:
get_data
- In `get_data`, the warnings `No encoding supplied: defaulting to UTF-8.` occur during the `parse()` call; either use `parse(encoding = "UTF-8")` or suppress the message
- In `read_from_api` you have an `attempts_left` argument but there's no way for the user to change this. Not required but you could think about including this as a user-defined variable in `get_data`
- This is super nitpicky but the processing data message is a little misleading because it seems like a progress bar when really it messages all the years upfront. That means that if there were a problem with a certain year these messages wouldn't help you pin down which year failed. Maybe something like `sprintf("\nProcessing %s to %s data...", range(years)[1], range(years)[2])` would be better. Otherwise, you could include the message of which year we're on in the `read_from_api` function.
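The last two points (a user-settable number of attempts, and a message tied to the year actually being requested) could look roughly like the sketch below. Everything here is illustrative: `fetch_with_retry`, `get_years`, the `fetch` argument and the `example.org` URL are hypothetical names, and the request function is passed in so the sketch runs offline.

```r
# Retry `fetch(url)` until it succeeds or `attempts` run out.
# `fetch` is a stand-in for the real request function; it is assumed to
# return NULL on failure.
fetch_with_retry <- function(url, fetch, attempts = 5, wait = 1) {
  stopifnot(attempts > 0)
  for (i in seq_len(attempts)) {
    resp <- fetch(url)
    if (!is.null(resp)) {
      return(resp)
    }
    # Sleep between attempts, but not after the last one
    if (i < attempts) Sys.sleep(wait)
  }
  stop(sprintf("Cannot connect to the API after %d attempts: %s", attempts, url),
       call. = FALSE)
}

# Hypothetical wrapper: message each year right before it is requested,
# and let the caller choose the number of attempts.
get_years <- function(years, fetch, attempts = 5, wait = 1) {
  lapply(years, function(y) {
    message(sprintf("Processing %s data...", y))
    fetch_with_retry(sprintf("https://example.org/%s", y), fetch, attempts, wait)
  })
}

# A fake fetcher that fails twice before succeeding
calls <- 0
flaky_fetch <- function(url) {
  calls <<- calls + 1
  if (calls < 3) NULL else paste("data from", url)
}
res <- get_years(2015, flaky_fetch, attempts = 5, wait = 0)
```

Because the message is emitted inside the per-year loop, the last message printed before an error identifies the year that failed.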
get_countrycode
- The error message for a country code that doesn't exist could be more informative than `length(countrycode) != 0 is not TRUE`
- You might want to include a mapping from "us" to "united states", "uk" to "united kingdom", and other similar situations since I can see people trying `get_countrycode("us")` and being surprised when they don't get a US result
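One possible shape for both suggestions, as a sketch under assumptions: `lookup_countrycode`, `country_aliases` and `country_table` are hypothetical names with made-up contents, not the package's real function or data.

```r
# Hypothetical alias table; extend as needed.
country_aliases <- c(
  "us" = "united states",
  "usa" = "united states",
  "uk" = "united kingdom"
)

# Hypothetical stand-in for the package's country table.
country_table <- data.frame(
  country = c("United States", "United Kingdom", "Chile"),
  country_code = c("usa", "gbr", "chl"),
  stringsAsFactors = FALSE
)

lookup_countrycode <- function(name, codes = country_table,
                               aliases = country_aliases) {
  key <- tolower(trimws(name))
  # Replace a known alias with its full name before matching
  if (key %in% names(aliases)) {
    key <- aliases[[key]]
  }
  # Case-insensitive substring match against the country names
  hits <- codes[grepl(key, tolower(codes$country), fixed = TRUE), ]
  if (nrow(hits) == 0) {
    stop(
      sprintf("No country matches '%s'. Check the spelling or browse the country table.", name),
      call. = FALSE
    )
  }
  hits
}

lookup_countrycode("us")
```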
Other
- In the devtools install line, it should be `devtools::install_github("pachamaltese/oec")`
- You shouldn't need to prefix package functions with `oec::` (e.g. https://github.com/pachamaltese/oec/tree/64b185675e9faf1318f6cae10d16b75f97607423/R/get_data.R#L87), I think
- I'm not sure what the correct protocol is for the `fixtures` dir. Looks like you use it in `tests/testthat/helper-oec.R` to get `vcr` configurations. The presence of the extra directory does give you a NOTE on `devtools::check()`, so I think it should at least be buildignored
- Should update the vignette titles (line 7 for both). On `devtools::release_checks()`: WARNING: placeholder 'Vignette Title' detected in 'title' field and/or 'VignetteIndexEntry' for: oec-data.Rmd,oec.Rmd
- May want to update the year on your LICENSE and in the README to 2018
- What was the rationale for keeping the tidyeval instead of removing it and using a `globals.R` file?
This is looking really solid overall -- will be excited to use it very soon!
WHOA !! thanks a lot @aedobbyn !!!!!!!!!
I'm making changes this weekend. For now, the rationale for keeping the tidyeval was just to mimic some design decisions I like, for example those in the `highcharter` package by @jbkunst.
No problem, glad it's helpful! And makes sense -- it doesn't affect the functionality of course and assuming rlang stays backward-compatible (which I think they will?) you should be fine if they change the tidyeval syntax.
Thanks for implementing those changes so quickly! Last bit of suggestions from me, I think:
- In your `get_data` `@return` documentation I would point people to your handy reference about what each of the columns means since a lot of these are non-obvious.
- If you feel it's worth it (though I don't think strictly necessary), you could refactor out some of the redundancies in the `get_data` function. For instance, you use the same `rename` + `mutate` flow in each extraction, so this could be pulled out into a utility function. Similarly, the checks at the beginning of the function making sure that the years specified are in bounds for the `classification` could be stored in a lookup table. This way you only need one of these in `get_data` instead of five.
- Right now you only allow years up to 2016, but in the coming years will the OEC have data past that year? Instead of updating this by hand is there a way to check what the greatest year available is, maybe on package load and store that as a variable?
- I think in all cases you can replace single `&`s with `&&`s and `|`s with `||`s
- Small typo here ("lower of equal to")
- Unnecessary line, I think
- Future improvement for `get_productcode` would be a `word_boundary` boolean param for an exact word match. That would take care of the first result for `get_productcode("wine")` being "Live swine" 😆
- Very small detail -- you might want to put the `top_exporter_code` and `top_exporter` cols next to each other in the output since they reference the same thing. Right now they're sometimes separated by a few columns (e.g. `trade_exchange_val`). Same for `top_importer_code` and `top_importer`
- Should re-run pkgdown once new changes are implemented
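Two of the points above, the year-bounds lookup table and the `word_boundary` parameter, might be sketched like this. All names (`year_bounds`, `check_years`, `match_products`) are hypothetical, the year ranges are placeholders rather than the real OEC coverage, and the product names other than "Live swine" are invented for the example.

```r
# Placeholder bounds, NOT the real OEC coverage; fill in from the API docs.
year_bounds <- data.frame(
  classification = c("sitc", "hs02"),
  min_year = c(1962, 2002),
  max_year = c(2016, 2016),
  stringsAsFactors = FALSE
)

# One check driven by the table, instead of one hand-written check per
# classification.
check_years <- function(years, classification, bounds = year_bounds) {
  row <- bounds[bounds$classification == classification, ]
  if (nrow(row) != 1) {
    stop(sprintf("Unknown classification '%s'.", classification), call. = FALSE)
  }
  # `||` is the scalar, short-circuiting operator, suited to if() conditions
  if (min(years) < row$min_year || max(years) > row$max_year) {
    stop(
      sprintf("For %s, years must be between %d and %d.",
              classification, row$min_year, row$max_year),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Optional whole-word matching via regex word boundaries
match_products <- function(products, pattern, word_boundary = FALSE) {
  if (word_boundary) {
    pattern <- paste0("\\b", pattern, "\\b")
  }
  products[grepl(pattern, products, ignore.case = TRUE, perl = TRUE)]
}

products <- c("Live swine", "Wine of fresh grapes", "Wine lees")
match_products(products, "wine")                        # also matches "Live swine"
match_products(products, "wine", word_boundary = TRUE)  # whole-word matches only
```

If the API exposes the latest available year, `max_year` could be filled in on package load instead of being hard-coded; whether it does is not something this thread confirms.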
Thanks a lot @aedobbyn!! I still have some pending items that I am adding to this new list. By the way, after all of these cool changes you suggested, can I contact you by email? It's for a related project that benefited a lot from this feedback. Mine is mvargas &a t& dcc uchile cl
Great job @pachamaltese. I think @aedobbyn has covered most of the major problems. Here are some minor concerns.
- https://github.com/pachamaltese/oec/blob/master/R/get_countrycode.R#L51-L52 should say 'vector' rather than package.
- Perhaps you can move this outside the main function so it's cleaner.
- In general, I would create tests that check that the content returned by the request resembles what you requested and not only the format of the tibble. I would add a few tests checking whether `get_data` returned the requested country as well as the different years. Maybe this applies to some topics as well. I just had a nasty bug in one of my packages related to this and it was awful!
- Here I would check whether it has a 'try-error' class to be safe. This makes sense because you're using `try` anyway.
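For the last point, a minimal sketch of the 'try-error' check in base R. `risky_request` and `safe_request` are hypothetical stand-ins for the package's request code, used only so the example runs without a server.

```r
# Stand-in for a request that may fail; not the package's real function.
risky_request <- function(ok = TRUE) {
  if (!ok) stop("server unavailable")
  list(status = 200)
}

safe_request <- function(ok = TRUE) {
  res <- try(risky_request(ok), silent = TRUE)
  # try() returns an object of class "try-error" when the call fails
  if (inherits(res, "try-error")) {
    warning("Request failed: ", conditionMessage(attr(res, "condition")),
            call. = FALSE)
    return(NULL)
  }
  res
}

safe_request(TRUE)
```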
Hope these are useful!
@cimentadaj once again thanks for the feedback !! I completed all the points that Amanda listed (above I wrote a clickable list that I completed)
The server is down again; could you please give me 1-2 days to address Jorge's comments?
@cimentadaj: any chance to contact you by email? mine is " m vargas a/t dcc uchile cl"
Yes @pachamaltese , feel free to contact me and no worries about the 1-2 days.
@aedobbyn @cimentadaj thanks a lot!! I had hectic days at the office, here are some changes that I'll try to complete ASAP
@aedobbyn @cimentadaj Hi. Thanks for the feedback! There are four points I can't solve without breaking the check :S This weekend I'll try to solve https://github.com/pachamaltese/oec/blob/master/R/get_data.R#L112-L161 as it is much cleaner with that change than without
:wave: @pachamaltese did you get any chance to make progress on the package? When you do, please summarize your response again (using your checklist again I guess :wink:).
@maelle Hi, I was not able to test a lot because the server has been down a lot actually. I updated the list to reflect the only 3 points I could not fix. Those can be more tricky actually!
@pachamaltese any update?
Hi Maelle, the server is just acting weird. Can we put this on hold? Yesterday I spoke to Amanda and Jorge about providing an alternative API that would be hassle-free. Best
Do you mean you don't trust the data provider enough with stability?
Exactly, at the present time that's the problem. Last year it was different
Ok, too bad, will put the submission on hold, I hope you can solve the problem, good luck!
@pachamaltese :wave:, any news?
Hi Maelle
Sorry about the long silence! I have a tentative PostgreSQL database and now I only have to work on the API. The good part is that both Jorge and Amanda are collaborating on this. If you also want to participate please do not hesitate to give ideas to create a better API.
Best!
—————
Mauricio Vargas Sepúlveda 帕夏 Do you like Data Science? visit pacha.hk 你爱科学数据专吗?你走pacha.hk
Cool to read! Is there a public repo we can link from here in case someone reads the thread and gets curious?
sure !! I have created this organization to make my last 3 years of work fully reproducible https://github.com/tradestatistics
@maelle Hi!! Sorry for the very long silence. After all the issues with the "server is down" and whatever, I ended up creating my own project.
A new package based on `oec` is now available here https://github.com/tradestatistics/r-package
But in broader terms, tradestatistics.io is something new I made with help from the awesome community, and everything is pure R (even the API, which is where the new `tradestatistics` package takes the data from).
There are tons of changes such as the ability to inspect different tables, filter by commodity code length, etc., given that I made my own PSQL DB.
Probably I am using a laser beam to open a tuna can, but anyway I've tried to do something to foster reproducibility.
Do you have ideas to rename the package? `tradestatistics` sounds ok, but maybe `tsapi` is more appropriate.
Last but not least, api.tradestatistics.io/friend is a (boring?) joke I made with the API. It should be easy to complete the joke with the correct query.
:wave: @pachamaltese! Happy New Year! Congrats on the project (and fun endpoint)!
Should we close this issue? And when your new package is ready you can open a pre-submission inquiry (if it does the exact same things as the package using the broken server used to do, it should be in scope, but better to ask first).
I find `tradestatistics` a better name because `tsapi` can be misleading since ts is often an abbreviation for time series.
@maelle ok!! So I'm closing this and opening a new submission. I'll prepare the new submission during the next hour
> I'll prepare the new submission during the next hour
Maybe best to open a pre-submission inquiry and wait until the package is more mature before a full submission (unless the package is mature already).
Summary
oec provides an easy way to obtain data from the Observatory of Economic Complexity by accessing its API.
https://github.com/pachamaltese/oec-r-package
databases, because the package connects to an API and does 3 or more API calls to simplify things for the final user who wants imports/exports and some metrics such as % of increase/decrease.
Non-expert users that use international trade data. This can also be targeted to intermediate/advanced users who can benefit from the speed and short syntax that this package provides.
Not at the moment (in 2 yrs this is the only one)
No.
Requirements
Confirm each of the following by checking the box. This package:
Publication options
`paper.md` matching JOSS's requirements with a high-level description in the package root or in `inst/`.
Detail
[x] Does `R CMD check` (or `devtools::check()`) succeed? Paste and describe any errors or warnings:
[x] Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
None.
No.
I don't really know.