ropensci / software-review

rOpenSci Software Peer Review.

rdhs: submission for rOpenSci #238

Closed OJWatson closed 5 years ago

OJWatson commented 6 years ago

Summary

rdhs interacts with the Demographic and Health Survey (DHS) Program API (https://api.dhsprogram.com) and provides tools that use the API to ease identifying, downloading, loading and analysing the raw survey data collected by the DHS.

Package: rdhs
Type: Package
Title: API Client and Dataset Management for the Demographic and Health Survey (DHS) Data
Version: 0.5.0
Authors@R: c(
  person("OJ", "Watson", role=c("aut", "cre"),
         email="o.watson15@imperial.ac.uk"),
  person("Jeff", "Eaton", role="aut"))
Maintainer: OJ Watson <o.watson15@imperial.ac.uk>
URL: https://ojwatson.github.io/rdhs/
BugReports: https://github.com/OJWatson/rdhs/issues
Description: Provides a client for (1) querying the DHS API for survey indicators
  and metadata (https://api.dhsprogram.com/#/index.html), (2) identifying surveys
  and datasets for analysis, (3) downloading survey datasets from the DHS website,
  (4) loading datasets and associate metadata into R, and (5) extracting variables
  and combining datasets for pooled analysis.
LazyData: TRUE
Depends: R (>= 3.3.0)
Imports: 
    R6,
    httr,
    jsonlite,
    foreign,
    magrittr,
    rappdirs,
    digest,
    storr,
    xml2,
    qdapRegex,
    rgdal,
    getPass,
    haven,
    iotools
Suggests:
    testthat,
    knitr,
    rmarkdown,
    ggplot2,
    survey,
    data.table,
    microbenchmark
License: MIT + file LICENSE
RoxygenNote: 6.0.1
VignetteBuilder: knitr
Language: en-GB

https://github.com/OJWatson/rdhs

Global health researchers and policy makers. The DHS data have been used in well over 20,000 academic studies (based on a Google Scholar search for "DHS" AND "demographic and health survey") that have helped shape progress towards targets such as the Sustainable Development Goals and informed health policy, for example by detailing trends in child mortality and characterising the distribution and use of insecticide-treated bed nets in Africa. The package will assist researchers who use R for these purposes rather than, or do not have access to, Stata/SAS (the datasets published by the DHS Program are provided in those formats), as well as serve to simplify commonly required analytical pipelines. The end result aims to increase end-user accessibility to the raw data and create a tool that supports reproducible global health research.

There are a number of other R packages that work with DHS data in various ways. A quick search of GitHub for "DHS" and R shows 39 repos; however, the majority are small custom scripts.

One repo looks just at interacting with the DHS API, but it hasn't been added to for almost a year, and its API endpoint functions neither cover all the endpoints available nor allow you to query each endpoint by all the possible query terms. It also requires the user to know the query terms rather than exposing them as arguments.

One repo also looks at downloading the survey datasets from the website (and it was used initially when designing these functions within rdhs). However, it skips over large dataset files, has some bugs depending on the character length of your login credentials, and does not allow you to read in all the datasets available from the website. (FYI: we don't read in .sas7bdat files (we are writing a parser for the oddly formed catalog files provided by the DHS website for these) or hierarchical dataset files, as we have a parser for the flat equivalent of the hierarchical datasets. In theory each file format should contain the same data, so having one parser that works is sufficient, but we have found that the flat and SPSS formats have the most complete metadata for the variable labels.)

There are then a few repos that do bespoke pieces of analysis (2 of which are on CRAN), looking at spatial analysis and calculating survey statistics. We are hoping to bring these on board, either by wrapping them to use the output of our downloaded harmonised datasets, or by writing additional tools for downstream analysis (see TODO.md).
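
To make the intended workflow concrete, here is a minimal sketch (the function names are those referenced elsewhere in this submission; the exact arguments and returned fields are illustrative, and a DHS account/config is assumed to already be set up):

library(rdhs)

# 1. query the API for surveys of interest and the datasets they provide
surveys  <- dhs_surveys(countryIds = "SN", surveyYearStart = 2010)
datasets <- dhs_datasets(surveyIds = surveys$SurveyId)

# 2. download the chosen datasets and read them into R
downloads <- get_datasets(dataset_filenames = datasets$FileName)

# 3. search the variable labels to find questions for pooled analysis
questions <- search_variable_labels(datasets$FileName,
                                    search_terms = "bed net")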

Requirements

Confirm each of the following by checking the box. This package:

Publication options

Detail

Yes:

R CMD check results
0 errors | 0 warnings | 0 notes
annakrystalli commented 6 years ago

Editor checks:


Editor comments

Hello @OJWatson πŸ‘‹ and many thanks for your submission!

So the package passes most of the initial checks, apart from the fact that the test suite is failing for me on my machine. Any ideas?

I've included the relevant outputs of the checks and tests below. The main error seems to refer to an unused argument (TRUE), I think caused by tempdir(TRUE) calls?
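
For what it's worth, my guess (not checked against the package source) is that this is an R version issue: tempdir() only gained its check argument in R 3.5.0, so on earlier versions tempdir(TRUE) is an unused argument. A guard along these lines would sidestep it (the helper name here is purely illustrative):

safe_tempdir <- function() {
  if (getRversion() >= "3.5.0") {
    # recreate the session temp directory if it has been deleted
    tempdir(check = TRUE)
  } else {
    tempdir()
  }
}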

Otherwise, there's only a minor goodpractice suggestion flag about long lines in a couple of places.

If you could help with debugging the testing failure, I can start looking for reviewers?

checks

pkg_dir <- "/Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs"
devtools::check(pkg_dir)

#> R CMD check results
#> 1 error  | 0 warnings | 1 note
#> checking tests ... ERROR
#>   Running β€˜testthat.R’
#> Running the tests in β€˜tests/testthat.R’ failed.
#> Last 13 lines of output:
#>   ══ testthat results  ════════════════════════════════════════════════════
#>   OK: 35 SKIPPED: 21 FAILED: 19
#>   1. Error: geojson works (@test_api_endpoints.R#60) 
#>   2. Error: dhs_countries works (@test_api_endpoints.R#89) 
#>   3. Error: dhs_data works (@test_api_endpoints.R#106) 
#>   4. Error: dhs_dataUpdates works (@test_api_endpoints.R#132) 
#>   5. Error: dhs_datasets works (@test_api_endpoints.R#141) 
#>   6. Error: dhs_indicators works (@test_api_endpoints.R#154) 
#>   7. Error: dhs_info works (@test_api_endpoints.R#174) 
#>   8. Error: dhs_publications works (@test_api_endpoints.R#183) 
#>   9. Error: dhs_survey_characteristics works (@test_api_endpoints.R#203) 
#>   1. ...
#>   
#>   Error: testthat unit tests failed
#>   Execution halted
#> 
#> checking R code for possible problems ... NOTE
#> find_rdhs_config: possible error in tempdir(TRUE): unused argument
#>   (TRUE)
#> set_rdhs_config: possible error in tempdir(TRUE): unused argument
#>   (TRUE)

Created on 2018-07-26 by the reprex package (v0.2.0).

tests

pkg_dir <- "/Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs"
devtools::test(pkg_dir)

#> Loading rdhs
#> Loading required package: testthat
#> Testing rdhs
#> Warning: `encoding` is deprecated; all files now assumed to be UTF-8
#> βœ” | OK F W S | Context
#> 
#> βœ” |  0     1 | API calls
#> ─────────────────────────────────────────────────────────────────────────
#> test_api_calls.R:4: skip: can request api through dhs_api_request via client
#> No authentication available
#> ─────────────────────────────────────────────────────────────────────────
#> 
#> βœ– |  0 12   1 | API endpoints
#> ─────────────────────────────────────────────────────────────────────────
#> test_api_endpoints.R:8: skip: format catches and al_results tests
#> No authentication available
#> 
#> test_api_endpoints.R:60: error: geojson works
#> unused argument (TRUE)
#> 1: dhs_data(countryIds = "SN", surveyYearStart = 2014, breakdown = "subnational", 
#>        returnGeometry = TRUE, f = "geojson") at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/tests/testthat/test_api_endpoints.R:60
#> 2: handle_api_request(endpoint, query, all_results, client, force) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/api_endpoints.R:135
#> 3: check_for_client() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/API.R:27
#> 4: rdhs_setup() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/ui.R:7
#> 5: find_rdhs_config() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/zzz.R:27
#> 6: file.path(tempdir(TRUE), "rdhs/rdhs.json") at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/config.R:7
#> 
#> test_api_endpoints.R:89: error: dhs_countries works
#> unused argument (TRUE)
#> 1: dhs_countries(countryIds = "SN", surveyYearStart = "2010", all_results = FALSE) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/tests/testthat/test_api_endpoints.R:89
#> 2: handle_api_request(endpoint, query, all_results, client, force) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/api_endpoints.R:440
#> 3: check_for_client() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/API.R:27
#> 4: rdhs_setup() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/ui.R:7
#> 5: find_rdhs_config() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/zzz.R:27
#> 6: file.path(tempdir(TRUE), "rdhs/rdhs.json") at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/config.R:7
#> 
#> test_api_endpoints.R:106: error: dhs_data works
#> unused argument (TRUE)
#> 1: dhs_data(countryIds = "EG", indicatorIds = "FE_FRTR_W_TFR", selectSurveys = "latest", 
#>        all_results = FALSE) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/tests/testthat/test_api_endpoints.R:106
#> 2: handle_api_request(endpoint, query, all_results, client, force) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/api_endpoints.R:135
#> 3: check_for_client() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/API.R:27
#> 4: rdhs_setup() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/ui.R:7
#> 5: find_rdhs_config() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/zzz.R:27
#> 6: file.path(tempdir(TRUE), "rdhs/rdhs.json") at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/config.R:7
#> 
#> test_api_endpoints.R:132: error: dhs_dataUpdates works
#> unused argument (TRUE)
#> 1: dhs_data_updates(lastUpdate = "20150901", all_results = FALSE) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/tests/testthat/test_api_endpoints.R:132
#> 2: handle_api_request(endpoint, query, all_results, client, force) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/api_endpoints.R:1057
#> 3: check_for_client() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/API.R:27
#> 4: rdhs_setup() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/ui.R:7
#> 5: find_rdhs_config() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/zzz.R:27
#> 6: file.path(tempdir(TRUE), "rdhs/rdhs.json") at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/config.R:7
#> 
#> test_api_endpoints.R:141: error: dhs_datasets works
#> unused argument (TRUE)
#> 1: dhs_datasets(countryIds = "EG", selectSurveys = "latest", surveyYearStart = 2000, 
#>        surveyYearEnd = 2016, surveyType = "DHS", all_results = FALSE) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/tests/testthat/test_api_endpoints.R:141
#> 2: handle_api_request(endpoint, query, all_results, client, force) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/api_endpoints.R:837
#> 3: check_for_client() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/API.R:27
#> 4: rdhs_setup() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/ui.R:7
#> 5: find_rdhs_config() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/zzz.R:27
#> 6: file.path(tempdir(TRUE), "rdhs/rdhs.json") at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/config.R:7
#> 
#> test_api_endpoints.R:154: error: dhs_indicators works
#> unused argument (TRUE)
#> 1: dhs_indicators(countryIds = "EG", all_results = FALSE, indicatorIds = "FE_FRTR_W_TFR") at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/tests/testthat/test_api_endpoints.R:154
#> 2: handle_api_request(endpoint, query, all_results, client, force) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/api_endpoints.R:230
#> 3: check_for_client() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/API.R:27
#> 4: rdhs_setup() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/ui.R:7
#> 5: find_rdhs_config() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/zzz.R:27
#> 6: file.path(tempdir(TRUE), "rdhs/rdhs.json") at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/config.R:7
#> 
#> test_api_endpoints.R:174: error: dhs_info works
#> unused argument (TRUE)
#> 1: dhs_info(infoType = "version", all_results = FALSE) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/tests/testthat/test_api_endpoints.R:174
#> 2: handle_api_request(endpoint, query, all_results, client, force) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/api_endpoints.R:345
#> 3: check_for_client() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/API.R:27
#> 4: rdhs_setup() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/ui.R:7
#> 5: find_rdhs_config() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/zzz.R:27
#> 6: file.path(tempdir(TRUE), "rdhs/rdhs.json") at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/config.R:7
#> 
#> test_api_endpoints.R:183: error: dhs_publications works
#> unused argument (TRUE)
#> 1: dhs_publications(countryIds = "EG", all_results = FALSE, selectSurveys = "latest") at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/tests/testthat/test_api_endpoints.R:183
#> 2: handle_api_request(endpoint, query, all_results, client, force) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/api_endpoints.R:740
#> 3: check_for_client() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/API.R:27
#> 4: rdhs_setup() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/ui.R:7
#> 5: find_rdhs_config() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/zzz.R:27
#> 6: file.path(tempdir(TRUE), "rdhs/rdhs.json") at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/config.R:7
#> 
#> test_api_endpoints.R:203: error: dhs_survey_characteristics works
#> unused argument (TRUE)
#> 1: dhs_survey_characteristics(countryIds = "EG", surveyYearStart = 2000, surveyYearEnd = 2016, 
#>        surveyType = "DHS", all_results = FALSE) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/tests/testthat/test_api_endpoints.R:203
#> 2: handle_api_request(endpoint, query, all_results, client, force) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/api_endpoints.R:638
#> 3: check_for_client() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/API.R:27
#> 4: rdhs_setup() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/ui.R:7
#> 5: find_rdhs_config() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/zzz.R:27
#> 6: file.path(tempdir(TRUE), "rdhs/rdhs.json") at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/config.R:7
#> 
#> test_api_endpoints.R:219: error: dhs_surveys works
#> unused argument (TRUE)
#> 1: dhs_surveys(countryIds = "EG", surveyYearStart = 2000, surveyYearEnd = 2016, 
#>        surveyType = "DHS", all_results = FALSE) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/tests/testthat/test_api_endpoints.R:219
#> 2: handle_api_request(endpoint, query, all_results, client, force) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/api_endpoints.R:552
#> 3: check_for_client() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/API.R:27
#> 4: rdhs_setup() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/ui.R:7
#> 5: find_rdhs_config() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/zzz.R:27
#> 6: file.path(tempdir(TRUE), "rdhs/rdhs.json") at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/config.R:7
#> 
#> test_api_endpoints.R:233: error: dhs_tags works
#> unused argument (TRUE)
#> 1: dhs_tags(indicatorIds = "FE_FRTR_W_TFR", all_results = FALSE) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/tests/testthat/test_api_endpoints.R:233
#> 2: handle_api_request(endpoint, query, all_results, client, force) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/api_endpoints.R:1002
#> 3: check_for_client() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/API.R:27
#> 4: rdhs_setup() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/ui.R:7
#> 5: find_rdhs_config() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/zzz.R:27
#> 6: file.path(tempdir(TRUE), "rdhs/rdhs.json") at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/config.R:7
#> 
#> test_api_endpoints.R:242: error: dhs_uiUpdates works
#> unused argument (TRUE)
#> 1: dhs_ui_updates(lastUpdate = "20150901", all_results = FALSE) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/tests/testthat/test_api_endpoints.R:242
#> 2: handle_api_request(endpoint, query, all_results, client, force) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/api_endpoints.R:285
#> 3: check_for_client() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/API.R:27
#> 4: rdhs_setup() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/ui.R:7
#> 5: find_rdhs_config() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/zzz.R:27
#> 6: file.path(tempdir(TRUE), "rdhs/rdhs.json") at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/config.R:7
#> ─────────────────────────────────────────────────────────────────────────
#> 
#> ══ Terminating early ════════════════════════════════════════════════════
#> Too many failures

Created on 2018-07-26 by the reprex package (v0.2.0).

goodpractice

Checks for R package development best practice

pkg_dir <- "/Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs"
goodpractice::gp(pkg_dir)

#> Preparing: covr
#> Preparing: cyclocomp
#> Installing 5 packages: fansi, openssl, pillar, sp, utf8
#> Preparing: description
#> Preparing: lintr
#> Preparing: namespace
#> Preparing: rcmdcheck
#> ── GP rdhs ────────────────────────────────────────────────────────────────
#> 
#> It is good practice to
#> 
#>   βœ– avoid long code lines, it is bad for readability. Also, many
#>     people prefer editor windows that are about 80 characters
#>     wide. Try make your lines shorter than 80 characters
#> 
#>     R/config.R:282:1
#>     R/read_dhs_flat.R:113:1
#> 
#> ───────────────────────────────────────────────────────────────────────────

Created on 2018-07-26 by the reprex package (v0.2.0).

OJWatson commented 6 years ago

Hi @annakrystalli - thank you for checking this through.

Also, a lot of the tests skip because you need the login details for the dummy account I've set up with the DHS website. These are stored within rdhs.json.tar.enc and are unlocked with a key that I've stored within my Travis account. I can send the key if it would be helpful, or if there is a better way to share these securely then let me know, as I'm still quite new to this sort of thing.

Thank you again,

OJ

annakrystalli commented 6 years ago

Thanks for the quick response @OJWatson!

So I re-cloned the repo but I'm still getting errors during the tests. I include the output below, with some questions at the end of the comment:

check

pkg_dir <- "/Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs"
devtools::check(pkg_dir)

#> R CMD check results
#> 1 error  | 0 warnings | 1 note
#> checking tests ... ERROR
#>   Running β€˜testthat.R’ [16s/486s]
#> Running the tests in β€˜tests/testthat.R’ failed.
#> Last 13 lines of output:
#>   downloaded 80 KB
#>   
#>   ══ testthat results  ════════════════════════════════════════════════════
#>   OK: 64 SKIPPED: 21 FAILED: 8
#>   1. Error: geojson works (@test_api_endpoints.R#60) 
#>   2. Error: ETAR71FL.ZIP test (@test_downloads.R#139) 
#>   3. Error: ugir41fl.zip test (@test_downloads.R#148) 
#>   4. Error: zip file ending test (@test_downloads.R#157) 
#>   5. Error: Hierarchal and sas7bdat dataset test (@test_downloads.R#168) 
#>   6. Error: Large India Files (@test_downloads.R#214) 
#>   7. Error: Odd nesting India (@test_downloads.R#223) 
#>   8. Error: surveyId in extract (@test_extraction.R#109) 
#>   
#>   Error: testthat unit tests failed
#>   Execution halted
#> 
#> checking R code for possible problems ... NOTE
#> tempdir_check: possible error in tempdir(TRUE): unused argument (TRUE)

Created on 2018-07-26 by the reprex package (v0.2.0).

test

pkg_dir <- "/Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs"
devtools::test(pkg_dir)
#> Loading rdhs
#> Loading required package: testthat
#> Testing rdhs
#> Warning: `encoding` is deprecated; all files now assumed to be UTF-8
#> βœ” | OK F W S | Context
#> 
#> βœ” |  0     1 | API calls
#> ─────────────────────────────────────────────────────────────────────────
#> test_api_calls.R:4: skip: can request api through dhs_api_request via client
#> No authentication available
#> ─────────────────────────────────────────────────────────────────────────
#> 
#> βœ– | 29 1   1 | API endpoints [303.2 s]
#> ─────────────────────────────────────────────────────────────────────────
#> test_api_endpoints.R:8: skip: format catches and al_results tests
#> No authentication available
#> 
#> test_api_endpoints.R:60: error: geojson works
#> argument is of length zero
#> 1: dhs_data(countryIds = "SN", surveyYearStart = 2014, breakdown = "subnational", 
#>        returnGeometry = TRUE, f = "geojson") at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/tests/testthat/test_api_endpoints.R:60
#> 2: handle_api_request(endpoint, query, all_results, client, force) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/api_endpoints.R:135
#> 3: api_request(endpoint, query, all_results) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/API.R:48
#> 4: handle_pagination_geojson(endpoint, query, all_results) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/API.R:95
#> ─────────────────────────────────────────────────────────────────────────
#> 
#> βœ” |  0     2 | Authentication
#> ─────────────────────────────────────────────────────────────────────────
#> test_authentication.R:5: skip: authenticate_dhs works
#> No authentication available
#> 
#> test_authentication.R:24: skip: available_surveys works
#> No authentication available
#> ─────────────────────────────────────────────────────────────────────────
#> 
#> βœ” |  0     2 | Client Setup
#> ─────────────────────────────────────────────────────────────────────────
#> test_client_set_up.R:5: skip: save credentials
#> No authentication available
#> 
#> test_client_set_up.R:30: skip: new updates are recognised
#> No authentication available
#> ─────────────────────────────────────────────────────────────────────────
#> 
#> βœ– |  0 6 6 2 | Downloads
#> ─────────────────────────────────────────────────────────────────────────
#> test_downloads.R:4: skip: available surveys and download work
#> No authentication available
#> 
#> test_downloads.R:139: warning: ETAR71FL.ZIP test
#> cannot open file 'rdhs.json': No such file or directory
#> 
#> test_downloads.R:139: error: ETAR71FL.ZIP test
#> cannot open the connection
#> 1: new_rand_client() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/tests/testthat/test_downloads.R:139
#> 2: rdhs::client_dhs(api_key = "ICLSPH-527168", config = read_rdhs_config_file("rdhs.json"), 
#>        root = td) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/tests/testthat/helper_authentication.R:25
#> 3: read_rdhs_config_file("rdhs.json") at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/client.R:18
#> 4: jsonlite::fromJSON(readChar(config_path, fsize), FALSE) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/config.R:283
#> 5: readChar(config_path, fsize) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/config.R:283
#> 6: file(con, "rb")
#> 
#> test_downloads.R:148: warning: ugir41fl.zip test
#> cannot open file 'rdhs.json': No such file or directory
#> 
#> test_downloads.R:148: error: ugir41fl.zip test
#> cannot open the connection
#> 1: new_rand_client() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/tests/testthat/test_downloads.R:148
#> 2: rdhs::client_dhs(api_key = "ICLSPH-527168", config = read_rdhs_config_file("rdhs.json"), 
#>        root = td) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/tests/testthat/helper_authentication.R:25
#> 3: read_rdhs_config_file("rdhs.json") at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/client.R:18
#> 4: jsonlite::fromJSON(readChar(config_path, fsize), FALSE) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/config.R:283
#> 5: readChar(config_path, fsize) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/config.R:283
#> 6: file(con, "rb")
#> 
#> test_downloads.R:157: warning: zip file ending test
#> cannot open file 'rdhs.json': No such file or directory
#> 
#> test_downloads.R:157: error: zip file ending test
#> cannot open the connection
#> 1: new_rand_client() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/tests/testthat/test_downloads.R:157
#> 2: rdhs::client_dhs(api_key = "ICLSPH-527168", config = read_rdhs_config_file("rdhs.json"), 
#>        root = td) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/tests/testthat/helper_authentication.R:25
#> 3: read_rdhs_config_file("rdhs.json") at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/client.R:18
#> 4: jsonlite::fromJSON(readChar(config_path, fsize), FALSE) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/config.R:283
#> 5: readChar(config_path, fsize) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/config.R:283
#> 6: file(con, "rb")
#> 
#> test_downloads.R:168: warning: Hierarchal and sas7bdat dataset test
#> cannot open file 'rdhs.json': No such file or directory
#> 
#> test_downloads.R:168: error: Hierarchal and sas7bdat dataset test
#> cannot open the connection
#> 1: new_rand_client() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/tests/testthat/test_downloads.R:168
#> 2: rdhs::client_dhs(api_key = "ICLSPH-527168", config = read_rdhs_config_file("rdhs.json"), 
#>        root = td) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/tests/testthat/helper_authentication.R:25
#> 3: read_rdhs_config_file("rdhs.json") at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/client.R:18
#> 4: jsonlite::fromJSON(readChar(config_path, fsize), FALSE) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/config.R:283
#> 5: readChar(config_path, fsize) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/config.R:283
#> 6: file(con, "rb")
#> 
#> test_downloads.R:189: skip: Geo dataset test
#> No authentication available
#> 
#> test_downloads.R:214: warning: Large India Files
#> cannot open file 'rdhs.json': No such file or directory
#> 
#> test_downloads.R:214: error: Large India Files
#> cannot open the connection
#> 1: new_rand_client() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/tests/testthat/test_downloads.R:214
#> 2: rdhs::client_dhs(api_key = "ICLSPH-527168", config = read_rdhs_config_file("rdhs.json"), 
#>        root = td) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/tests/testthat/helper_authentication.R:25
#> 3: read_rdhs_config_file("rdhs.json") at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/client.R:18
#> 4: jsonlite::fromJSON(readChar(config_path, fsize), FALSE) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/config.R:283
#> 5: readChar(config_path, fsize) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/config.R:283
#> 6: file(con, "rb")
#> 
#> test_downloads.R:223: warning: Odd nesting India
#> cannot open file 'rdhs.json': No such file or directory
#> 
#> test_downloads.R:223: error: Odd nesting India
#> cannot open the connection
#> 1: new_rand_client() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/tests/testthat/test_downloads.R:223
#> 2: rdhs::client_dhs(api_key = "ICLSPH-527168", config = read_rdhs_config_file("rdhs.json"), 
#>        root = td) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/tests/testthat/helper_authentication.R:25
#> 3: read_rdhs_config_file("rdhs.json") at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/client.R:18
#> 4: jsonlite::fromJSON(readChar(config_path, fsize), FALSE) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/config.R:283
#> 5: readChar(config_path, fsize) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/config.R:283
#> 6: file(con, "rb")
#> ─────────────────────────────────────────────────────────────────────────
#> 
#> βœ– |  0 1 1 2 | Extraction
#> ─────────────────────────────────────────────────────────────────────────
#> test_extraction.R:5: skip: query codes having downloaded surveys
#> No authentication available
#> 
#> test_extraction.R:109: warning: surveyId in extract
#> cannot open file 'rdhs.json': No such file or directory
#> 
#> test_extraction.R:109: error: surveyId in extract
#> cannot open the connection
#> 1: new_rand_client() at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/tests/testthat/test_extraction.R:109
#> 2: rdhs::client_dhs(api_key = "ICLSPH-527168", config = read_rdhs_config_file("rdhs.json"), 
#>        root = td) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/tests/testthat/helper_authentication.R:25
#> 3: read_rdhs_config_file("rdhs.json") at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/client.R:18
#> 4: jsonlite::fromJSON(readChar(config_path, fsize), FALSE) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/config.R:283
#> 5: readChar(config_path, fsize) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/config.R:283
#> 6: file(con, "rb")
#> 
#> test_extraction.R:125: skip: rbind_labelled
#> No authentication available
#> ─────────────────────────────────────────────────────────────────────────
#> 
#> βœ” | 10   1 1 | Miscellaneous utils [0.1 s]
#> ─────────────────────────────────────────────────────────────────────────
#> test_miscellaneous.R:38: skip: slow api response
#> No authentication available
#> 
#> test_miscellaneous.R:64: warning: different locales
#> OS reports request to set locale to "French_Belgium.1252" cannot be honored
#> ─────────────────────────────────────────────────────────────────────────
#> 
#> βœ” | 17     1 | Parse datasets [6.9 s]
#> ─────────────────────────────────────────────────────────────────────────
#> test_read_dhs_flat.R:75: skip: lower case flat file check
#> No authentication available
#> ─────────────────────────────────────────────────────────────────────────
#> 
#> βœ” |  0     9 | user interface
#> ─────────────────────────────────────────────────────────────────────────
#> test_ui.R:4: skip: check_for_client
#> No authentication available
#> 
#> test_ui.R:37: skip: check on the use of the default client for API caching in extract
#> No authentication available
#> 
#> test_ui.R:67: skip: get_datasets with no temp first
#> No authentication available
#> 
#> test_ui.R:97: skip: get_model_datasets
#> No authentication available
#> 
#> test_ui.R:107: skip: get_downloaded_datasets
#> No authentication available
#> 
#> test_ui.R:118: skip: get_available_datasets
#> No authentication available
#> 
#> test_ui.R:128: skip: search_and_extract_dhs
#> No authentication available
#> 
#> test_ui.R:146: skip: get_variable_labels
#> No authentication available
#> 
#> test_ui.R:168: skip: get_variable_labels direct via client
#> No authentication available
#> ─────────────────────────────────────────────────────────────────────────
#> 
#> ══ Results ══════════════════════════════════════════════════════════════
#> Duration: 310.6 s
#> 
#> OK:       56
#> Failed:   8
#> Warnings: 8
#> Skipped:  21

Created on 2018-07-26 by the reprex package (v0.2.0).

questions

OJWatson commented 6 years ago

Thank you for the reply @annakrystalli and sorry about these failing tests.

Thank you again for this and sorry about the tests,

OJ

annakrystalli commented 6 years ago

Hi OJ,

OK, thanks for the clarification, that all makes more sense now, but yeah, I think a short vignette would be really useful. Do you think you could get something sorted soonish for the reviewers to have a look at?

Re: the tests, no worries, that's what the review process is all about, ironing out the kinks! :)

Regarding:

their API is currently really slow due to the size of the objects returned, and frequently doesn't return correctly and subsequently freezes their API up for a bit.

I was wondering if maybe this sort of failure could be handled more explicitly, i.e. by throwing an informative error explaining why it failed?
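
Something along these lines is what I have in mind (a rough sketch only, with an illustrative endpoint and wording rather than rdhs's actual internals):

resp <- httr::GET("https://api.dhsprogram.com/rest/dhs/data")
if (httr::http_error(resp)) {
  stop(
    "DHS API request failed (HTTP ", httr::status_code(resp), "). ",
    "The API can be slow or temporarily unavailable; please try again later.",
    call. = FALSE
  )
}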

Otherwise, I'm happy to start seeking reviewers if you are?

OJWatson commented 6 years ago

Hi,

And yep happy for reviewers.

Thank you again for all the help,

OJ

annakrystalli commented 6 years ago

Good news @OJWatson, we have reviewers! πŸŽ‰ Many thanks @dosgillespie & @czeildi for agreeing to review.


Reviewers: @dosgillespie @czeildi Due date: 2018-08-28

czeildi commented 6 years ago

Hi,

@OJWatson I have one small remark: the NEWS could be more concise; more, shorter bullet points grouped by new feature / bug fix etc. would make it easier to skim through. You may consider getting ideas from http://style.tidyverse.org/news.html.
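
Purely as an illustration of the sort of layout I mean (the headings and entries below are placeholders, not suggested content):

    # rdhs x.y.z

    ## New features
    * one short bullet per new function or argument

    ## Bug fixes
    * one short bullet per fix, referencing the relevant issue where useful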

FYI I won't be reviewing the package as discussed with Anna.

OJWatson commented 6 years ago

Hi,

Thanks for this @czeildi - have never written a NEWS section before, and reading it back I have definitely waffled. I'll trim and tidy it up in line with the style guide you've linked.

Many thanks, OJ

annakrystalli commented 6 years ago

Hi @OJWatson, just to keep you up to date, I have been trying to find a new reviewer, hoping it wouldn't take too long, but alas, it has. I'm still on the case, but in the meantime I will return the status of the review to seeking reviewers.

I'll also reset the deadline for review when a new reviewer has been found. Sorry about this! If you have any suggestions for a reviewer let me know!

annakrystalli commented 6 years ago

Great news! We finally have two reviewers again! πŸŽ‰ Thanks again to both for agreeing to review πŸ™


Reviewers: @dosgillespie @LucyMcGowan Due date: 2018-09-24 2018-09-30 (extended)

OJWatson commented 6 years ago

Yay! Thanks very much for finding someone so quickly @annakrystalli, and thank you to @LucyMcGowan and @dosgillespie for reviewing.

I've also updated/tidied the NEWS in keeping more with the tidyverse style guide.

LucyMcGowan commented 6 years ago

Package Review

Documentation

The package includes all the following forms of documentation:

~##### For packages co-submitting to JOSS~

~- [ ] The package has an obvious research application according to JOSS's definition~

~The package contains a paper.md matching JOSS's requirements with:~

~- [ ] A short summary describing the high-level functionality of the software~ ~- [ ] Authors: A list of authors with their affiliations~ ~- [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.~ ~- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).~

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 2.5


Review Comments

Summary

The rdhs package facilitates querying the Demographic and Health Survey API and makes downloading and interacting with the raw datasets a smoother, more reproducible process. This looks great!

A few things:

  1. I didn't run the full test suite, because I didn't request access in time (it looks like it takes a few days to approve, I am having a πŸ‘Ά sometime this week (🀞) and I wanted to be sure to get this review in). It looks like others have been able to run it successfully, though.

  2. Of the tests that ran, I had two failures -- I believe because I opted out of having the results cached.

    1. Error: format catches and all_results tests (@test_api_endpoints.R#11) 
    2. Error: password obscure (@test_miscellaneous.R#77) 

    Would you consider having an additional skip condition for this circumstance? (A rough sketch of what I mean is included at the end of this list.)

  3. Many of the functions don't have examples in the documentation -- it would be great to have a few more that include examples. For the ones that do have examples, it is not always clear to me what the example is doing. For example, here is the examples section from dhs_data():

## not run only because they take a very long time and interact with an API
## Not run: 
dat <- dhs_data(countryIds="EG",all_results=FALSE)
dat <- dhs_data(indicatorIds="FE_FRTR_W_TFR",all_results=FALSE)
dat <- dhs_data(surveyIds="SN2010DHS",all_results=FALSE)
dat <- dhs_data(selectSurveys="latest",all_results=FALSE)
dat <- dhs_data(selectSurveys="byIndicator", indicatorIds="FE_CEBA_W_CH0",
all_results=FALSE)
dat <- dhs_data(surveyYear="2010",all_results=FALSE)
dat <- dhs_data(surveyYearStart="2006",all_results=FALSE)
dat <- dhs_data(surveyYearStart="1991", surveyYearEnd="2006",
all_results=FALSE)
dat <- dhs_data(surveyType="DHS",all_results=FALSE)
dat <- dhs_data(surveyCharacteristicIds="32",all_results=FALSE)
dat <- dhs_data(characteristicCategory="wealth quintile",all_results=FALSE)
dat <- dhs_data(breakdown="all", countryIds="AZ", characteristicLabel="6+",
all_results=FALSE)
dat <- dhs_data(tagIds="1",all_results=FALSE)
dat <- dhs_data(breakdown="subnational",all_results=FALSE)
dat <- dhs_data(breakdown="background",all_results=FALSE)
dat <- dhs_data(breakdown="all",all_results=FALSE)
dat <- dhs_data(f="html",all_results=FALSE)
dat <- dhs_data(f="geojson", returnGeometry="true",all_results=FALSE)

## End(Not run)

It seems like you're showing lots of different ways to run this function, but what each one does is not explained. I would maybe include far fewer lines (maybe just the ones you think would be the most common) along with comments to explain what output is expected (this applies to a lot of the functions; I'm just showing the one here).

  4. You have a message that states:
    
    Writing your configuration to:
    -> /var/folders/9x/fpr7_fbn4ln194by50dnpgyw0000gn/T//RtmpQO2pa4/rdhs/rdhs.json

If you are using git, be sure to add this to your .gitignore



You may consider adding this to the .gitignore for the user? For example **httr** does this with an `add_line()` function: https://github.com/r-lib/httr/blob/976289a3596dc01dc994f8fd743770a172bbecdb/R/oauth-cache.R#L61-L75
and then adding the config file like this: https://github.com/r-lib/httr/blob/976289a3596dc01dc994f8fd743770a172bbecdb/R/oauth-cache.R#L61-L75

5. In the Introduction, section 3 outlines how to download survey datasets. This involves "providing as arguments your email and project for which you want to download datasets from. You will then be prompted for your password." This is the first time a password is mentioned in this document -- as someone not familiar with the interface, I was confused about where I needed to obtain this password -- maybe a link to the website would be helpful here?

6. I caught a few typos, I submitted this PR: [A few small typos #48](https://github.com/OJWatson/rdhs/pull/48)
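
Coming back to point 2 above, this is roughly the kind of skip helper I have in mind (the helper and file names are illustrative, not rdhs's actual test helpers):

# skip tests that need a cached/configured rdhs session when none exists
skip_if_no_rdhs_config <- function() {
  if (!file.exists("rdhs.json")) {
    testthat::skip("No rdhs config/cache available")
  }
}
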
OJWatson commented 6 years ago

Hi there @LucyMcGowan,

Thank you so much for the review, especially with your πŸ‘Ά due so soon. Hope everything goes well.

I will respond fully by the end of today with my changes, but I agree with everything that you have suggested, in particular making sure everything has an example and then shortening the examples that the API functions run to just one function call.

Thank you again,

OJ

annakrystalli commented 6 years ago

Due date: 2018-09-24 2018-09-30 (extended)

OJWatson commented 6 years ago

Nearly finished - some of the examples have proven trickier to write so that they run, due to auth issues etc., but I am nearly there.

OJWatson commented 6 years ago

Hi @LucyMcGowan,

Finally got there with the examples. So, working through the comments:

  1. Hope everything is going okay with :baby: and thank you again for finding the time for the review.

  2. The test suite used to fail if you had not created a valid "rdhs.json" config file. I've changed it so the API tests won't fail, by first creating a config if one is not found, as these tests can run without a DHS website account. This is included in commit d66c8e

    The other tests that require a login, however, will still require people to follow the testing vignette here

  3. There are now examples for all the exported functions where they are needed. Where possible, these examples are run using the model datasets that the DHS website publishes, which anyone can download. This way you can use the main functions in ui.R, e.g. get_downloaded_datasets:

#' # get the model datasets included with the package
#' model_datasets <- model_datasets
#'
#' # download one of them
#' g <- get_datasets(dataset_filenames = model_datasets$FileName[1])
#'
#' # these will then be stored so that we know what datasets we have downloaded
#' d <- get_downloaded_datasets()

With regards to the examples for the API functions, the long list of examples that you included for dhs_data was there because it is a copy of all the examples that their API documentation provides, so I thought it would be nice to show how each one would be carried out using rdhs. I agree, though, that this is not helpful on its own, so I have added an initial couple of examples of common/useful use cases for each function, with more description as they go along. Then I explain what the long list of examples is, so that people can link back to the DHS API website. For example, for dhs_ui_updates:

#' # The main use for the ui updates API will be to search for the last time
#' # there was a change to the UI. For example to return all the
#' # changes since 2018:
#'
#' dat <- dhs_ui_updates(lastUpdate="20180101")
#'
#' # A complete list of examples for how each argument to the ui updates API
#' # endpoint can be provided is given below, which is a copy of each of
#' # the examples listed in the API at:
#'
#' # https://api.dhsprogram.com/#/api-uiupdates.cfm
#'
#'
#' dat <- dhs_ui_updates(lastUpdate="20150901",all_results=FALSE)
#' dat <- dhs_ui_updates(f="html",all_results=FALSE)
#' }

I have, however, put these examples within \dontrun{} as their API can be temperamental.

  4. Thanks for linking httr's way of doing this, as it is much neater. I have implemented it (see the sketch after this list) as well as adding the config file to the .Rbuildignore. This is included in commit 1ba973.

  5. Completely agree. I have added an initial paragraph in both the README and the introduction vignette for this, which was included in commit 6c5554

  6. Thank you for picking these out, and double thank you for making a PR with them.
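
For reference, the helper ends up being roughly of this shape (a simplified sketch of the httr approach linked above, not the exact code that was committed):

add_line <- function(line, path) {
  lines <- if (file.exists(path)) readLines(path) else character()
  if (line %in% lines) {
    return(invisible(FALSE))
  }
  writeLines(c(lines, line), path)
  invisible(TRUE)
}

# e.g. make sure the user's config file is never committed
add_line("rdhs.json", ".gitignore")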

I think that's everything. I've also made a few changes outside of these comments, which clear up some of the issues that were outstanding, and I have added another example of how to use the "geojson" return type from API requests to build maps, shown here


Thank you again for the kind words about the package and for the comments and suggestions. They have definitely helped increase the usability of the package, and thanks for linking the add_line function from httr, as I end up wanting that functionality a lot.

All the best,

OJ

annakrystalli commented 6 years ago

Hi @OJWatson, thanks for your swift response to @LucyMcGowan's comments (and thanks @LucyMcGowan for them and your contributions, hope all is going well!). The package and documentation are shaping up really nicely.

Re (5)

Thoughts?

OJWatson commented 6 years ago

Hi @annakrystalli,

Thanks for the comments.

Completely agree. I have block quoted the section on registering for the DHS website within the README.md, so it now appears as the first section after the overview of the DHS, and is hopefully a lot clearer.

I have also moved mention of the testing vignette into the CONTRIBUTING.md, and have removed it from the README.md, as I don't think it's likely that users will need to know about that step unless they are contributing. The CONTRIBUTING page has also been linked within the pkgdown navbar to make it easier to find.

All the best, and thanks for mentioning the CONTRIBUTING.md and linking the dotgithubfiles repo, as I hadn't seen that before.

OJ

dosgillespie commented 6 years ago

Hi @OJWatson @annakrystalli

First, I'm really sorry it has taken so long for me to look at this - I have been going through quite a busy period with work and family.

Also, I'm conscious that my review isn't very 'techie', as I've never developed a package myself and certainly don't have the know-how to comment on the details. But I am coming to this as an experienced R user interested in exploring the DHS data for the first time.

So my review is focused around my own worked example - in the R script below where I have annotated my thought process as I worked through and came up against snags.

To pull out two main comments:

Hope this helps, Duncan

DHS.txt

dosgillespie commented 6 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Final approval (post-review)

Estimated hours spent reviewing:

4

Review Comments

see above for comments

annakrystalli commented 6 years ago

Thanks for your review @dosgillespie! πŸŽ‰

I have one comment regarding its format. The commented script you supplied is useful, but it would be even more helpful if your workings were in an Rmd or reprex so the developer can see the output of your experimentation. Trying to re-create your workflow to understand your comments is somewhat cumbersome but, more importantly, doesn't guarantee they'll be able to recreate behaviours that might be specific to your system and setup.

More importantly, there seem to be specific points/issues identified in some of your script. For example I noticed you had some extra unexpected installation steps? This is really useful info to the developer and so should be explicitly flagged for response. Can you please pick out and list any such key points explicitly in your review so the developer can respond to them? It's totally fine to list points as suggestions too.

I also notice that you have ticked the final approval box. Please save that for final approval of the package once the reviewer has responded to your points.

Thanks again for your efforts so far!

dosgillespie commented 6 years ago

Hi, I've tried redoing it in an Rmd and writing that to a PDF so I can attach it here - sorry if it's still a bit rubbish: rdhs_review.pdf

I think the only specific extra installation step I had to do was to install the "digest" library, but I'm not sure if that was a problem with my system or not.

Duncan

OJWatson commented 6 years ago

Hi @dosgillespie and @annakrystalli,

Thank you so much for the reviews and no worries about the timing - no rush from me and just glad to have the opportunity to have the code reviewed.

I've just come back from annual leave so haven't had time yet to go through the comments, but I will try to respond sometime this week.

All the best,

OJ

OJWatson commented 5 years ago

Hi @dosgillespie and @annakrystalli ,

Apologies for the delay - I was at a conference when I last messaged and had hoped to find some time during it to get round to this.

Firstly, thank you Duncan for taking the time to go through it, and also for documenting your thoughts as you went through the introductory vignette. I found these really helpful, as there were lots of little points that helped me correct sections in the documentation and vignettes that weren't explained very well.

I have structured my response around the comments I found while reading your review. This should answer your 2 main comments as well.

  1. The note you wrote about having to separately install digest is interesting. I would have thought that it would have been installed as part of installing devtools, so I can only think that maybe digest was deleted somehow between installing devtools and installing rdhs. Any other thoughts, @annakrystalli?

  2. Thank you for highlighting that I have not defined what endpoints are when talking about the API. I don't think referring to them as endpoints in the introductory document is that helpful, so I have adjusted this text to explain that the endpoints refer to the different datasets that you can interact with from the API.

  3. The link between the country IDs and the country names is not very clear, as the DHS uses its own 2-letter code, which I don't think is standardised. It was covered in the second section of the introductory vignette, but this was not very obvious, so I've made another vignette just showing this (see the short example after this list), which is available at: https://ojwatson.github.io/rdhs/articles/country_codes.html

  4. You've mentioned wanting a data dictionary for the datasets. This is a good point and something that is sort of in the package, but again not very clearly explained or highlighted. In your review you mentioned that the search functions (search_variables and search_variable_labels) are useful for helping find what you are looking for within a dataset, and these are how we think it's easiest for people to query for specific questions. However, we didn't highlight how you can quickly look at what the survey variables mean. To help with this I've lengthened section 4 of the introductory vignette to show the different ways that you can see the data dictionary for a given dataset.

  5. You mentioned that it would be good to have the instructions on setting up an account first. I have now added a block quoted section, similar to the one in the GitHub README, to the introductory vignette, and have explained in it which sections of the vignette you can do without the login details set up.
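
In essence, the new country codes vignette boils down to something like the following (a minimal sketch; the field names here are illustrative of what the countries endpoint returns):

# look up the DHS 2-letter country codes against the country names
countries <- dhs_countries(all_results = TRUE)
countries[, c("DHS_CountryCode", "CountryName")]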


I think that addresses the comments I could find. All the best and thank you again for the review,

OJ

annakrystalli commented 5 years ago

Thanks for your response @OJWatson, just gonna run a couple of final checks and get back to you on your query (1).

@dosgillespie please have a look at OJ's responses and let us know if you are satisfied with them or whether there are any further comments before signing off. Many thanks again for your time.

dosgillespie commented 5 years ago

Hi folks

Yes all fine by me

Best wishes, Duncan


annakrystalli commented 5 years ago

Thanks @dosgillespie for signing off.

Almost there @OJWatson,

checks

So I've just run the checks in the build panel and get this error being triggered by an example.

Running examples in β€˜rdhs-Ex.R’ failed
The error most likely occurred in:

> base::assign(".ptime", proc.time(), pos = "CheckExEnv")
> ### Name: read_zipdata
> ### Title: Read filetype from a zipped folder based on the file ending
> ### Aliases: read_zipdata
> 
> ### ** Examples
> 
> 
> # get the model datasets included in the package
> model_datasets <- model_datasets
> 
> # download just the zip
> g <- get_datasets(
+ dataset_filenames = model_datasets$FileName[1],
+ download_option = "zip"
+ )
Downloading: 
ModelDatasetCountry ModelDatasetSurveyYear DHS Births Recode Stata dataset (.dta) [zzbr62dt.zip]
Dataset download finished
> 
> # and then read from the zip. This function is used internally by rdhs
> # when using `get_datasets` with `download_option = .rds` (default)
> r <- read_zipdata(
+ g[[1]], pattern = ".dta"
+ )
Error in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  : 
  factor level [19] is duplicated
Calls: read_zipdata -> readfn -> factor
Execution halted
* checking for unstated dependencies in β€˜tests’ ... OK
* checking tests ...

Then the tests hang.

If I run check() at the command line, the checks abort with a different error, and the tests hang again:

Running examples in β€˜rdhs-Ex.R’ failed
The error most likely occurred in:

> base::assign(".ptime", proc.time(), pos = "CheckExEnv")
> ### Name: data_and_labels
> ### Title: Create list of dataset and its variable names
> ### Aliases: data_and_labels
> 
> ### ** Examples
> 
> # get the model datasets included with the package
> model_datasets <- model_datasets
> 
> # download one of them
> g <- get_datasets(dataset_filenames = model_datasets$FileName[1])
Your datasets and API calls will be cached here: 
   -> /Users/Anna/Library/Caches/rdhs
Your datasets will be downloaded using the following config:

Error in rep("*", nchar(config$password)) : invalid 'times' argument
Calls: get_datasets -> check_for_client -> print_rdhs_config -> paste0
Execution halted

tests

If I run the tests through the RStudio build tab, I get this repeating ad infinitum and have to force quit RStudio:

rdhs would like to write to files outisde of your R temporary
directory. This is so that your datasets and API calls are cached 
between R sessions. Do you confirm rdhs to write to files outside
your R temporary directry? (Enter 1 or 2)

1: Yes
2: No

If I just run test() at the command line, I get the same interactive prompt again, which then asks me for a password whatever I respond (i.e. in both the Yes and No cases). My impression was that any tests requiring authorisation would just be skipped if credentials were not detected?

re digest

I'm not sure about digest. You've stated it as an import and it didn't appear as an issue in the initial checks. I uninstalled digest locally and installed rdhs from source with devtools::install(dependencies = TRUE) and it worked fine for me (although I had to restart R if I had devtools loaded for the namespacing to update). devtools::install_github("OJWatson/rdhs") also worked fine, so I can't reproduce @dosgillespie's issue.

OJWatson commented 5 years ago

Oh wow - so many errors. I've had a look at them and have a few questions @annakrystalli:

  1. Firstly, I'm intrigued that the read_zipdata error was thrown in the R build panel. What version of the foreign package do you have installed? Maybe this fails with a newer/older version.
  2. The second failure in the examples makes sense to me, so I should have a fix for that.
  3. For the tests, have you created an rdhs.json credentials file in the testing directory? I'm trying to work out where the need for a user prompt is likely coming from.

I'll hopefully have time to fix these today (although the DHS API is down/slow at the moment, so I might be speaking prematurely).

Thanks again and sorry these have sprung up,

OJ

annakrystalli commented 5 years ago

Hey @OJWatson , not to worry.

1) The version of foreign I have is 0.8-69.
3) No, but my understanding was that tests requiring the rdhs.json would be skipped if it wasn't available, i.e. I should be able to test the functions that don't require authorisation and the rest should just skip.

OJWatson commented 5 years ago

Okay cool, I will check that foreign version. And you are right, that is how I thought I had set the tests up, but clearly I've changed something; at least I have a better idea now of where.

Thanks again,

OJWatson commented 5 years ago

Hi @annakrystalli,

I was able to replicate the tests causing prompts, and I think I've found the cause and have patched it (4915de2).

I'm struggling to replicate the first error, relating to the example in read_zipdata, with your version of foreign though. I've just checked in a fresh clone of the repo with foreign v0.8-69 and can't get it to fail, so I'm a little stumped. Does devtools:::run_example("man/read_zipdata.Rd") also throw the error when run in isolation? Maybe it was something in another example that caused it to break.

Sorry for not knowing better why that error was thrown, but I think the other error thrown in the examples has been addressed and the tests should skip correctly where authentication is required.

Thank you again for the help,

OJ

annakrystalli commented 5 years ago

Hey OJ,

So I still get this error whichever way I run the example:

Error in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  : 
  factor level [19] is duplicated
Calls: read_zipdata -> readfn -> factor
Execution halted

Is that the error you can't reproduce?

It looks like something is being coerced to a factor with levels being set that are not unique (e.g. see this Stack Overflow question). Does that make sense to you?

There's also a NOTE being thrown from the checks:

checking R code for possible problems ... NOTE
tempdir_check: possible error in tempdir(TRUE): unused argument (TRUE)

The tests all pass now though so πŸ™Œ. However, they are throwing a warning for me:

OS reports request to set locale to "French_Belgium.1252" cannot be honored

Seems related to this?

OJWatson commented 5 years ago

Apologies - I seem to have accidentally hit close! Didn't mean to.

OJWatson commented 5 years ago

Yes, that's the error I could not replicate. Could you try the following? I'm thinking it should still fail, but I just want to check it's definitely an error in foreign.

file <- "https://dhsprogram.com/customcf/legacy/data/sample_download_dataset.cfm?Filename=ZZBR62DT.ZIP&Tp=1&Ctry_Code=zz&survey_id=0&doctype=dhs"
tmp <- tempfile()
zip <- download.file(file, destfile = tmp)
contents <- unzip(tmp, exdir = tempdir())
r <- foreign::read.dta(grep(".DTA", contents, value = TRUE))

If it fails in the same way then I might just remove the use of foreign, as it's particularly buggy, and use haven as the default for reading in .dta files.


The note also makes sense and is related to the following:

tempdir_check <- function() {

  if (getRversion() < '3.5.0') {
    temp <- tempdir()
  } else {
    temp <- tempdir(TRUE)
  }

  return(temp)
}

This was only added because someone once managed to delete their temp directory while using the package. tempdir(TRUE) (i.e. check = TRUE, available from R 3.5.0) recreates the directory if it is missing, which solved that issue, and we wrapped it as above for backwards compatibility. Would it be better to just leave this as tempdir(), given that if someone doesn't have R >= 3.5.0 they won't be able to rebuild their tempdir anyway?


And the OS report warning relates to this issue that a user raised about the time format of the API results not being recognised. So we wrote the following test, which tests changes to mdy_hms. We changed mdy_hms so that it sets the locale variable LC_TIME to C when using strptime on dates from the API and then resets it back for the user.

test_that("different locales", {

  locale_lc_time <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", locale_lc_time))

  Sys.setlocale("LC_TIME","French_Belgium.1252")
  date <- "July, 15 2016 19:17:14"

  # our function catches for any locale issues
  expect_true(!is.na(mdy_hms(date)))

})

I'm not very familiar with system locale stuff so I'm not sure what the best way to handle these is. I think the link you posted was the one that prompted us to change mdy_hms to set the locale variable LC_TIME to C.
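
For reference, this is roughly what mdy_hms does (a simplified sketch rather than the exact rdhs source; the date format is the one the API returns):

mdy_hms <- function(date) {

  # remember the user's locale and restore it on exit
  locale_lc_time <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", locale_lc_time))

  # parse API dates with the C locale so month names are always recognised
  Sys.setlocale("LC_TIME", "C")
  strptime(date, format = "%B, %d %Y %H:%M:%S", tz = "UTC")
}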

OJWatson commented 5 years ago

Thanks again for the help debugging and sorry these last few steps are proving painful

OJWatson commented 5 years ago

I have just noticed that the Travis checks have started failing, and I think I might have a better idea of where that first error comes from now:

EDIT: I know where the error is from. It stems from the new version of haven that got pushed to CRAN recently. The changes to haven::labelled broke bits (https://cran.r-project.org/web/packages/haven/news/news.html), but should be easy to patch.

annakrystalli commented 5 years ago

Hey OJ! Sorry for the delay in replying, I was away and took the whole weekend off! 😎

So re: reading in .dta files.

I tried running your code but had to make some slight adjustments to get it to work. Here's what I ran and the results of playing around a little:

# setup file paths and tmp dir/file
file <- "https://dhsprogram.com/customcf/legacy/data/sample_download_dataset.cfm?Filename=ZZBR62DT.ZIP&Tp=1&Ctry_Code=zz&survey_id=0&doctype=dhs"
dest_file <- tempfile()
unzip_dir <- tempdir()

# download data and unzip into unzip_dir
download.file(file, destfile = dest_file)
unzip(dest_file, exdir = unzip_dir)

# identify dta file
dta_file <- list.files(unzip_dir, pattern = ".DTA", full.names = T)

foreign (tested on versions 0.8-69 & 0.8-71) default - ERRORS

foreign::read.dta(dta_file)
#> Error in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, : factor level [19] is duplicated

foreign without converting factors - WORKS but attributes lost

dta_foreign <- tibble::as.tibble(foreign::read.dta(dta_file, convert.factors = F))
dta_foreign
#> # A tibble: 23,666 x 971
#>    caseid      bidx v000   v001  v002  v003  v004   v005  v006  v007  v008
#>  * <chr>      <int> <chr> <int> <int> <int> <int>  <int> <int> <int> <int>
#>  1 "        …     1 ZZ6       1     1     2     1 1.06e6     6  2015  1386
#>  2 "        …     2 ZZ6       1     1     2     1 1.06e6     6  2015  1386
#>  3 "        …     3 ZZ6       1     1     2     1 1.06e6     6  2015  1386
#>  4 "        …     4 ZZ6       1     1     2     1 1.06e6     6  2015  1386
#>  5 "        …     5 ZZ6       1     1     2     1 1.06e6     6  2015  1386
#>  6 "        …     6 ZZ6       1     1     2     1 1.06e6     6  2015  1386
#>  7 "        …     1 ZZ6       1     3     2     1 1.06e6     6  2015  1386
#>  8 "        …     1 ZZ6       1     4     2     1 1.06e6     6  2015  1386
#>  9 "        …     2 ZZ6       1     4     2     1 1.06e6     6  2015  1386
#> 10 "        …     1 ZZ6       1     4     3     1 1.06e6     6  2015  1386
#> # ... with 23,656 more rows, and 960 more variables: v009 <int>,
#> #   v010 <int>, v011 <int>, v012 <int>, v013 <int>, v014 <int>,
#> #   v015 <int>, v016 <int>, v017 <int>, v018 <int>, v019 <int>,
#> #   v019a <int>, v020 <int>, v021 <int>, v022 <int>, v023 <int>,
#> #   v024 <int>, v025 <int>, v026 <int>, v027 <int>, v028 <int>,
#> #   v029 <int>, v030 <int>, v031 <int>, v032 <int>, v034 <int>,
#> #   v040 <int>, v042 <int>, v044 <int>, v101 <int>, v102 <int>,
#> #   v103 <int>, v104 <int>, v105 <int>, v106 <int>, v107 <int>,
#> #   v113 <int>, v115 <int>, v116 <int>, v119 <int>, v120 <int>,
#> #   v121 <int>, v122 <int>, v123 <int>, v124 <int>, v125 <int>,
#> #   v127 <int>, v128 <int>, v129 <int>, v130 <int>, v131 <int>,
#> #   v133 <int>, v134 <int>, v135 <int>, v136 <int>, v137 <int>,
#> #   v138 <int>, v139 <int>, v140 <int>, v141 <int>, v149 <int>,
#> #   v150 <int>, v151 <int>, v152 <int>, v153 <int>, awfactt <int>,
#> #   awfactu <int>, awfactr <int>, awfacte <int>, awfactw <int>,
#> #   v155 <int>, v156 <int>, v157 <int>, v158 <int>, v159 <int>,
#> #   v160 <int>, v161 <int>, v166 <int>, v167 <int>, v168 <int>,
#> #   v190 <int>, v191 <int>, ml101 <int>, v201 <int>, v202 <int>,
#> #   v203 <int>, v204 <int>, v205 <int>, v206 <int>, v207 <int>,
#> #   v208 <int>, v209 <int>, v210 <int>, v211 <int>, v212 <int>,
#> #   v213 <int>, v214 <int>, v215 <int>, v216 <int>, v217 <int>, …
str(dta_foreign[,1:5])
#> Classes 'tbl_df', 'tbl' and 'data.frame':    23666 obs. of  5 variables:
#>  $ caseid: chr  "        1  1  2" "        1  1  2" "        1  1  2" "        1  1  2" ...
#>  $ bidx  : int  1 2 3 4 5 6 1 1 2 1 ...
#>  $ v000  : chr  "ZZ6" "ZZ6" "ZZ6" "ZZ6" ...
#>  $ v001  : int  1 1 1 1 1 1 1 1 1 1 ...
#>  $ v002  : int  1 1 1 1 1 1 3 4 4 4 ...

haven 2.0.0 - WORKS

dta_haven <- haven::read_dta(dta_file)
dta_haven
#> # A tibble: 23,666 x 971
#>    caseid      bidx v000   v001  v002  v003  v004   v005  v006  v007  v008
#>    <chr>      <dbl> <chr> <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
#>  1 "        …    1. ZZ6      1.    1.    2.    1. 1.06e6    6. 2015. 1386.
#>  2 "        …    2. ZZ6      1.    1.    2.    1. 1.06e6    6. 2015. 1386.
#>  3 "        …    3. ZZ6      1.    1.    2.    1. 1.06e6    6. 2015. 1386.
#>  4 "        …    4. ZZ6      1.    1.    2.    1. 1.06e6    6. 2015. 1386.
#>  5 "        …    5. ZZ6      1.    1.    2.    1. 1.06e6    6. 2015. 1386.
#>  6 "        …    6. ZZ6      1.    1.    2.    1. 1.06e6    6. 2015. 1386.
#>  7 "        …    1. ZZ6      1.    3.    2.    1. 1.06e6    6. 2015. 1386.
#>  8 "        …    1. ZZ6      1.    4.    2.    1. 1.06e6    6. 2015. 1386.
#>  9 "        …    2. ZZ6      1.    4.    2.    1. 1.06e6    6. 2015. 1386.
#> 10 "        …    1. ZZ6      1.    4.    3.    1. 1.06e6    6. 2015. 1386.
#> # ... with 23,656 more rows, and 960 more variables: v009 <dbl>,
#> #   v010 <dbl>, v011 <dbl>, v012 <dbl>, v013 <dbl+lbl>, v014 <dbl+lbl>,
#> #   v015 <dbl+lbl>, v016 <dbl>, v017 <dbl>, v018 <dbl+lbl>,
#> #   v019 <dbl+lbl>, v019a <dbl+lbl>, v020 <dbl+lbl>, v021 <dbl>,
#> #   v022 <dbl>, v023 <dbl+lbl>, v024 <dbl+lbl>, v025 <dbl+lbl>,
#> #   v026 <dbl+lbl>, v027 <dbl>, v028 <dbl>, v029 <dbl>, v030 <dbl>,
#> #   v031 <dbl>, v032 <dbl>, v034 <dbl+lbl>, v040 <dbl>, v042 <dbl+lbl>,
#> #   v044 <dbl+lbl>, v101 <dbl+lbl>, v102 <dbl+lbl>, v103 <dbl+lbl>,
#> #   v104 <dbl+lbl>, v105 <dbl+lbl>, v106 <dbl+lbl>, v107 <dbl+lbl>,
#> #   v113 <dbl+lbl>, v115 <dbl+lbl>, v116 <dbl+lbl>, v119 <dbl+lbl>,
#> #   v120 <dbl+lbl>, v121 <dbl+lbl>, v122 <dbl+lbl>, v123 <dbl+lbl>,
#> #   v124 <dbl+lbl>, v125 <dbl+lbl>, v127 <dbl+lbl>, v128 <dbl+lbl>,
#> #   v129 <dbl+lbl>, v130 <dbl+lbl>, v131 <dbl+lbl>, v133 <dbl+lbl>,
#> #   v134 <dbl+lbl>, v135 <dbl+lbl>, v136 <dbl>, v137 <dbl>, v138 <dbl>,
#> #   v139 <dbl+lbl>, v140 <dbl+lbl>, v141 <dbl+lbl>, v149 <dbl+lbl>,
#> #   v150 <dbl+lbl>, v151 <dbl+lbl>, v152 <dbl+lbl>, v153 <dbl+lbl>,
#> #   awfactt <dbl>, awfactu <dbl>, awfactr <dbl>, awfacte <dbl>,
#> #   awfactw <dbl>, v155 <dbl+lbl>, v156 <dbl+lbl>, v157 <dbl+lbl>,
#> #   v158 <dbl+lbl>, v159 <dbl+lbl>, v160 <dbl+lbl>, v161 <dbl+lbl>,
#> #   v166 <dbl+lbl>, v167 <dbl+lbl>, v168 <dbl+lbl>, v190 <dbl+lbl>,
#> #   v191 <dbl>, ml101 <dbl+lbl>, v201 <dbl>, v202 <dbl>, v203 <dbl>,
#> #   v204 <dbl>, v205 <dbl>, v206 <dbl>, v207 <dbl>, v208 <dbl+lbl>,
#> #   v209 <dbl+lbl>, v210 <dbl>, v211 <dbl>, v212 <dbl>, v213 <dbl+lbl>,
#> #   v214 <dbl>, v215 <dbl+lbl>, v216 <dbl+lbl>, v217 <dbl+lbl>, …
str(dta_haven[, 1:5])
#> Classes 'tbl_df', 'tbl' and 'data.frame':    23666 obs. of  5 variables:
#>  $ caseid: atomic          1  1  2         1  1  2         1  1  2         1  1  2 ...
#>   ..- attr(*, "label")= chr "case identification"
#>   ..- attr(*, "format.stata")= chr "%15s"
#>  $ bidx  : atomic  1 2 3 4 5 6 1 1 2 1 ...
#>   ..- attr(*, "label")= chr "birth column number"
#>   ..- attr(*, "format.stata")= chr "%8.0g"
#>  $ v000  : atomic  ZZ6 ZZ6 ZZ6 ZZ6 ...
#>   ..- attr(*, "label")= chr "country code and phase"
#>   ..- attr(*, "format.stata")= chr "%3s"
#>  $ v001  : atomic  1 1 1 1 1 1 1 1 1 1 ...
#>   ..- attr(*, "label")= chr "cluster number"
#>   ..- attr(*, "format.stata")= chr "%8.0g"
#>  $ v002  : atomic  1 1 1 1 1 1 3 4 4 4 ...
#>   ..- attr(*, "label")= chr "household number"
#>   ..- attr(*, "format.stata")= chr "%8.0g"

sessioninfo::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.4.4 (2018-03-15)
#>  os       macOS  10.14.1              
#>  system   x86_64, darwin15.6.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_GB.UTF-8                 
#>  ctype    en_GB.UTF-8                 
#>  tz       Europe/London               
#>  date     2018-11-26                  
#> 
#> ─ Packages ──────────────────────────────────────────────────────────────
#>  package     * version date       lib source        
#>  assertthat    0.2.0   2017-04-11 [1] CRAN (R 3.4.0)
#>  backports     1.1.2   2017-12-13 [1] CRAN (R 3.4.3)
#>  cli           1.0.1   2018-09-25 [1] CRAN (R 3.4.4)
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 3.4.1)
#>  digest        0.6.18  2018-10-10 [1] CRAN (R 3.4.4)
#>  evaluate      0.11    2018-07-17 [1] CRAN (R 3.4.4)
#>  forcats       0.3.0   2018-02-19 [1] CRAN (R 3.4.3)
#>  foreign       0.8-71  2018-07-20 [1] CRAN (R 3.4.4)
#>  haven         2.0.0   2018-11-22 [1] CRAN (R 3.4.4)
#>  hms           0.4.2   2018-03-10 [1] CRAN (R 3.4.4)
#>  htmltools     0.3.6   2017-04-28 [1] CRAN (R 3.4.0)
#>  knitr         1.20    2018-02-20 [1] CRAN (R 3.4.3)
#>  magrittr      1.5     2014-11-22 [1] CRAN (R 3.4.0)
#>  pillar        1.2.1   2018-02-27 [1] CRAN (R 3.4.3)
#>  pkgconfig     2.0.2   2018-08-16 [1] CRAN (R 3.4.4)
#>  R6            2.3.0   2018-10-04 [1] CRAN (R 3.4.4)
#>  Rcpp          1.0.0   2018-11-07 [1] CRAN (R 3.4.4)
#>  readr         1.1.1   2017-05-16 [1] CRAN (R 3.4.0)
#>  rlang         0.3.0.1 2018-10-25 [1] CRAN (R 3.4.4)
#>  rmarkdown     1.10    2018-06-11 [1] CRAN (R 3.4.4)
#>  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.4.3)
#>  sessioninfo   1.1.0   2018-09-25 [1] CRAN (R 3.4.4)
#>  stringi       1.2.4   2018-07-20 [1] CRAN (R 3.4.4)
#>  stringr       1.3.1   2018-05-10 [1] CRAN (R 3.4.4)
#>  tibble        1.4.2   2018-01-22 [1] CRAN (R 3.4.3)
#>  utf8          1.1.3   2018-01-03 [1] CRAN (R 3.4.3)
#>  withr         2.1.2   2018-03-15 [1] CRAN (R 3.4.4)
#>  yaml          2.2.0   2018-07-25 [1] CRAN (R 3.4.4)
#> 
#> [1] /Library/Frameworks/R.framework/Versions/3.4/Resources/library

Created on 2018-11-26 by the reprex package (v0.2.0).

So the problem does indeed look like it is emanating from foreign's attempt to coerce to factors. I'm wondering whether it might be related to changes in R version 3.5.0, namely that factor(x, levels, labels) now allows duplicated labels (not duplicated levels). You can get Travis to explicitly test for back-compatibility, which might be worth it at this stage, given the differing behaviour you have set for tempdir also. Have a look at this: https://docs.travis-ci.com/user/languages/r#r-versions. I would be interested to know whether you manage to reproduce the foreign error when testing on R oldrel. Overall haven seems to be the better approach IMHO, at least for this use case, as (if we have correctly identified the source of the error) it offers back-compatibility.


Re: tempdir_check, hmmm I see what you mean. I think the function works to provide back-compatibility but just throws this NOTE when checking with a version < 3.5.0.

I'm not sure what to suggest atm. I appreciate there have been debates regarding this, which it seems the check = TRUE option in 3.5.0 was designed to address. But using it means differing behaviour according to R version (i.e. the tmpdir is recoverable with 3.5 but not with 3.4). While this might be acceptable (it would at least need pointing out in the documentation somewhere), I feel I'd need a bit more understanding of how exactly the user's workflow got messed up, and whether this is something rdhs should be solving or better practice from the user. My feeling is that tmp files are precisely that, temporary and designed to be deleted. So could you provide a bit more detail on how the error arose?


Re locale, I see. I'm going to have to seek some advice on this myself so will get back to you!

Thanks for sticking with the process! We'll get there πŸ‘

OJWatson commented 5 years ago

Hey,

Thanks for taking the time to go through this issue so thoroughly. I have been able to replicate the issue using oldrel in Travis (https://travis-ci.org/OJWatson/rdhs/jobs/459793328#L4126). :raised_hands: (P.S. Thanks for pointing out oldrel, as I had not seen this before.)

We will be moving towards using haven more than foreign, so I have changed the default function used to read in .dta files to haven::read_dta rather than the equivalent in foreign, which addresses that test error. Also, since we use the haven::labelled format for reading in files, it makes sense to use haven for the default options when reading data sets, while still giving people with downstream pipelines built around foreign the option to use it. (I had to make a number of patches anyway to address the changes in haven v2.0.0, which dropped on CRAN 4 days ago.) So we're going full haven :house:
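
To give the gist of the change (illustrative only; the helper name and mode argument below are made up for the example, and the real rdhs internals also handle reformatting, labels etc.):

# hypothetical helper just to show the switch in defaults
read_dta_file <- function(path, mode = c("haven", "foreign")) {
  mode <- match.arg(mode)
  if (mode == "haven") {
    haven::read_dta(path)                              # new default reader
  } else {
    foreign::read.dta(path, convert.factors = FALSE)   # kept for pipelines built around foreign
  }
}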

Re: tempdir_check - I think the user simply deleted the temp directory using a file browser, so definitely falls under the "better practice from the user" heading. The result was then that if rdhs tried to set up the config in the temporary directory (the default if they have not given permission to R to write outside of their temporary directory), it was trying to save files etc to a missing directory. Since this is not an rdhs issue but more a user issue, I've reverted this back to just using tempdir as normal.

Thanks again for all the help, OJ

annakrystalli commented 5 years ago

OK, back with you.

So I pulled your changes and ran the tests. Getting a new failure and a couple of warnings (edited down):

devtools::test("/Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs")
#> Loading rdhs
#> Loading required package: testthat
#> Testing rdhs
───────────────────────────────────────────────────────────────────────────────────
βœ– | 10 1 2 3 | Miscellaneous utils

1. The warning re: locale again:

#> test_miscellaneous.R:64: warning: different locales
#> OS reports request to set locale to "French_Belgium.1252" cannot be honored

I had a little play around with this and I'm not sure whether it's testing what you want it to? i.e. on my system it fails to set the locale during testing (indeed it doesn't change the locale even if I run Sys.setlocale("LC_TIME", "French_Belgium.1252") in the console). The test doesn't fail, though, because the date remains parsable by mdy_hms(date). Could you elaborate a little more on what it is you are trying to test here?

2. Warning re: object 'model_datasets'

#> test_miscellaneous.R:114: warning: model datasets onAttach
#> object 'model_datasets' not found

I should note however that I don't get these warnings when running devtools::check(). Will seek some advice on this.

3. Error: looks like another one requiring credentials that needs to be skipped

#> test_miscellaneous.R:115: error: model datasets onAttach
#> Login credentials for the DHS website have not been found.Please uset set_dhs_config() and provide your DHS email and project.
#> 1: rdhs::get_datasets("MWAR7ASV.ZIP") at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/tests/testthat/test_miscellaneous.R:115
#> 2: client$get_datasets(dataset_filenames, download_option = download_option, reformat = reformat, 
#>        all_lower = all_lower, output_dir_root = output_dir_root, clear_cache = clear_cache, 
#>        ...) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/ui.R:128
#> 3: handle_config(private$config$config_path) at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/client.R:345
#> 4: stop("Login credentials for the DHS website have not been found.", "Please uset set_dhs_config() and provide your DHS email and project.") at /Users/Anna/Documents/workflows/rOpenSci/editorials/rdhs/R/config.R:35

#> ══ Results ════════════════════════════════════════════════════════════════════════
#> Duration: 46.8 s
#> 
#> OK:       59
#> Failed:   1
#> Warnings: 2
#> Skipped:  29

Created on 2018-11-29 by the reprex package (v0.2.0).

OJWatson commented 5 years ago

Hi again, and sorry for the new errors :(

1. re: locale

Some users' OS may have a different locale environment variable LC_TIME, which for one user in this issue led to them being unable to parse the date strings returned from the DHS API. So we have a function mdy_hms, which changes their LC_TIME environment variable to C for the purposes of parsing dates from the API. The following reprex (run on a Windows machine) should show the problem a bit more clearly and give a bit more information about what the locale change is for:

library(rdhs)

# the API returns dates in the following string
date <- "July, 15 2016 19:17:14"

# these will be correctly parsed if LC_TIME recognises it
strptime(date, format = "%B, %d %Y %H:%M:%S")
#> [1] "2016-07-15 19:17:14 UTC"

# which in this session is
locale_lc_time <- Sys.getlocale("LC_TIME")
locale_lc_time
#> [1] "English_United Kingdom.1252"

# if however it does not, e.g French_Belgium.1252
Sys.setlocale("LC_TIME","French_Belgium.1252")
#> [1] "French_Belgium.1252"
Sys.getlocale("LC_TIME")
#> [1] "French_Belgium.1252"
strptime(date, format = "%B, %d %Y %H:%M:%S")
#> [1] NA

# our function mdy_hms will change your LC_TIME to C before using strptime
mdy_hms(date)
#> [1] "2016-07-15 19:17:14 UTC"

# but will not change LC_TIME
Sys.getlocale("LC_TIME")
#> [1] "French_Belgium.1252"

I was only able to run the above on my Windows machine, however; on my Linux machine I get the same "cannot be honoured" message as you do. As a result, I've wrapped the test in a tryCatch that catches the warning related to this and skips the test if it is thrown, i.e. if the OS can't handle the locale.
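
In other words, the test now looks roughly like this (a sketch of the idea rather than the exact committed code):

test_that("different locales", {

  locale_lc_time <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", locale_lc_time))

  # if the OS can't honour the locale, Sys.setlocale() warns and we skip the test
  tryCatch(
    Sys.setlocale("LC_TIME", "French_Belgium.1252"),
    warning = function(w) testthat::skip("OS cannot set LC_TIME to French_Belgium.1252")
  )

  date <- "July, 15 2016 19:17:14"
  expect_true(!is.na(mdy_hms(date)))

})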

2 & 3:

These both refer to a test I added recently to address an issue where I had forgotten to make model_datasets (a core data set the package uses when downloading data sets) an internal data set as well as one exposed to the end user for demo purposes etc. This meant that a lot of functions would not work without rdhs being attached. I've slightly tweaked the test to better test this, and put an authentication skip on it as it does require auth, along the lines of the sketch below.
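
The guard is along these lines (the helper name and test body here are just illustrative, not the exact rdhs test code):

skip_if_no_auth <- function() {
  if (!file.exists("rdhs.json")) {
    testthat::skip("No DHS login credentials available")
  }
}

test_that("model datasets onAttach", {
  skip_if_no_auth()
  dat <- rdhs::get_datasets("MWAR7ASV.ZIP")
  expect_true(is.list(dat))
})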


I've just pushed these changes to the master branch as well, so hopefully :crossed_fingers: these tests should be handled okay now (famous last words).

Thanks again for everything, OJ

annakrystalli commented 5 years ago

famous last words

...indeed! 😜

So, I asked for some advice from the rest of the editors and they hit up against another error when running the tests for the first time:

── 1. Error: post api tidy (@test_api_endpoints.R#354)  ────────────────────────
object 'conf' not found
1: .handleSimpleError(function (e)
   {
       handled <<- TRUE
       test_error <<- e
       options(expressions = expressions_opt_new)
       on.exit(options(expressions = expressions_opt), add = TRUE)
       e$expectation_calls <- frame_calls(11, 2)
       test_error <<- e
       register_expectation(e)
       e$handled <- TRUE
       test_error <<- e
   }, "object 'conf' not found", quote(eval(code, test_env))) at testthat/test_api_endpoints.R:354

Somehow I don't get this error, not sure why because I haven't set up credentials and get_rdhs_config() returns NULL for me. In any case, looking at the test itself I think there might be an error:

test_that("post api tidy", {

  if (file.exists("rdhs.json")) {
  conf <- read_rdhs_config_file("rdhs.json")
  } else {
    expect_true(TRUE)
  }

  if (is.null(conf$email)){
    expect_true(file.remove("rdhs.json"))
  }

})

Feels like

 if (is.null(conf$email)){
    expect_true(file.remove("rdhs.json"))
  }

shouldn't run if file.exists("rdhs.json") == FALSE right? At the minute, even if the file doesn't exist in the first place, and hence there is no conf object to interrogate for conf$email, it still tries to, hence the error object 'conf' not found. So I think this needs correcting.

Interestingly, the second time they ran the tests, this error didn't turn up! Does the initial run create and cache an rdhs.json file somewhere? Perhaps after this particular test is run?

annakrystalli commented 5 years ago

BTW... just pulled and ran check() and test() and all clear!! πŸ™Œ

annakrystalli commented 5 years ago

I'll be back tomorrow morning to do another quick pass and hopefully get everything approved! πŸ˜ƒ

OJWatson commented 5 years ago

Wooop! Okay and yes the above makes sense.

The API tests in test_api_endpoints will create an rdhs.json file in the current directory if there isn't one. If there wasn't one to begin with, though, it won't have anything stored for email, project, password etc. So this tidy-up was to make sure that any rdhs.json file created by the API tests is deleted if it will not be useful for the following tests that require auth, which is why I check conf$email.

However, that check should have been inside the first if statement, like so:

  if (file.exists("rdhs.json")) {
    conf <- read_rdhs_config_file("rdhs.json")
    if (is.null(conf$email)){
      expect_true(file.remove("rdhs.json"))
    }

  } else {
    expect_true(TRUE)
  }

I am a little confused, though, about how they got to that test without creating an rdhs.json. The only thing I can guess is that the API was down when they ran the tests, so all the API tests above that one were skipped and thus the "rdhs.json" file was never made. Either way, the changes above should catch that now, and they have been pushed.

annakrystalli commented 5 years ago

Hey OJ,

I've just been having a final review and look around the package. I made a small PR with some minor edits to the README; have a look and let me know what you think.

There's only one last point I want to raise. In client_dhs(), the API key is baked into the function as the default for the argument api_key. While I appreciate it's a public key and see no reason for it not to be included in the source code, I feel a bit uncomfortable about it being so prominently user-facing; I think at some point I even saw it turn up in errors when the function failed. I feel a better approach would be to set the default to NULL and then internally set it to the default key if no other API key is supplied to that argument.
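
Something like this is what I have in mind (just a sketch; names other than client_dhs and api_key are illustrative):

internal_api_key <- "xxxx"   # in practice stored as an internal package data object, not written in the function

client_dhs <- function(api_key = NULL, ...) {
  if (is.null(api_key)) {
    api_key <- internal_api_key   # resolved internally, so the key isn't a user-facing default
  }
  api_key   # ...then used to build the client as before
}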

Otherwise, I think we are good to πŸš€ πŸ˜ƒ

OJWatson commented 5 years ago

Hi,

Completely agree - I have moved the actual string to be an internal package data object, and have removed all occurrences of it being written explicitly in the code (although it is still in old commits). As an aside, the DHS programme said they wanted us to ship the package with that API key so that they could see how many people were using rdhs etc., but making it a little harder for people to find still makes sense.

Thanks again for everything,

OJ

annakrystalli commented 5 years ago

Approved! πŸ™Œ You're a 🌟. Excellent work on this @OJWatson! We're finally there πŸ˜ƒ

Thanks for submitting and thanks @LucyMcGowan & @dosgillespie for your reviews!

To-dos:

For JOSS submission

Once admin rights have been returned:

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent). More info on this here. If you choose to, I suggest you do this before creating your codemeta file.

Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or a long-form post with more narrative about its development (https://ropensci.org/blog/). If you are interested, @stefaniebutland will be in touch about content and timing.

We've started putting together a gitbook with our best practices and tips; this chapter starts the 3rd section, which is about guidance for after onboarding. Please tell us what could be improved; the corresponding repo is here.

Finally, if you're planning to submit to CRAN, have a look at the section about CRAN gotchas in the onboarding guide. And feel free to reach out if you need help through the process.

Well done again @OJWatson πŸ‘

OJWatson commented 5 years ago

Yay! :tada: - Thank you so much @annakrystalli for the help in these last stages. I've just transferred the repo over and will start addressing the links etc.

Thank you again to @LucyMcGowan and @dosgillespie for taking the time to review it. I would love to acknowledge both of you as reviewers in the DESCRIPTION if that is okay with you? Please let me know, and also whether you would like your ORCID profile linked if you have one.

Also just wanted to say how helpful and nice the whole process has been. I've learnt loads and have picked up a number of really useful things during the review cycle. It's a really great initiative so thanks again to everyone.

OJWatson commented 5 years ago

Also @stefaniebutland - I'm definitely keen to write a blog post. Haven't quite decided what form but should be able to mock something up in the next week or so if that's okay?