qualtdict: Generating Variable Dictionaries and Labelled Data Exports of Qualtrics Surveys

lyh970817 commented 1 year ago

Submitting Author Name: Yuhao Lin
Submitting Author Github Handle: @lyh970817
Editor: @maurolepore
Reviewers: TBD

Archive: TBD
Version accepted: TBD
Language: en

Paste the full DESCRIPTION file inside a code block below:

Package: qualtdict
Title: Generating Variable Dictionaries and Labelled Data Exports of Qualtrics
    Surveys
Version: 0.0.0.9000
Authors@R:
    person("Yuhao", "Lin", , "yuhao.lin@kcl.ac.uk", role = c("aut", "cre"),
           comment = c(ORCID = "0000-0001-6357-5731"))
Description: Provides functions that generate variable dictionaries from
    'Qualtrics' <https://www.qualtrics.com/about/> surveys and labelled
    survey data based on the dictionary. This package is built upon the R
    package 'qualtRics' <https://github.com/ropensci/qualtRics/> which
    provides access to 'Qualtrics' survey data and metadata via the 'Qualtrics' API
    <https://api.qualtrics.com/>.
License: MIT + file LICENSE
URL: https://github.com/lyh970817/qualtdict
BugReports: https://github.com/lyh970817/qualtdict/issues
Imports:
    crul,
    dplyr,
    glue,
    haven,
    magrittr,
    openNLP,
    purrr,
    qualtRics,
    rlang,
    sjlabelled,
    slowraker,
    SnowballC,
    stringi,
    stringr,
    tibble,
    tidyr,
    xml2
Suggests:
    covr,
    knitr,
    rmarkdown,
    testthat (>= 3.0.0),
    vcr (>= 0.6.0)
VignetteBuilder: 
    knitr
Config/testthat/edition: 3
Config/testthat/start-first: dict_generate, dict_validate, get_survey_data
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3

Scope

Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
- [ ] data retrieval
- [ ] data extraction
- [x] data munging
- [ ] data deposition
- [ ] data validation and testing
- [ ] workflow automation
- [ ] version control
- [ ] citation management and bibliometrics
- [ ] scientific software wrappers
- [ ] field and lab reproducibility tools
- [ ] database software bindings
- [ ] geospatial data
- [ ] text analysis
Explain how and why the package falls under these categories (briefly, 1-2 sentences):

Qualtrics is an online survey and data collection software platform. While the qualtRics R package implements data retrieval from the Qualtrics platform, this package 'qualtdict' further processes its output to generate variable dictionaries and labelled data designed to be used for data analyses directly.

Who is the target audience and what are scientific applications of this package?

The target audience is those who use the Qualtrics survey platform to collect data. This package generates variable dictionaries and labelled data designed to be used for data analyses directly.

Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

No, but the similar qualtRics R package retrieves a broader range of data from Qualtrics than this package uses. The output formats of qualtRics are much less user-friendly. For example, it retrieves survey metadata in a nested-list, JSON-like format, while this package rearranges the essential parts of this metadata (retrieved using qualtRics) into a publishable variable dictionary in a table format that can be visually inspected in, for example, Excel.

(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?

Yes.

If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Explain reasons for any pkgcheck items which your package is unable to pass.

Technical checks

Confirm each of the following by checking the box.

[x] I have read the rOpenSci packaging guide.
[x] I have read the author guide and I expect to maintain this package for at least 2 years or to find a replacement.

This package:

[x] does not violate the Terms of Service of any service it interacts with.
[x] has a CRAN and OSI accepted license.
[x] contains a README with instructions for installing the development version.
[x] includes documentation with examples for all functions, created with roxygen2.
[x] contains a vignette with examples of its essential functions and uses.
[x] has a test suite.
[x] has continuous integration, including reporting of test coverage.

Publication options

[x] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?
[ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options

- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

[x] I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

ropensci-review-bot commented 1 year ago

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

ropensci-review-bot commented 1 year ago

:rocket:

Editor check started

:wave:

ropensci-review-bot commented 1 year ago

Checks for qualtdict (v0.0.0.9000)

git hash: d31c0887

:heavy_check_mark: Package name is available
:heavy_check_mark: has a 'codemeta.json' file.
:heavy_check_mark: has a 'contributing' file.
:heavy_check_mark: uses 'roxygen2'.
:heavy_check_mark: 'DESCRIPTION' has a URL field.
:heavy_check_mark: 'DESCRIPTION' has a BugReports field.
:heavy_check_mark: Package has at least one HTML vignette
:heavy_check_mark: All functions have examples.
:heavy_check_mark: Package has continuous integration checks.
:heavy_check_mark: Package coverage is 86%.
:heavy_check_mark: R CMD check found no errors.
:heavy_check_mark: R CMD check found no warnings.

Package License: MIT + file LICENSE

1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

|type       |package    | ncalls|
|:----------|:----------|------:|
|internal   |base       |    179|
|internal   |qualtdict  |    118|
|internal   |utils      |      5|
|internal   |stats      |      1|
|imports    |magrittr   |     70|
|imports    |rlang      |      8|
|imports    |glue       |      7|
|imports    |qualtRics  |      3|
|imports    |tibble     |      3|
|imports    |openNLP    |      2|
|imports    |sjlabelled |      2|
|imports    |xml2       |      2|
|imports    |stringi    |      1|
|imports    |tidyr      |      1|
|imports    |crul       |     NA|
|imports    |dplyr      |     NA|
|imports    |haven      |     NA|
|imports    |purrr      |     NA|
|imports    |slowraker  |     NA|
|imports    |SnowballC  |     NA|
|imports    |stringr    |     NA|
|suggests   |covr       |     NA|
|suggests   |knitr      |     NA|
|suggests   |rmarkdown  |     NA|
|suggests   |testthat   |     NA|
|suggests   |vcr        |     NA|
|linking_to |NA         |     NA|

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats()', and examining the 'external_calls' table.

base

list (66), length (9), names (7), c (6), unique (6), unlist (6), args (4), ifelse (4), is.null (4), max (4), min (4), paste0 (4), all (3), is.na (3), rownames (3), as.matrix (2), colnames (2), factor (2), for (2), grep (2), is.character (2), levels (2), seq_along (2), split (2), structure (2), table (2), vapply (2), which (2), any (1), as.logical (1), character (1), class (1), data.frame (1), do.call (1), if (1), is.function (1), is.logical (1), labels (1), lapply (1), mode (1), numeric (1), q (1), readRDS (1), return (1), sum (1), suppressWarnings (1), tempdir (1), vector (1)

qualtdict

item_or_level_qid (10), rep_level_qid (10), suf_level_qid (9), null_na (7), not_applicable_qid (6), questiontext_qid (6), suf_item_rep_level_qid (6), suf_item_suf_level_qid (6), collapse (5), file_upload_qid (5), rep_level (3), retry (3), calc_keyword_scores (2), check_item (2), check_json (2), check_names (2), easyname_gen (2), label_to_sfx (2), paste_narm (2), qid_recode (2), recode_json (2), rep_item (2), sbs_qid (2), suf_level_suf_item_qid (2), suf_text_qid (2), timing_qid (2), add_text (1), add_text_mc (1), checkarg_isfunction (1), checkarg_isname (1), checkarg_isqualtdict (1), convert_html (1), dict_generate (1), dict_validate (1), get_survey_data (1), is_onetoone (1), order_name (1), suf_nmlabel_qid (1), text (1), which_not_onetoone (1)

magrittr

%>% (70)

rlang

abort (7), hash (1)

glue

glue (7)

utils

txtProgressBar (4), getFromNamespace (1)

qualtRics

fetch_description (1), fetch_survey (1), metadata (1)

tibble

tibble (2), enframe (1)

openNLP

Maxent_POS_Tag_Annotator (1), Maxent_Word_Token_Annotator (1)

sjlabelled

set_label (1), set_labels (1)

xml2

read_html (1), xml_text (1)

stats

setNames (1)

stringi

stri_count_words (1)

tidyr

unite (1)

**NOTE:** Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.

2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

- code in R (100% in 10 files) and
- 1 authors
- 1 vignette
- no internal data file
- 17 imported packages
- 3 exported functions (median 25 lines of code)
- 110 non-exported functions in R (median 10 lines of code)

---

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages

The following terminology is used:

- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html)

The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

|measure                  | value| percentile|noteworthy |
|:------------------------|-----:|----------:|:----------|
|files_R                  |    10|       59.0|           |
|files_vignettes          |     1|       68.4|           |
|files_tests              |     7|       86.4|           |
|loc_R                    |  1152|       71.7|           |
|loc_vignettes            |   118|       30.8|           |
|loc_tests                |  1014|       87.2|           |
|num_vignettes            |     1|       64.8|           |
|n_fns_r                  |   113|       79.3|           |
|n_fns_r_exported         |     3|       12.9|           |
|n_fns_r_not_exported     |   110|       85.5|           |
|n_fns_per_file_r         |     6|       75.4|           |
|num_params_per_fn        |     5|       69.6|           |
|loc_per_fn_r             |    11|       32.3|           |
|loc_per_fn_r_exp         |    25|       55.9|           |
|loc_per_fn_r_not_exp     |    10|       31.3|           |
|rel_whitespace_R         |    17|       70.0|           |
|rel_whitespace_vignettes |    25|       21.4|           |
|rel_whitespace_tests     |     1|       14.7|           |
|doclines_per_fn_exp      |    43|       54.1|           |
|doclines_per_fn_not_exp  |     0|        0.0|TRUE       |
|fn_call_network_size     |    57|       69.0|           |

---

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package

3. `goodpractice` and other checks

Details of goodpractice checks (click to open)

#### 3a. Continuous Integration Badges

[![check-standard.yaml](https://github.com/lyh970817/qualtdict/actions/workflows/check-standard.yaml/badge.svg)](https://github.com/lyh970817/qualtdict/actions)
[![test-coverage.yaml](https://github.com/lyh970817/qualtdict/actions/workflows/test-coverage.yaml/badge.svg)](https://github.com/lyh970817/qualtdict/actions)

**GitHub Workflow Results**

|         id|name          |conclusion |sha    | run_number|date       |
|----------:|:-------------|:----------|:------|----------:|:----------|
| 4076045888|R-CMD-check   |success    |d31c08 |         11|2023-02-02 |
| 4076045893|test-coverage |success    |d31c08 |         11|2023-02-02 |

---

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

R CMD check generated the following check_fail:

1. no_import_package_as_a_whole

#### Test coverage with [covr](https://covr.r-lib.org/)

Package coverage: 85.98

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

No functions have cyclocomplexity >= 15

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found the following 1 potential issues:

message | number of times
--- | ---
Avoid library() and require() calls in packages | 1

Package Versions

|package  |version  |
|:--------|:--------|
|pkgstats |0.1.3    |
|pkgcheck |0.1.1.11 |

Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

maurolepore commented 1 year ago

Dear @lyh970817, FYI I'm still searching for a handling editor. It shouldn't take much longer. Thanks for your patience.

lyh970817 commented 1 year ago

> Dear @lyh970817, FYI I'm still searching for a handling editor. It shouldn't take much longer. Thanks for your patience.

Thank you so much!

maurolepore commented 1 year ago

@ropensci-review-bot assign @maurolepore as editor

ropensci-review-bot commented 1 year ago

Assigned! @maurolepore is now the editor

maurolepore commented 1 year ago

Dear @lyh970817 I'm delighted to announce that I'll be the handling editor of this submission.

Semantic tags for my comments

To help you track my comments I tagged them with "ml" and numbered sequentially: ml01, ml02, and so on. Comments following bullets are for you to consider -- you may or may not respond to them. Comments following check-boxes are requests for some action -- please respond.

Reviewers

[x] ml01. Can you please suggest three reviewers? Following our guidelines I'll use one at most, but I would like your view of the types of expertise needed to review qualtdict.

Checks

Here I list a few things that caught my attention. They are not blockers but the sooner we address them the better.

Package Dependencies

ml02. Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.

goodpractice and other checks

ml03. R CMD check generated the following check_fail: no_import_package_as_a_whole
ml04. Avoid library() and require() calls in packages: 1 time

lyh970817 commented 1 year ago

Thank you so much for taking time to review this. These are my responses.

ml01. Unfortunately, I'm not sure I can name any specific people. Expertise-wise, though, I think someone with a psychology or social science background would be helpful, as qualtdict is centred around creating a variable dictionary that gives analysts an intuitive overview of survey data. The usefulness of such a dictionary is probably best judged by someone who analyses such data on a daily basis (in contrast to a data engineer who implements APIs for such data).

ml02. R CMD check seems to fail unless I import some packages that I don't actually use. For instance, without importing haven:

Error in `set_labels_helper(x = .dat, labels = labels, force.labels = force.labels, 
    force.values = force.values, drop.na = drop.na, var.name = NULL)`: Package 'haven' required for this function. Please install it.

ml03. I use dplyr, purrr and stringr extensively so I import them as a whole. Should I still import functions from them (which will be many) individually?

ml04. I think it comes from this line in the tests:

library(vcr) # *Required* as vcr is set up on loading

which is mandatory for vcr to work.

maurolepore commented 1 year ago

ml02. Following your example with the haven package, I saw that you need to import haven::read_xpt because the sjlabelled package needs it. That surprises me. Usually each package must import any external function it needs, and not ask users to do it. Do you know why that's the case? I also see haven is listed in .pre-commit.config.yaml, which I'm not familiar with. So there is likely a good explanation, and I just happen never to have encountered a case like this. It would be good to articulate an explanation because reviewers might be surprised too.
ml03. Yeah, AFAIK best practice is to either namespace each function each time you call it or import each function individually. For example, each time use something like dplyr::filter() or import it once with usethis::use_import_from("dplyr", "filter") then use it each time just like filter().
ml04. I see. Thanks!
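
For ml03, the two alternatives to importing a package as a whole can be sketched as follows. This is only an illustration: `name_levels()` and `first_levels()` are hypothetical helpers, and `stats` and `utils` stand in for dplyr, purrr and stringr so that the snippet runs without extra installs.

```r
# Option 1: namespace every call as pkg::fun(). The package only needs to be
# listed under Imports in DESCRIPTION; no import directive is required.
name_levels <- function(x) {
  stats::setNames(seq_along(x), x)
}

# Option 2: import individual functions once, then call them bare.
# In a package, the roxygen2 tag below generates `importFrom(utils, head)`
# in NAMESPACE; in a plain script it is just a comment.
#' @importFrom utils head
first_levels <- function(x, n = 2) {
  head(name_levels(x), n)
}

first_levels(c("Never", "Sometimes", "Always"))
```

Option 1 keeps the origin of each function visible at the call site, while Option 2 keeps calls short at the cost of a longer NAMESPACE.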
[ ] ml05. When tests run I see a lot of printed output. Please suppress it so that reviewers can see a succinct test report. If the output is not generated from an R condition (e.g. messages, warnings, or errors) it may be hard to suppress. See capture.output() -- you may need to implement a way to capture the output and maybe implement a quietly argument you can set to TRUE during tests.
[ ] ml06. The test results I see show many warnings. Please address them if you don't expect them or suppress them if you do expect them. If you expect them it's best to make them go away so that you don't develop the habit of ignoring them and risk missing an important one that you don't expect.

[ FAIL 0 | WARN 591 | SKIP 0 | PASS 4 ]

[ ] ml07. Can you please make your project an RStudio project? Most R developers/contributors work in RStudio. Without an .Rproj file launching the project is hard, and I would like reviewers to enter your package as smoothly as possible. You may use usethis::use_rstudio(). And later it may help to lower the entry-barrier for contributors.
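
A minimal sketch of the two routes mentioned in ml05, assuming the noisy output comes from `cat()` or `print()` rather than from an R condition. `fetch_labels()` and its `quiet` argument are hypothetical and are not part of qualtdict.

```r
# Hypothetical stand-in for a package function that reports progress with
# cat(). Such output is not a message, warning or error, so
# suppressMessages() and suppressWarnings() cannot silence it.
fetch_labels <- function(ids, quiet = FALSE) {
  for (id in ids) {
    if (!quiet) cat("Processing", id, "\n")
  }
  toupper(ids)
}

# Route 1: expose a `quiet` argument and set it to TRUE in tests.
res_quiet <- fetch_labels(c("q1", "q2"), quiet = TRUE)

# Route 2: when the printing code cannot be changed, capture the printed
# lines and keep only the return value.
printed <- utils::capture.output(
  res_captured <- fetch_labels(c("q1", "q2"))
)
```

Both routes return the same value; the second also keeps the printed lines in `printed`, so a test could still assert on them if needed.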

lyh970817 commented 1 year ago

ml02. I believe this is because sjlabelled lists haven in its Suggests field. The function it calls from haven is not actually haven::read_xpt, but I needed to import an arbitrary function from haven for the set_labels function to see and load it.

Please see the DESCRIPTION file for sjlabelled: https://github.com/strengejacke/sjlabelled/blob/master/DESCRIPTION.

Package: sjlabelled
Type: Package
Encoding: UTF-8
Title: Labelled Data Utility Functions
Version: 1.2.0.3
Authors@R: c(
    person("Daniel", "Lüdecke", role = c("aut", "cre"), email = "d.luedecke@uke.de", comment = c(ORCID = "0000-0002-8895-3206")),
    person("David", "Ranzolin", role = "ctb", email = "daranzolin@gmail.com"),
    person("Jonathan", "De Troye", role = "ctb", email = "detroyejr@outlook.com")
    )
Maintainer: Daniel Lüdecke <d.luedecke@uke.de>
Description: Collection of functions dealing with labelled data, like reading and 
    writing data between R and other statistical software packages like 'SPSS',
    'SAS' or 'Stata', and working with labelled data. This includes easy ways 
    to get, set or change value and variable label attributes, to convert 
    labelled vectors into factors or numeric (and vice versa), or to deal with 
    multiple declared missing values.
License: GPL-3
Depends:
    R (>= 3.4)
Imports:
    insight,
    datawizard,
    stats,
    tools,
    utils
Suggests:
    dplyr,
    haven (>= 1.1.2),
    magrittr,
    sjmisc,
    sjPlot,
    knitr,
    rlang,
    rmarkdown,
    snakecase,
    testthat
URL: https://strengejacke.github.io/sjlabelled/
BugReports: https://github.com/strengejacke/sjlabelled/issues
RoxygenNote: 7.2.1
VignetteBuilder: knitr

And the specific lines where haven is loaded: https://github.com/strengejacke/sjlabelled/blob/548fa397bd013ec7e44b225dd971d19628fdc866/R/set_labels.R#L317.

What would be the best way to deal with this?

ml05-7. I was able to capture the outputs when drafting the package so I should be able to do that in the tests. The warnings are not intended and are due to package versions. I will resolve these and create an RStudio project and then update this comment. Thank you so much!

maurolepore commented 1 year ago

ml02. Thanks for explaining. The best solution will likely vary for each of the "unused" packages.

In the case of haven, the file you showed me has a single call of the type haven::<some function>, so it might be worth looking at the source code of that function to see if you can re-implement it and remove the dependency on haven.

https://github.com/strengejacke/sjlabelled/blob/548fa397bd013ec7e44b225dd971d19628fdc866/R/set_labels.R#L325

More generally, I think a great explanation of the trade-offs in dependencies is that of Jim Hester in his talk "It depends": https://www.youtube.com/watch?v=mum13N7CGUI . So as long as you understand those trade-offs you would be able to make an informed decision for each "unused" package and justify your decision if the reviewers ask.
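
For context on ml02, the error message quoted earlier is typical of a function that depends on a package listed under Suggests and checks for it at call time. The sketch below illustrates that general pattern only; `needs_suggested()` is a hypothetical helper and is not sjlabelled's actual code.

```r
# Hypothetical helper: fail with an informative error when an optional
# (suggested) package is not installed.
needs_suggested <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      sprintf("Package '%s' required for this function. Please install it.", pkg),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# `stats` ships with R, so the check passes.
needs_suggested("stats")

# A package that is not installed triggers the error; the name is made up.
msg <- tryCatch(
  needs_suggested("notARealPackage12345"),
  error = function(e) conditionMessage(e)
)
```

Under this pattern, a downstream package that calls such a function only works when the suggested package happens to be installed, which is one reason an author might list it in their own DESCRIPTION even without calling it directly.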

maurolepore commented 1 year ago

Dear @lyh970817, Just checking. Would you be available to address the comments ml05-ml07? We can also put this submission on hold if you need more time. Let me know.

lyh970817 commented 1 year ago

> Dear @lyh970817,
>
> Just checking. Would you be available to address the comments ml05-ml07? We can also put this submission on hold if you need more time. Let me know.

Yes, sorry - would just need a couple more days to address these. Thanks.

maurolepore commented 5 months ago

@ropensci-review-bot put on hold

ropensci-review-bot commented 5 months ago

Submission on hold!

ropensci-review-bot commented 2 months ago

@maurolepore: Please review the holding status

maurolepore commented 2 months ago

@lyh970817, how would you like to proceed?

- Resume the submission.
- Continue on hold.
- Withdraw the submission.

The holding status will be revisited every 3 months, and after one year the issue will be closed. -- https://devdevguide.netlify.app/softwarereview_policies.html#policiesreviewprocess

maurolepore commented 1 week ago

Dear @lyh970817

I hope all is well. I totally understand priorities change. At this moment I believe this policy applies:

If the author hasn’t requested a holding label, but is simply not responding, we should close the issue within one month after the last contact intent. This intent will include a comment tagging the author, but also an email using the email address listed in the DESCRIPTION of the package which is one of the rare cases where the editor will try to contact the author by email. -- https://devdevguide.netlify.app/softwarereview_policies

FYI my next step is to confirm with the chief editor and if they agree I'll close the issue and let you know by email.

maurolepore commented 1 week ago

Dear @lyh970817 I confirmed with the chief editor and shared my next steps with the entire editorial board. I'll go ahead and close this issue and let you know by email.

Once again, I understand priorities change. Thanks a lot for contributing to rOpenSci. We look forward to more contributions whenever it's a good time.
