ropensci / software-review

rOpenSci Software Peer Review.

Pre-submission inquiry for {excluder}: Checks for Exclusion Criteria in Online Data #454

Closed JeffreyRStevens closed 3 years ago

JeffreyRStevens commented 3 years ago

Submitting Author: Jeffrey Stevens (@JeffreyRStevens)
Repository: https://github.com/JeffreyRStevens/excluder
Submission type: Pre-submission


Package: excluder
Title: Checks for Exclusion Criteria in Online Data
Version: 0.2.1
Authors@R: 
    person(given = "Jeffrey R.",
           family = "Stevens",
           role = c("aut", "cre"),
           email = "jeffrey.r.stevens@gmail.com",
           comment = c(ORCID = "0000-0003-2375-1360"))
Description: Data that are collected through online sources such as Mechanical 
            Turk may require excluding data because of IP address duplication, 
            geolocation, or completion duration. This package facilitates
            exclusion of these data for Qualtrics datasets.
License: GPL (>= 3)
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1
URL: https://jeffreyrstevens.github.io/excluder/, https://github.com/jeffreyrstevens/excluder/
BugReports: https://github.com/jeffreyrstevens/excluder/issues/
Imports: 
    dplyr,
    iptools,
    janitor,
    lubridate,
    maps,
    tidyr,
    magrittr,
    lifecycle,
    rlang
Depends: 
    R (>= 3.5.0)
Suggests: 
    testthat (>= 3.0.0),
    readr,
    stringr,
    covr,
    knitr,
    rmarkdown
Config/testthat/edition: 3
VignetteBuilder: knitr

Scope

The package falls under data munging because it processes data from Qualtrics or other online sources by checking for, marking, and excluding rows of data frames for common exclusion criteria (e.g., IP addresses outside of the United States or duplicate entries from the same location/IP address).

N/A

The target audience is data scientists using Qualtrics or other online systems to collect data from participants (e.g., Mechanical Turk workers). Ensuring good data quality from these participants can be difficult. For instance, although Mechanical Turk in theory screens workers by location (e.g., to restrict the participant pool to workers in the United States), the collected data do not always reflect that restriction. Tools for screening IP address locations are hard to find, and this package simplifies checking for and excluding participants based on data that Qualtrics commonly reports: geolocation, IP address, duplicate records from the same location, participant screen resolution, participant progress through the survey, and survey completion duration.

To my knowledge, there are no similar packages. The {qualtRics} package at rOpenSci focuses on importing data from Qualtrics. The {MTurkR} package interfaces directly with the MTurk Requester API, but the APIs have been deprecated and the package has been removed from CRAN.

Yes, it seems to comply with this guidance. Depending on the data that the user collects, this package could access personally identifiable information. In particular, it can process IP addresses recorded by Qualtrics. Note that the package only works with personally identifiable information from data sets that already exist on the user's local file system, and it does not collect or transmit data in any way. The package also includes a function, deidentify(), that strips location, IP address, language, and even participant computer information (e.g., operating system, web browser, screen resolution) from data frames to deidentify them.

I wanted to raise this pre-submission inquiry here because this package seems to complement the rOpenSci {qualtRics} package nicely.

JeffreyRStevens commented 3 years ago

Be gentle---it's my first R package!

noamross commented 3 years ago

Thanks for opening this inquiry, @JeffreyRStevens, and we're glad you considered rOpenSci for your first package! excluder is well in-scope as it deals with data-manipulation tasks specific to a scientific data source. We look forward to a full submission.

JeffreyRStevens commented 3 years ago

Thanks, @noamross. Just to be clear, should I close this issue and start a new one for the full submission of the package? Or just add the submission here?

noamross commented 3 years ago

I'll close this and you can start a full submission whenever you're ready!