irwpkg
is an R package designed to facilitate easy access to datasets
from the Item Response Warehouse
(IRW), offering functions to
download, explore, and analyze these datasets.
You can install the development version of irwpkg
from
GitHub with:
# install.packages("remotes")
remotes::install_github("hansorlee/irwpkg")
To use irwpkg
, load the package using:
library(irwpkg)
The IRW datasets are hosted on Redivis, so to
access these datasets through irwpkg
, you’ll need to authenticate with
your Redivis account. This connection is managed through the Redivis R
client.
Redivis Authentication Setup
When you first use a function in irwpkg
that connects to Redivis
(e.g. see list_available_datasets()
below), a browser window will
pop up prompting you to sign into Redivis.
After signing in, you’ll be prompted to allow access for the Redivis R Client. Click Allow.
Once you’ve granted access, close the browser window. You’ll see the
message “Authentication was successful” in the R console, and you
can then re-run your irwpkg
function.
The list_available_datasets() function provides a list of datasets in the IRW database, showing key metadata (dataset name, row count, and variable count).
# List available datasets
list_datasets = list_available_datasets()
dim(list_datasets)
## [1] 558 3
head(list_datasets)
## name numRows variableCount
## 1 lessR_Mach4 7020 3
## 2 movac_pakpour2022 72096 3
## 3 SABFI2_Gallardo_Pujol_2018_IntHapp 3771 3
## 4 science_ltm 2744 4
## 5 gilbert_meta_17 1840581 11
## 6 pks_probability 12096 4
db_info = get_database_metadata()
## IRW Database Metadata:
## --------------------------------------------------
## Version: v11.19
## Table Count: 558
## Created At: 2023-09-25 19:19:15
## Last Updated At: 2024-11-14 16:31:26
## Total Data Size (GB): 145.1 GB
## Active Data Size (GB): 145.01 GB
## DOI: https://doi.org/10.57761/4zaa-w743
## Dataset URL: https://redivis.com/datasets/as2e-cv7jb41fd?v=11.19
## Documentation: https://datapages.github.io/irw/
## Methodology: Tables have been harmonized as per details given [here](<https://datapages.github.io/irw/standard.html>).
##
## Usage Information: Please find information about data licenses and citation info [here](<https://datapages.github.io/irw/docs.html>).
##
##
## --------------------------------------------------
table_info = get_table_metadata("lessR_Mach4")
## Table Metadata for: lessR_Mach4
## --------------------------------------------------
## Name: lessR_Mach4
## Created At: 2024-11-14 16:25:32
## Last Updated At: 2024-11-14 16:31:26
## Number of Rows: 7020
## Data Size (KB): 141.86 KB
## Variable Count: 3
## Is Sample: No
## DOI: https://doi.org/10.57761/4zaa-w743
## Table URL: https://redivis.com/datasets/as2e-cv7jb41fd/tables/018s-9p87q43mh?v=11.19
## Container URL: https://redivis.com/datasets/as2e-cv7jb41fd?v=11.19
## --------------------------------------------------
# get tables with a column named "rater"
filter_tables(required_columns="rater")
## [1] "swmd_mokken" "dumas_Organisciak_2022"
## [3] "fractals_rating" "mpsycho_lakes"
## [5] "ptam1_immer" "autonomysupport_mokken"
## [7] "wine_luckett2021" "spelling2pronounce_edwards2023"
## [9] "immer12_immer" "famous_melodies"
# get tables with at least 10000 rows and a column named "rater"
filter_tables(n_rows = 10000, required_columns="rater")
## [1] "dumas_Organisciak_2022" "fractals_rating"
## [3] "mpsycho_lakes" "ptam1_immer"
## [5] "spelling2pronounce_edwards2023" "famous_melodies"
# get tables with columns "rater" and "rt"
filter_tables(required_columns=c("rater", "rt"))
## [1] "famous_melodies"
Once you have identified a dataset you want, you can use fetch_data()
to load it as a data frame in R. For example, to fetch the swmd_mokken
dataset, you can use:
# Fetch a dataset by name
swmd_mokken <- fetch_data(name="swmd_mokken")
dim(swmd_mokken)
## [1] 4557 4
head(swmd_mokken)
## resp item rater id
## 1 1 Item2 12 1000303
## 2 1 Item1 13 1000303
## 3 1 Item3 17 1001302
## 4 1 Item1 22 1001302
## 5 1 Item2 25 1001302
## 6 1 Item3 25 1001302
You can also fetch multiple datasets at once.
matching_tables <- filter_tables(n_rows = 50000, required_columns = c("rater"))
print(matching_tables)
## [1] "fractals_rating" "spelling2pronounce_edwards2023"
datasets <- fetch_data(c("fractals_rating", "spelling2pronounce_edwards2023"))
print(names(datasets)) # datasets is a named list
## [1] "fractals_rating" "spelling2pronounce_edwards2023"
You can also use the output of filter_tables()
directly to
fetch_data()
.
datasets <- fetch_data(matching_tables)
print(names(datasets))
## [1] "fractals_rating" "spelling2pronounce_edwards2023"
fractals_rating <- datasets[["fractals_rating"]]
head(fractals_rating)
## id item rater resp
## 1 Fractal_001 liveliness 1012 1
## 2 Fractal_001 verbalizability 1019 1
## 3 Fractal_001 favourableness 1007 1
## 4 Fractal_001 familiarity 1009 1
## 5 Fractal_001 familiarity 1019 1
## 6 Fractal_001 verbalizability 1007 1
download_data("swmd_mokken", path = "mydata.csv")