The datasetR package helps you generate random datasets for your R project. It provides a preset list of random values of different data types (interval, ordinal, and nominal). It also includes a function for introducing NULL, NA, or missing values into a dataset.
The datasetR package can be installed from this GitHub repository using the devtools package. More on the devtools package.
Installing is done using:
library(devtools)
install_github("tomaztk/datasetR")
The package comes with a main function, dsR(), that helps you generate the dataset. But first, let's load the predefined list of 20 different variable types with random values.
library(datasetR)
set_of_val <- set_of_val
And you will get a starting set of values:
Understand the predefined list of values for constructing the datasets.
Types explained:
The following example creates a data frame of 100 rows with a total of 8 variables. As the vr string shows, the 8 variables are 3 of type ms, 4 of type bi, and 1 of type ii.
The resulting dataset has 8 variables and 100 rows of sampled data.
library(datasetR)
library(dplyr)
my_dataset <- dsR(vr="ms:3;bi:4;ii:1", nr=100);
To create a dataset of your own design, use the vr parameter and construct a string describing the variables. The nr parameter sets the number of rows.
Each entry in the string is written as type:number of variables. When stating multiple types, make sure to separate them with a semicolon.
test_data <- dsR(vr="od:1;ms:1;bi:1;ii:1", nr=10);
The following two statements generate datasets of the same dimensions.
test_data <- dsR(vr="od:1;od:1", nr=10);
test_data <- dsR(vr="od:2", nr=10);
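To make the string format concrete, here is a minimal base R sketch of how a vr string could be broken down into types and counts. The helper parse_vr() is hypothetical and is not part of datasetR; it only illustrates the type:count;type:count convention described above and why "od:1;od:1" and "od:2" describe the same number of variables.

```r
# Hypothetical helper (not part of datasetR): turn a vr string such as
# "ms:3;bi:4;ii:1" into a named integer vector of variable counts per type.
parse_vr <- function(vr) {
  # Split on semicolons, then drop surrounding whitespace and empty entries
  parts <- trimws(strsplit(vr, ";", fixed = TRUE)[[1]])
  parts <- parts[nzchar(parts)]

  # Each entry has the form type:count
  kv <- strsplit(parts, ":", fixed = TRUE)
  types <- vapply(kv, function(x) trimws(x[1]), character(1))
  counts <- vapply(kv, function(x) as.integer(trimws(x[2])), integer(1))

  # Sum the counts of repeated types, keeping the order of first appearance
  groups <- split(counts, factor(types, levels = unique(types)))
  vapply(groups, sum, integer(1))
}

parse_vr("ms:3;bi:4;ii:1")       # counts per type: ms = 3, bi = 4, ii = 1
sum(parse_vr("ms:3;bi:4;ii:1"))  # total number of variables: 8

# Both spellings describe two variables of type od
identical(parse_vr("od:1;od:1"), parse_vr("od:2"))
```

Whether dsR() merges repeated types internally in this way is an assumption; the sketch only shows that the two strings ask for the same number of columns.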
To skew your dataset, you can add missing values to a chosen column. Calling addMissingValues on a dataset and one of its columns replaces some of that column's values with NA.
The pc parameter sets the percentage of values in the given data frame column that you want to replace.
my_dataset$ii_1 <- addMissingValues(my_dataset, ii_1, pc = 10)
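The idea behind the pc parameter can be sketched in base R as follows. The helper replace_with_na() is hypothetical and is not the package's addMissingValues(); it assumes that pc is a percentage of the vector's length, rounded to the nearest whole number of values, and that the positions are chosen at random.

```r
# Hypothetical helper (not datasetR's addMissingValues): replace roughly
# pc percent of the entries of a vector with NA at random positions.
replace_with_na <- function(x, pc = 10) {
  stopifnot(pc >= 0, pc <= 100)

  # Number of entries to blank out (assumption: round to nearest integer)
  n_na <- round(length(x) * pc / 100)

  # Pick distinct positions at random and overwrite them with NA
  idx <- sample.int(length(x), n_na)
  x[idx] <- NA
  x
}

set.seed(42)                     # make the random positions reproducible
values <- 1:100
with_na <- replace_with_na(values, pc = 10)
sum(is.na(with_na))              # 10 of the 100 values are now NA
```

The same pattern applies to a data frame column, for example by assigning the result back to the column it came from.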
You are welcome to submit suggestions and report bugs: https://github.com/tomaztk/datasetR/issues
Thanks go to all of these contributors!
Documentation is available at https://tomaztk.github.io/datasetR and was created with pkgdown.