Open BrandenOlson opened 5 years ago
This is the approach we took:
alakazam::checkColumns, which checks for column existence and whether the column is filled with NA
values.
That, and ensuring that you always subset columns with data[[column]]
rather than data[, column]
or data$column
to account for different slicing behaviors of data.tables, tibbles, and data.frames (w.r.t to whether they return a column as a vector or an n x 1 table).
Awesome, @javh - I was unaware of what looks like a very useful funciton. So as not to reinvent the wheel, it seems like getColumnAsVector
can essentially be a wrapper of alakazam::checkColumns
to both check that the column is fine and that it contains at least some data, and return dat[[column]]
if everything looks good.
It seems that the output for alakazam::checkColumns
is TRUE if there are no issues, and a string if there is some issue. Do you see any issues with checking this output via something like
column_check <- alakazam::checkColumns(dat, columns)
if(column_check != TRUE) {
stop(column_check)
}
?
Looks good to me.
Many functions take a data.table as input and need to access one or more particular columns. Currently, not much is done to check whether the column is actually present in the data.table. Thus, I propose creating a
getColumnAsVector
function which returns the column if the column is present and throws an error if the column is not present. Hopefully this will make sumrep easier to use as many obscure R errors reduce to a column name not being defined.