matsengrp / sumrep

Summary statistics for repertoires
16 stars 6 forks source link

Safer column accessing #27

Open BrandenOlson opened 5 years ago

BrandenOlson commented 5 years ago

Many functions take a data.table as input and need to access one or more particular columns. Currently, not much is done to check whether the column is actually present in the data.table. Thus, I propose creating a getColumnAsVector function which returns the column if the column is present and throws an error if the column is not present. Hopefully this will make sumrep easier to use as many obscure R errors reduce to a column name not being defined.

javh commented 5 years ago

This is the approach we took:

alakazam::checkColumns, which checks for column existence and whether the column is filled with NA values.

That, and ensuring that you always subset columns with data[[column]] rather than data[, column] or data$column to account for different slicing behaviors of data.tables, tibbles, and data.frames (w.r.t to whether they return a column as a vector or an n x 1 table).

BrandenOlson commented 5 years ago

Awesome, @javh - I was unaware of what looks like a very useful funciton. So as not to reinvent the wheel, it seems like getColumnAsVector can essentially be a wrapper of alakazam::checkColumns to both check that the column is fine and that it contains at least some data, and return dat[[column]] if everything looks good.

It seems that the output for alakazam::checkColumns is TRUE if there are no issues, and a string if there is some issue. Do you see any issues with checking this output via something like

column_check <- alakazam::checkColumns(dat, columns)
if(column_check != TRUE) {
    stop(column_check)
}

?

javh commented 5 years ago

Looks good to me.