Distinguish between discrete and continuous variables

I agree with this; I am running into the same issue now. If you are just looking for the types as currently assigned, you can do this:

sapply(sample_variables(pstat), function(v) { class(sample_data(pstat)[[v]]) })

However, I think we need to be explicit in assigning types to sample variables. We should implement a function that either accepts user input to assign types or attempts to infer them from the data. Inference will not always be accurate. For example, R (read.table or similar) interprets "Subject ID" as an integer, but it should be a factor, since there is no meaningful ordering to the subjects. Still, inferring from the data would be a good first step.
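
As a rough sketch of what such a function could look like, the following uses only base R and a plain data frame standing in for the sample data. The name infer_variable_types, its overrides argument, and the type labels are hypothetical and not part of PathoStat or phyloseq.

```r
# Hypothetical helper (not an existing PathoStat/phyloseq function):
# guess a type label for each column, then let user overrides win.
infer_variable_types <- function(df, overrides = character(0)) {
  guess_one <- function(x) {
    if (is.ordered(x)) return("ordered")   # check before is.factor()
    if (is.factor(x)) return("factor")
    if (is.integer(x)) return("integer")
    if (is.numeric(x)) return("numeric")
    "character"                            # fallback for everything else
  }
  types <- vapply(df, guess_one, character(1))

  if (length(overrides) > 0) {
    unknown <- setdiff(names(overrides), names(types))
    if (length(unknown) > 0) {
      stop("unknown variables: ", paste(unknown, collapse = ", "))
    }
    types[names(overrides)] <- overrides   # user input takes precedence
  }
  types
}

# Toy sample table; column names are made up for illustration.
samples <- data.frame(
  SubjectID = c(101L, 102L, 103L),
  Age = c(34.5, 51.0, 27.2),
  Site = factor(c("gut", "skin", "gut")),
  stringsAsFactors = FALSE
)

inferred <- infer_variable_types(samples)
corrected <- infer_variable_types(samples, overrides = c(SubjectID = "factor"))
```

Here the inferred label for SubjectID is "integer", which is the misclassification described above, and the override corrects it to "factor".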

I propose we have more than two types. I think our types should follow the standard R data types:

factors: categorical/nominal variables
ordered factors: ordinal variables, useful for representing longitudinal variables and discretized continuous variables
integer: continuous type
numeric/double: continuous type
character: text that does not need to be treated as a variable, mostly for display purposes.
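
To illustrate how a column could be converted to one of these types once a label has been chosen, here is a minimal base R sketch. The function coerce_variable and its levels argument are hypothetical names introduced for illustration only.

```r
# Hypothetical helper: convert a vector to one of the proposed types.
# `levels` is only used for ordered factors, to fix the ordering explicitly.
coerce_variable <- function(x, type, levels = NULL) {
  switch(type,
    factor = factor(x),
    ordered = if (is.null(levels)) {
      factor(x, ordered = TRUE)            # levels sorted by default
    } else {
      factor(x, levels = levels, ordered = TRUE)
    },
    # Go through as.character() so factor inputs convert by label, not code.
    integer = as.integer(as.character(x)),
    numeric = as.numeric(as.character(x)),
    character = as.character(x),
    stop("unknown type: ", type)
  )
}

# Subject IDs read as integers become an unordered factor.
subject <- coerce_variable(c(101L, 102L, 101L), "factor")

# A longitudinal variable with an explicit ordering of time points.
timepoint <- coerce_variable(
  c("week4", "baseline", "week8"),
  "ordered",
  levels = c("baseline", "week4", "week8")
)
```

Passing levels explicitly matters for ordered factors, since the default alphabetical ordering will not always match the intended sequence.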

These types will naturally suggest how to display them. For example, factors can be displayed using "select" inputs and qualitative color palettes, while ordered factors may also use "select" inputs but be displayed with sequential color palettes.
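
One way to encode this mapping is a small lookup function, sketched below in base R. The function name display_hints and the returned labels are hypothetical. The entries for factors and ordered factors follow the suggestion above, while the "slider" input for integer/numeric and the "none" entries for character are assumptions of mine.

```r
# Hypothetical lookup: suggest an input widget and palette family per type.
display_hints <- function(type) {
  switch(type,
    factor = list(input = "select", palette = "qualitative"),
    ordered = list(input = "select", palette = "sequential"),
    integer = ,                            # falls through to numeric
    numeric = list(input = "slider", palette = "sequential"),  # assumption
    character = list(input = "none", palette = "none"),        # display only
    stop("unknown type: ", type)
  )
}

factor_hint <- display_hints("factor")
ordered_hint <- display_hints("ordered")
```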

In addition, users should be able to indicate which covariates are "of interest". Perhaps there should be several categories, such as secondary/confounders, batch covariates, and random effects.

mani2012 / PathoStat

Distinguish between discrete and continuous variables #7