Allow scores in polynomial contrasts

tsostarics commented 2 years ago

Polynomial coding takes a default vector of scores equal to 1:k for k levels of an ordinal factor. Different real-valued scores can be set manually as well so long as the vector is of length k. Personally I don't think I've seen this done before, but the most obvious circumstance would be in cases of scales that bin numeric values such as "1-3", "4-7", "8-10" etc. In this case you could replace the scores with the median values. Alternatively, you can calculate ridit values for each bin and use those.
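
As a rough sketch of the idea using only base R's contr.poly: the bin medians below follow from the "1-3", "4-7", "8-10" example, while the bin counts used for the ridit scores are made up purely for illustration.

```r
# Assumption: three bins "1-3", "4-7", "8-10", scored by their medians.
bin_medians <- c(2, 5.5, 9)
median_contrasts <- contr.poly(3, scores = bin_medians)

# Ridit scores: proportion of observations below a bin plus half of the
# proportion inside it. The counts here are hypothetical.
bin_counts <- c(20, 50, 30)
bin_props <- bin_counts / sum(bin_counts)
ridits <- cumsum(bin_props) - bin_props / 2
ridit_contrasts <- contr.poly(3, scores = ridits)

print(median_contrasts)
print(ridit_contrasts)
```

Both matrices still have orthonormal columns; only the spacing assumed between levels changes.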

Currently there's no way to set these scores using the syntax of set_contrasts(df, varname ~ contr.poly). This could be fixed by changing how the formulas are parsed so that set_contrasts(df, varname ~ contr.poly()) is valid syntax. That means delaying the evaluation of contr.poly(), since evaluating it as written, without n, would lead to an error. This would in turn make set_contrasts(df, varname ~ contr.poly(scores = score_vector)) valid, allowing scores to be passed through without needing to define (another) polynomial-specific operator. It has the added benefit of allowing parentheses in the syntax, eliminating the need to remove them whenever using tab-autocomplete for long scheme names such as backward_difference_coding; right now backward_difference_coding() throws an error.
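
To make the "delayed execution" idea concrete, here is a minimal sketch in base R. The helper eval_scheme is hypothetical and is not how the package parses formulas; it only shows that the right-hand side of a formula can be inspected unevaluated and have n inserted before it is run.

```r
# Hypothetical helper, for illustration only (not part of the package).
eval_scheme <- function(formula, n_levels) {
  rhs <- formula[[3]]  # right-hand side, still unevaluated

  if (is.symbol(rhs)) {
    # Bare name, e.g. `contr.poly`: build contr.poly(n = n_levels)
    scheme_call <- as.call(list(rhs, n = n_levels))
  } else if (is.call(rhs)) {
    # Call, e.g. `contr.poly(scores = x)`: put n in front of the user's arguments
    scheme_call <- as.call(c(list(rhs[[1]], n = n_levels), as.list(rhs)[-1]))
  } else {
    stop("Right-hand side must be a function name or a function call")
  }

  # Evaluate where the formula was created so user variables are found
  eval(scheme_call, envir = environment(formula))
}

my_scores <- c(1, 2, 4, 8)
eval_scheme(var ~ contr.poly, 4)
eval_scheme(var ~ contr.poly(), 4)
eval_scheme(var ~ contr.poly(scores = my_scores), 4)
```

The first two calls give the same matrix, and the third passes the scores through without any polynomial-specific operator.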

Note that this may require another column in the glimpse table (from glimpse_contrasts) to denote the scores. This is, again, polynomial-specific, like the existing dropped_trends column. Adding it may warrant another function argument like describe.polynomials, which would take TRUE to add all polynomial-specific columns such as dropped_trends and scores; FALSE to leave them out when they're not relevant (i.e., don't include dropped_trends if there are no dropped trends); NA to never include the columns, with a warning if they would have been relevant; or a (lazy) character vector to add a subset of the columns (e.g., c('dropped_trends', 'scores') and c('d', 's') would both show both columns). I've wanted to implement something like this for a while, but the main issue is combining two glimpse tables: the number of columns would differ depending on how involved different approaches are with ordered factors.

tsostarics commented 2 years ago

Note that adding manual scores functionality would also require a change to the function that checks whether a contrast matrix was derived from contr.poly. So, that function would need the scores argument passed to it as well.
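
A sketch of why the scores are needed in that check. The function name is_polynomial_matrix is made up here and the package's actual check may work differently; the point is only that a matrix built with custom scores will not match the default contr.poly(n) reference.

```r
# Hypothetical check, for illustration only: compare a contrast matrix
# against the contr.poly matrix generated from the given scores.
is_polynomial_matrix <- function(contrast_matrix,
                                 scores = seq_len(nrow(contrast_matrix))) {
  if (!is.matrix(contrast_matrix) || !is.numeric(contrast_matrix)) {
    return(FALSE)
  }
  reference <- contr.poly(nrow(contrast_matrix), scores = scores)
  # Ignore dimnames; only the numeric values matter for this comparison
  isTRUE(all.equal(contrast_matrix, reference, check.attributes = FALSE))
}

my_scores <- c(.1, .2, .5, .7)
custom <- contr.poly(4, scores = my_scores)

is_polynomial_matrix(contr.poly(4))               # default scores 1:4
is_polynomial_matrix(custom)                      # checked against 1:4
is_polynomial_matrix(custom, scores = my_scores)  # checked against my_scores
```

Without the scores, the second call has no way of recognizing the matrix as polynomial.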

tsostarics commented 2 years ago

Functionality for allowing empty parentheses and additional arguments in the formula syntax is available on the parens branch. It will be merged once I write more documentation. In general though, for a four-level factor var in df:

enlist_contrasts(df, var ~ contr.poly)
enlist_contrasts(df, var ~ contr.poly())

Are now equivalent, eliminating the need to remove the () when using tab-autocomplete.

# assuming level names 1, 2, 3, 4
enlist_contrasts(df, var ~ scaled_sum_code)
enlist_contrasts(df, var ~ scaled_sum_code())
enlist_contrasts(df, var ~ scaled_sum_code + 1)
enlist_contrasts(df, var ~ scaled_sum_code() + 1)

Are also all equivalent.

my_scores <- c(.1, .2, .5, .7)
enlist_contrasts(df, var ~ contr.poly(scores = c(.1, .2, .5, .7)))
enlist_contrasts(df, var ~ contr.poly(c(.1, .2, .5, .7)))
enlist_contrasts(df, var ~ contr.poly(scores = my_scores))
enlist_contrasts(df, var ~ contr.poly(my_scores))

Are all equivalent, importantly allowing for scores to be manually specified by the user when using orthogonal polynomials. Note that the 2nd and 4th ones, which don't specify the argument name, only work because scores is the second positional argument (after n, which is auto-filled). I would recommend specifying the argument name though.
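
The positional matching can be checked directly against base R's contr.poly, independent of the package. Here n is written out by hand, standing in for the value the package fills in automatically.

```r
my_scores <- c(.1, .2, .5, .7)

# n is given explicitly here; in the formula syntax it is supplied for you
named      <- contr.poly(4, scores = my_scores)
positional <- contr.poly(4, my_scores)  # second positional argument is scores

identical(named, positional)

# The formals confirm the argument order: n, scores, contrasts, sparse
names(formals(contr.poly))
```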

Errors:

enlist_contrasts(df, var ~ contr.poly(4)) will throw an error because the n argument is supplied automatically by the package, so the 4 is matched to the next positional argument, here scores.

enlist_contrasts(df, var ~ contr.poly(n=4)) will throw an error because n is specified twice (again because the package does it automatically). A more helpful error message will be written for this later.

enlist_contrasts(df, var ~ contr.poly(bogus = 5)) will throw an error because bogus is an unused argument for contr.poly.
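
All three failures can be reproduced in base R by assuming the package effectively adds n = 4 to whatever the user wrote. The do.call calls below imitate that assumption; they are not the package's internals.

```r
# Small helper to capture an error message instead of stopping
error_message <- function(expr) {
  tryCatch({
    expr
    NA_character_
  }, error = function(e) conditionMessage(e))
}

# contr.poly(4): the bare 4 lands in scores, which then has the wrong length
msg_positional <- error_message(do.call(contr.poly, list(n = 4, 4)))

# contr.poly(n = 4): n is now supplied twice
msg_duplicate <- error_message(do.call(contr.poly, list(n = 4, n = 4)))

# contr.poly(bogus = 5): contr.poly has no such argument
msg_unused <- error_message(do.call(contr.poly, list(n = 4, bogus = 5)))

print(c(msg_positional, msg_duplicate, msg_unused))
```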

glimpse_contrasts remains unchanged for the time being, and will likely become a separate enhancement issue.

tsostarics commented 2 years ago

Not really an update but putting the current error messages for each example here for future reference when I update the errors:

enlist_contrasts(mtcars, gear ~ contr.poly(4))
Converting to factors: gear
Error in (function (n, scores = 1:n, contrasts = TRUE, sparse = FALSE)  : 
  'scores' argument is of the wrong length

Should make clear that n should not be specified in contrast-setting functions

enlist_contrasts(mtcars, gear ~ contr.poly(n=4))
Converting to factors: gear
Error in (function (n, scores = 1:n, contrasts = TRUE, sparse = FALSE)  : 
  formal argument "n" matched by multiple actual arguments

Should make clear that n should not be specified in contrast-setting functions

enlist_contrasts(mtcars, gear ~ contr.poly(bogus=4))
Converting to factors: gear
Error in (function (n, scores = 1:n, contrasts = TRUE, sparse = FALSE)  : 
  unused argument (bogus = 4)

actually this one can stay as is I think

# Gear has 3 levels
enlist_contrasts(mtcars, gear ~ contr.poly(scores = 2))
Converting to factors: gear
Error in (function (n, scores = 1:n, contrasts = TRUE, sparse = FALSE)  : 
  'scores' argument is of the wrong length

enlist_contrasts(mtcars, gear ~ contr.poly(scores = 1:2))
Converting to factors: gear
Error in (function (n, scores = 1:n, contrasts = TRUE, sparse = FALSE)  : 
  'scores' argument is of the wrong length

The hint for the fix is in the scores = 1:n part of the signature, but this could be made clearer by adding "length of scores should equal {length(levels(df$var))}" to the message.
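
Something along these lines could produce that message. The function name and exact wording are placeholders, not what the package currently does.

```r
# Hypothetical validation helper, for illustration only.
check_scores_length <- function(scores, factor_levels, varname) {
  if (length(scores) != length(factor_levels)) {
    stop(
      sprintf(
        "length of scores (%d) should equal the number of levels of %s (%d)",
        length(scores), varname, length(factor_levels)
      ),
      call. = FALSE
    )
  }
  invisible(scores)
}

gear_levels <- levels(factor(mtcars$gear))  # gear has 3 levels

# Capture the message rather than stopping the script
length_msg <- tryCatch(
  check_scores_length(1:2, gear_levels, "gear"),
  error = function(e) conditionMessage(e)
)
print(length_msg)
```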

tsostarics commented 11 months ago

With some recent additions, namely to how functions and calls are handled, some of these aren't issues anymore. Really only this one is confusing/concerning:

enlist_contrasts(mtcars, gear ~ contr.poly(n=4))
Converting to factors: gear
Error in (function (n, scores = 1:n, contrasts = TRUE, sparse = FALSE)  : 
  formal argument "n" matched by multiple actual arguments

This should be addressed in .bundle_params. I'll fix this right now. The behavior will be that n is replaced by the actual number of levels, with a warning that this is happening. One situation where this could occur: your analysis has a factor with 4 levels, then you filter out a portion of the data and set the column to a factor again without realizing you removed a level, leaving you with 3. If a later set_contrasts call already has varName ~ foo(n=4), then we'll run into the situation here.
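
A rough sketch of the intended behavior, written as a standalone function. The name replace_n and the warning text are placeholders; the real change belongs in .bundle_params and may look different.

```r
# Hypothetical stand-in for the planned behavior, for illustration only.
replace_n <- function(user_args, n_levels) {
  if ("n" %in% names(user_args)) {
    warning(
      sprintf(
        "n = %s was specified but is replaced by the number of levels (%d)",
        format(user_args[["n"]]), n_levels
      ),
      call. = FALSE
    )
    user_args[["n"]] <- NULL  # drop the user's n so it is not matched twice
  }
  c(list(n = n_levels), user_args)
}

# No n given: the arguments pass through with n added in front
do.call(contr.poly, replace_n(list(scores = c(1, 2, 4)), n_levels = 3L))

# Stale n = 4 left over from before a level was filtered out: warns, then uses 3
do.call(contr.poly, replace_n(list(n = 4), n_levels = 3L))
```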

tsostarics / contrastable

Allow scores in polynomial contrasts #6
