tsostarics closed 11 months ago
Note that adding manual scores functionality would also require a change to the function that checks whether a contrast matrix was derived from contr.poly. So, that function would need the scores argument passed to it as well.
Functionality for allowing empty parentheses in the formula syntax and additional arguments is available on the parens branch. It will be merged once I write more documentation. In general though, for a four-level factor var in df:
enlist_contrasts(df, var ~ contr.poly)
enlist_contrasts(df, var ~ contr.poly())
Are now equivalent, eliminating the need to remove the () when using tab-autocomplete.
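As a rough illustration of why the two forms can be treated the same, here is a minimal base-R sketch that inspects the right-hand side of a formula without evaluating it early. The helpers `split_scheme()` and `build_matrix()` are hypothetical names made up for this example and are not the package's actual parsing code; the sketch also ignores operators such as `+`.

```r
# Hypothetical helper: split `var ~ scheme` or `var ~ scheme(...)` into
# the scheme function and any user-supplied arguments.
split_scheme <- function(formula) {
  rhs <- formula[[3]]
  env <- environment(formula)
  if (is.call(rhs)) {
    # `contr.poly()` or `contr.poly(scores = x)`: keep the function name,
    # evaluate only the arguments (not the call itself)
    scheme <- rhs[[1]]
    args <- lapply(as.list(rhs)[-1], eval, envir = env)
  } else {
    # bare symbol such as `contr.poly`
    scheme <- rhs
    args <- list()
  }
  list(scheme = eval(scheme, env), args = args)
}

# Hypothetical helper: call the scheme with n filled in automatically
build_matrix <- function(formula, n_levels) {
  parts <- split_scheme(formula)
  do.call(parts$scheme, c(list(n = n_levels), parts$args))
}

# Both spellings produce the same contrast matrix for a four-level factor
identical(build_matrix(var ~ contr.poly, 4),
          build_matrix(var ~ contr.poly(), 4))
```

Because the call is taken apart rather than evaluated, `contr.poly()` never runs with its required `n` missing.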
# assuming level names 1, 2, 3, 4
enlist_contrasts(df, var ~ scaled_sum_code)
enlist_contrasts(df, var ~ scaled_sum_code())
enlist_contrasts(df, var ~ scaled_sum_code + 1)
enlist_contrasts(df, var ~ scaled_sum_code() + 1)
Are also all equivalent.
my_scores <- c(.1, .2, .5, .7)
enlist_contrasts(df, var ~ contr.poly(scores = c(.1, .2, .5, .7)))
enlist_contrasts(df, var ~ contr.poly(c(.1, .2, .5, .7)))
enlist_contrasts(df, var ~ contr.poly(scores = my_scores))
enlist_contrasts(df, var ~ contr.poly(my_scores))
Are all equivalent, importantly allowing for scores to be manually specified by the user when using orthogonal polynomials. Note that the 2nd and 4th ones, which don't specify the argument name, only work because scores is the second positional argument (after n, which is auto-filled). I would recommend specifying the argument name though.
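To see what manual scores change, the matrices can be compared directly with stats::contr.poly, outside the package. This is only a sketch of the underlying base-R behavior; it assumes the package ends up forwarding scores to contr.poly unchanged.

```r
my_scores <- c(.1, .2, .5, .7)

# Default: equally spaced scores 1:4
default_poly <- contr.poly(4)

# Manual, unequally spaced scores (named and positional forms match)
scored_poly <- contr.poly(4, scores = my_scores)
identical(scored_poly, contr.poly(4, my_scores))

# The two matrices differ because the spacing differs
isTRUE(all.equal(default_poly, scored_poly))

# The columns are still orthonormal, so this is (numerically) the identity
round(crossprod(scored_poly), 10)
```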
enlist_contrasts(df, var ~ contr.poly(4))
will throw an error because the n argument is automatically supplied through the package's functionality, so the 4 is matched to the next positional argument, here scores.
enlist_contrasts(df, var ~ contr.poly(n=4))
will throw an error because n is specified twice (again because the package does it automatically). A more helpful error message will be written for this later.
enlist_contrasts(df, var ~ contr.poly(bogus = 5))
will throw an error because bogus is an unused argument for contr.poly.
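These three failures are ordinary R argument-matching errors and can be reproduced without the package. The sketch below assumes the package supplies n by name before appending the user's arguments; `call_scheme()` and `error_message()` are hypothetical helpers written for this illustration, not package functions.

```r
# Hypothetical stand-in for the package's call: n is filled in by name,
# then whatever the user wrote inside the parentheses is appended.
call_scheme <- function(scheme, n_levels, user_args = list()) {
  do.call(scheme, c(list(n = n_levels), user_args))
}

# Hypothetical helper: return the error message, or NA if no error occurred
error_message <- function(expr) {
  tryCatch({
    expr
    NA_character_
  }, error = function(e) conditionMessage(e))
}

# contr.poly(4): the unnamed 4 is matched to scores, which has the wrong length
error_message(call_scheme(contr.poly, 3L, list(4)))

# contr.poly(n = 4): n is now supplied twice
error_message(call_scheme(contr.poly, 3L, list(n = 4)))

# contr.poly(bogus = 5): contr.poly has no such argument
error_message(call_scheme(contr.poly, 3L, list(bogus = 5)))

# Naming the argument and matching the number of levels works
call_scheme(contr.poly, 3L, list(scores = c(1, 2, 4)))
```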
glimpse_contrasts remains unchanged for the time being, and will likely become a separate enhancement issue.
Not really an update but putting the current error messages for each example here for future reference when I update the errors:
enlist_contrasts(mtcars, gear ~ contr.poly(4))
Converting to factors: gear
Error in (function (n, scores = 1:n, contrasts = TRUE, sparse = FALSE) :
'scores' argument is of the wrong length
Should make clear that n should not be specified in contrast-setting functions
enlist_contrasts(mtcars, gear ~ contr.poly(n=4))
Converting to factors: gear
Error in (function (n, scores = 1:n, contrasts = TRUE, sparse = FALSE) :
formal argument "n" matched by multiple actual arguments
Should make clear that n should not be specified in contrast-setting functions
enlist_contrasts(mtcars, gear ~ contr.poly(bogus=4))
Converting to factors: gear
Error in (function (n, scores = 1:n, contrasts = TRUE, sparse = FALSE) :
unused argument (bogus = 4)
actually this one can stay as is I think
# Gear has 3 levels
enlist_contrasts(mtcars, gear ~ contr.poly(scores = 2))
Converting to factors: gear
Error in (function (n, scores = 1:n, contrasts = TRUE, sparse = FALSE) :
'scores' argument is of the wrong length
enlist_contrasts(mtcars, gear ~ contr.poly(scores = 1:2))
Converting to factors: gear
Error in (function (n, scores = 1:n, contrasts = TRUE, sparse = FALSE) :
'scores' argument is of the wrong length
The fix is in the scores = 1:n part, but this can perhaps be made more clear by adding "length of scores should equal {length(levels(df$var))}"
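A minimal sketch of what such a check could look like, run before the scores ever reach contr.poly. The function name `check_scores_length()` and the exact wording are assumptions for illustration only.

```r
# Hypothetical pre-check: compare the scores against the factor's levels
# and fail with a message that states the expected length.
check_scores_length <- function(scores, x) {
  n_levels <- nlevels(factor(x))
  if (length(scores) != n_levels) {
    stop(
      sprintf("length of scores should equal %d (got %d)",
              n_levels, length(scores)),
      call. = FALSE
    )
  }
  invisible(scores)
}

# gear has 3 levels, so three scores pass the check
check_scores_length(c(1, 2, 4), mtcars$gear)

# Two scores fail; the message is captured here rather than thrown
tryCatch(
  check_scores_length(1:2, mtcars$gear),
  error = function(e) conditionMessage(e)
)
```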
With some recent additions, namely to how functions and calls are handled, some of these aren't issues anymore. Really only this one is confusing/concerning:
enlist_contrasts(mtcars, gear ~ contr.poly(n=4))
Converting to factors: gear
Error in (function (n, scores = 1:n, contrasts = TRUE, sparse = FALSE) :
formal argument "n" matched by multiple actual arguments
This should be addressed in .bundle_params. I'll fix this right now. The behavior here will be that n is replaced by the actual number of levels, with a warning that this is happening. The situation this could occur in is if your analysis has a factor with 4 levels, then you filter a portion out and set the column to a factor not realizing you removed a level, leaving you with 3. If you have a set_contrasts call later on that already has varName ~ foo(n=4), then we'll run into the situation here.
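The intended behavior could look roughly like the sketch below. `resolve_n()` is a hypothetical helper written for this illustration; it is not the actual .bundle_params implementation, and the warning text is made up.

```r
# Hypothetical helper: drop a user-supplied n, warn about it, and put the
# real number of levels in its place.
resolve_n <- function(user_args, n_levels) {
  if ("n" %in% names(user_args)) {
    warning(
      sprintf("Ignoring user-supplied n = %s; using the number of levels (%d) instead.",
              format(user_args[["n"]]), n_levels),
      call. = FALSE
    )
    user_args[["n"]] <- NULL
  }
  c(list(n = n_levels), user_args)
}

# A factor that used to have 4 levels but now has 3:
# the stale n = 4 is replaced (with a warning) instead of raising an error
args <- resolve_n(list(n = 4), n_levels = 3L)
do.call(contr.poly, args)
```

Other arguments such as scores pass through untouched, so they are still validated by contr.poly itself.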
Polynomial coding takes a default vector of scores equal to 1:k for k levels of an ordinal factor. Different real-valued scores can be set manually as well so long as the vector is of length k. Personally I don't think I've seen this done before, but the most obvious circumstance would be in cases of scales that bin numeric values such as "1-3", "4-7", "8-10" etc. In this case you could replace the scores with the median values. Alternatively, you can calculate ridit values for each bin and use those.
Currently there's no way to set these scores using the syntax of set_contrasts(df, varname ~ contr.poly). This could be fixed by changing the way the formulas are parsed such that set_contrasts(df, varname ~ contr.poly()) is valid syntax, delaying the execution of contr.poly() (which would lead to an error). This would subsequently make set_contrasts(df, varname ~ contr.poly(scores = score_vector)) valid, allowing scores to be passed through without needing to define (another) polynomial-specific operator. This has the added benefit of allowing parentheses in the syntax, eliminating the need to remove them whenever using tab-autocomplete for long scheme names such as backward_difference_coding -- backward_difference_coding() throws an error right now.
Note that this may require another column in the glimpse table (from glimpse_contrasts) to denote the scores. This is, again, polynomial-specific like the currently-existing dropped_trends column. Adding this may warrant another function argument like describe.polynomials which takes values of TRUE for adding all polynomial-specific columns like dropped_trends and scores; FALSE to not include them when they're not relevant (i.e., don't include dropped_trends if there are no dropped trends); NA to not include the columns at all, with a warning if they should be; and a (lazy) character vector to add a subset of the columns (e.g., c('dropped_trends', 'scores') and c('d', 's') would show both columns). I've wanted to implement something like this for a while, but the main issue would be in cases where you want to combine two glimpse tables, but the number of columns would differ depending on how involved different approaches are with ordered factors.
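Returning to the binned-scale case mentioned earlier, here is a small base-R sketch of both scoring options. The per-bin counts are made up purely for illustration, and the ridit of a bin is taken as the proportion of observations below the bin plus half the proportion within it.

```r
# Bins from the example above; the counts are hypothetical
bins <- c("1-3", "4-7", "8-10")
counts <- c(20, 50, 30)

# Option 1: median of the integer values covered by each bin
bounds <- lapply(strsplit(bins, "-"), as.numeric)
median_scores <- vapply(
  bounds,
  function(b) median(seq(b[1], b[2])),
  numeric(1)
)
median_scores  # 2.0 5.5 9.0

# Option 2: ridit scores from the observed proportions
props <- counts / sum(counts)
ridit_scores <- cumsum(props) - props / 2
ridit_scores  # 0.10 0.45 0.85

# Either vector has length k = 3 and can be used as scores
contr.poly(3, scores = median_scores)
contr.poly(3, scores = ridit_scores)
```

With the proposed syntax, either vector would then be passed as contr.poly(scores = ...) inside the formula.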