mjskay / tidybayes

Bayesian analysis + tidy data + geoms (R package)
http://mjskay.github.io/tidybayes
GNU General Public License v3.0

compose_data() silently overwrites data columns that match .n_name #295

Closed (wpetry closed 2 years ago)

wpetry commented 2 years ago

problem

When a data frame containing a column whose name matches the name generated by the .n_name argument is passed to compose_data(), that column is silently overwritten with a single integer (the length of the data). It's clear why the function behaves this way, and maybe it was silly of me not to realize this would be a problem with my data. Still, "n" (the default) is likely to be a common column name, for example in census data or when modeling binomial responses. User-specified dimension index prefixes can also cause naming conflicts.

reprex

library(tidybayes)
# 10 observations of 20 trials each; the `n` column holds the number of trials
dat1 <- data.frame(successes = rbinom(10, size = 20, prob = 0.5), n = 20)
compose_data(dat1)
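
One way to sidestep the collision in the reprex above is to change the prefix that compose_data() uses for generated dimension variables. The following is a minimal sketch, assuming the `n_prefix()` helper that tidybayes documents for use with `.n_name`; the prefix "N" is an arbitrary choice for illustration, and any name not already used as a column should work the same way.

```r
library(tidybayes)

# 10 observations of 20 trials each; `n` holds the number of trials
dat1 <- data.frame(successes = rbinom(10, size = 20, prob = 0.5), n = 20)

# Use "N" instead of the default "n" for generated dimension variables,
# so the row count is stored under a name that does not clash with `n`
out <- compose_data(dat1, .n_name = n_prefix("N"))

str(out)
```

With this prefix the `n` column should pass through untouched, and the length of the data should appear as a separate `N` element.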

proposed solution

Return an error, or at least warn the user when a column in data conflicts with .n_name. An informative message to help the user resolve the conflict would be helpful too. Something like,

`{column name}` is reserved for the length of the data and has been dropped.
Rename this column in the input data or modify the `.n_name` prefixing
function to resolve the conflict.

mjskay commented 2 years ago
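
The kind of check proposed above could look roughly like the sketch below. It is written in base R only and is not the code tidybayes uses: `warn_on_n_name_conflict()` and `default_n_name()` are hypothetical names, the default prefixing function is an assumption that mimics an "n" prefix, and the message says "will be overwritten" rather than "has been dropped" because this helper only checks the data and does not alter it. For brevity it checks only the name reserved for the row count, not the per-column dimension names.

```r
# Hypothetical stand-in for the default prefixing function:
# "" maps to "n", any other name maps to "n_<name>"
default_n_name <- function(name) {
  if (name == "") "n" else paste0("n_", name)
}

# Hypothetical helper (not part of tidybayes): warn when a column name
# collides with the name reserved for the length of the data
warn_on_n_name_conflict <- function(data, .n_name = default_n_name) {
  reserved <- .n_name("")
  conflict <- reserved %in% names(data)
  if (conflict) {
    warning(
      "`", reserved, "` is reserved for the length of the data and will be ",
      "overwritten.\nRename this column in the input data or modify the ",
      "`.n_name` prefixing function to resolve the conflict.",
      call. = FALSE
    )
  }
  # Return whether a conflict was found, without printing it
  invisible(conflict)
}

dat1 <- data.frame(successes = rbinom(10, size = 20, prob = 0.5), n = 20)
warn_on_n_name_conflict(dat1)  # emits the warning
```

Whether to warn or to stop with an error is a design choice: a warning keeps existing scripts running, while an error makes the conflict impossible to miss.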

Good idea, I've added a warning. Thanks!