jeff-hughes / reghelper

R package with regression helper functions

beta() requires numeric matrix #9

Closed Tato14 closed 3 years ago

Tato14 commented 3 years ago

I am working with a glm() model and would like to calculate the standardized coefficients. All of my variables are factors or booleans.

When I try to use beta(model), I get the following error:

Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric

Should I change my "string" factors and booleans to numeric factors? In this case, is this the proper way to use this kind of function?

jeff-hughes commented 3 years ago

Hi there,

No, you shouldn't need to change your variables to numeric. The code is intended to work with factor variables.

However, there's nothing in the code that calls the colMeans() function, so it's a bit difficult for me to understand where the error is coming from. Could I get you to do two things:

  1. Please try calling reghelper::beta(model), which will ensure that it is indeed running the beta() function from this package. Given that it's a common mathematical term, the name can conflict with functions in other packages, and people have run into this in the past. (This was an unfortunate naming on my part.)
  2. If that does not resolve the problem, please provide a small reproducible example so I can try to determine the problem. Based on your description, I was not able to reproduce the error. If you can generate some data with the same general properties as the data you're trying to analyze, and provide the code to generate the data and analyze it, that will hopefully help me pin down the problem.
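
As a quick way to check the name conflict described in step 1, the sketch below uses only base R to list where a function named `beta` is defined and which one the unqualified name resolves to. It assumes nothing about reghelper's internals; base R ships its own `beta()` (the mathematical Beta function), so there is always at least one other definition on the search path.

```r
# List every attached package that defines `beta`, in the order R searches them.
# The first entry is the one an unqualified call to beta() will use.
beta_locations <- find("beta")
print(beta_locations)

# Name of the environment the currently visible `beta` belongs to.
beta_owner <- environmentName(environment(beta))
print(beta_owner)
```

If reghelper is attached and not masked by another package, it should appear first in the list. Writing `reghelper::beta(model)` sidesteps the search order entirely.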

Thanks!

Tato14 commented 3 years ago

Hi,

I tried to use reghelper::beta(model) but the same error occurs. I attached a minimal example to reproduce the error:

library(reghelper)
library(readxl)
minimal_example <- read_excel("minimal_example.xlsx")
minimal_example$Exitus_90d <- as.logical(minimal_example$Exitus_90d)

model <- glm(Exitus_90d ~ Edad_fct + Genero, data = minimal_example,
             family = binomial)

summary(model)

reghelper::beta(model)

returns: Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric

minimal_example.xlsx

jeff-hughes commented 3 years ago

Hi there,

Thanks for the example, that really helps. I see the issue you're running into now -- there are a couple things going on here, both of which contribute to the issue.

  1. The labels of the variables include two problematic characters: hyphens and spaces. When just running the model, it's not a problem, as R only adds these in at the end to format the summary table. But when calculating the standardized coefficients, reghelper has to turn the single factor variable into multiple variables (one per contrast), in order to properly standardize them. What I opted to do was to create a new variable name that includes the label for that factor -- but when the label has spaces and hyphens, those aren't valid variable names.
  2. At least in the example you provided, the two predictor variables you were using were not set as factors (i.e., using factor() or as.factor()), meaning they were inserted into the model as text. R silently converts them into factors when fitting the model, but reghelper just tests for whether the variable is set as type "factor" or not.
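
Both points can be seen on a small synthetic data set. This is only an illustrative sketch: the data frame `d` and its columns `died` and `age_group` are made up, and `make.names()` is a base R function used here to show what a syntactically valid name looks like, not necessarily what reghelper does internally.

```r
set.seed(1)

# Hypothetical data: the predictor is stored as text, with labels that
# contain spaces and hyphens (similar to what read_excel() would return).
d <- data.frame(
  died = sample(c(TRUE, FALSE), 40, replace = TRUE),
  age_group = sample(c("18 - 40", "41 - 65", "66 - 90"), 40, replace = TRUE),
  stringsAsFactors = FALSE
)

# The column is character, not factor, so a check with is.factor() fails.
is.factor(d$age_group)

# glm() still fits: R treats the character column as a factor internally.
fit <- glm(died ~ age_group, data = d, family = binomial)

# The levels R derived for the predictor are recorded on the fitted model.
fit$xlevels

# Coefficient names are the variable name pasted to the level label,
# so they inherit the spaces and hyphens.
coef_names <- names(coef(fit))
print(coef_names)

# The same names converted to syntactically valid R names.
print(make.names(coef_names))
```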

A lot of this just ends up being an issue of trying to figure out, after the fact, all the silent adjustments that R made in order to fit the model. So I'll do some work to fix this in the code. I should be able to test whether the variable is a factor or a string, and I can also convert characters that don't belong in variable names.

In the meantime, the following code would resolve the issue with the example you've provided:

minimal_example$Edad_fct <- gsub("-", "_", minimal_example$Edad_fct, fixed=TRUE)
minimal_example$Edad_fct <- gsub(" ", "_", minimal_example$Edad_fct, fixed=TRUE)
minimal_example$Genero <- factor(minimal_example$Genero)
minimal_example$Edad_fct <- factor(minimal_example$Edad_fct)

Basically, we're replacing the hyphens and spaces with underscores, and then explicitly setting the variables to factors. That should hopefully help you resolve the immediate issue so you can get on with your research -- but I will work on modifying the code to handle these cases and then put out a new version when it's resolved.
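
For data sets with many text columns, the same workaround can be applied to all of them at once. The helper below is a hypothetical sketch, not part of reghelper: the name `clean_factor_columns` and the choice to replace every run of characters other than letters, digits, dots, and underscores with a single underscore are assumptions made for illustration.

```r
# Hypothetical helper (not part of reghelper): turn every character column
# into a factor whose labels contain only letters, digits, dots, and underscores.
clean_factor_columns <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(x) {
    # Replace each run of disallowed characters with one underscore,
    # then convert the cleaned text to a factor.
    factor(gsub("[^A-Za-z0-9._]+", "_", x))
  })
  df
}

# Made-up example data with problematic labels.
d <- data.frame(
  age_group = c("18 - 40", "41 - 65", "18 - 40"),
  sex = c("F", "M", "F"),
  stringsAsFactors = FALSE
)

cleaned <- clean_factor_columns(d)
levels(cleaned$age_group)
```

One caveat with this approach: two labels that differ only in the replaced characters would collapse into the same level, so it is worth comparing the number of levels before and after cleaning.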

Thanks for letting me know of the problem! I appreciate it.

jeff-hughes commented 3 years ago

Issue is fixed in v1.0.2, which has been accepted to CRAN. Should be available there within the next day or so.