Hypothesis testing with a factor variable in pffr()

sealx017 commented 1 year ago

Hi,

I have a factor variable called "Group" as a predictor in the pffr() model. The factor variable has two labels: Group 1 and Group 2. I want to test the group effect but not sure how to interpret the summary output as shown below. I get two p-values corresponding to the two labels since the curves are estimated separately according to the documentation. I was expecting a dummy-variable type output with a single p-value that we get in standard lm. I tried fitting a model without the factor variable "Group" and then compare the two nested models using anova() but it did not return any p-value!

FAmodel <- pffr(Y ~ Group +  s(ID, bs="re"), data = wide_data)
summary(FAmodel)

Family: gaussian 
Link function: identity 

Formula:
Y ~ Group + s(ID, bs = "re")

Constant coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.29727    0.01289   23.06   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Smooth terms & functional coefficients:
                          edf Ref.df      F p-value    
Intercept(yindex)   8.782e+00      9 69.537  <2e-16 ***
GroupGroup1(yindex) 4.912e-04      4  0.000   0.562    
GroupGroup2(yindex) 2.026e+00      4  4.651  <2e-16 ***
s(ID)               9.276e+01    145  7.646  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.918   Deviance explained = 92.8%
-REML score = -596.04  Scale est. = 0.011345  n = 900 (90 x 10)

Any help will be greatly appreciated.

fabian-s commented 1 year ago

see #72 - pffr doesn't handle factor variables well, sorry. for a two-level factor, the easiest way to get what you want is to manually create the dummy variable and use that:

library(refund)
data(DTI)
DTI$sexMale <- 1 * (DTI$sex == "male")
pffr(rcst ~ sexMale, data = DTI[1:10, ]) |> summary()
#> 
#> Family: gaussian 
#> Link function: identity 
#> 
#> Formula:
#> rcst ~ sexMale
#> 
#> Constant coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 0.548898   0.004637   118.4   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Smooth terms & functional coefficients:
#>                      edf Ref.df     F p-value    
#> Intercept(yindex) 15.302 19.000 51.53  <2e-16 ***
#> sexMale(yindex)    4.574  4.878 26.80  <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> R-sq.(adj) =  0.813   Deviance explained = 82.1%
#> -REML score = -784.09  Scale est. = 0.0021034  n = 499 (10 x 55)

^{Created on 2023-05-12 with reprex v2.0.2}

sealx017 commented 1 year ago

I see and thank you, Dr. Scheipl. Amazing works!

refunders / refund

Hypothesis testing with a factor variable in pffr() #105