jhelvy / logitr

Fast estimation of multinomial (MNL) and mixed logit (MXL) models in R with "Preference" space or "Willingness-to-pay" (WTP) space utility parameterizations
https://jhelvy.github.io/logitr/
Other

effects coding #46

Open KarinOudshoorn opened 1 year ago

KarinOudshoorn commented 1 year ago

When using logitr with effects-coded categorical variables (contr.sum), the transformation from the categorical variable to dummy variables produces only one dummy column, irrespective of how many levels the categorical variable has. To solve this, logitr should check the contrasts set on each categorical variable and use an alternative to fastDummies when they are not the default.
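For reference, the kind of check and expansion described above can be sketched in base R. This is not logitr's internal code: the `brand` factor and the helper `has_custom_contrasts()` are hypothetical names used only for illustration. The sketch relies on `model.matrix()`, which honors the contrasts attribute attached to a factor.

```r
# Hypothetical 4-level factor standing in for a categorical attribute
brand <- factor(c("dannon", "hiland", "weight", "yoplait"))

# Hypothetical helper: TRUE when contrasts were set explicitly on the factor
has_custom_contrasts <- function(x) {
  is.factor(x) && !is.null(attr(x, "contrasts"))
}

before <- has_custom_contrasts(brand)  # no contrasts set yet
contrasts(brand) <- contr.sum(4)       # switch to effects coding
after <- has_custom_contrasts(brand)   # contrasts attribute now present

# model.matrix() respects the contrasts attribute; drop the intercept column
X <- model.matrix(~ brand)[, -1, drop = FALSE]
colnames(X)
X
```

With effects coding, a factor with four levels expands to three columns, and the row for the last level is coded -1 in every column.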

jhelvy commented 1 year ago

I can't reproduce this issue. Here is an example.

{logitr} uses dummy coding by default for categorical variables. In this case, you get 3 brand coefficients as expected:

library(logitr)

model <- logitr(
  data = yogurt, outcome = 'choice', obsID = 'obsID',
  pars = c('price', 'feat', 'brand')
)

coef(model)
#>       price         feat  brandhiland  brandweight brandyoplait 
#>  -0.3665546    0.4914392   -3.7154773   -0.6411384    0.7345195 

Now if I use contr.sum to apply effects coding, I still get 3 brand coefficients:

yogurt$brand <- as.factor(yogurt$brand)
contrasts(yogurt$brand) = contr.sum(4)

model <- logitr(
  data = yogurt, outcome = 'choice', obsID = 'obsID',
  pars = c('price', 'feat', 'brand')
)

coef(model)

#>     price       feat     brand1     brand2     brand3 
#> -0.3665883  0.4913432  0.9055508 -2.8100654  0.2643329 

KarinOudshoorn commented 1 year ago

Dear John, I should have been more precise. With the conditional logit model everything does work fine. The problem occurs with the mixed logit model, when a categorical variable is given a random distribution.

Best wishes,

Karin


jhelvy commented 1 year ago

Ah yes, I see. When I include randPars with effects coding, it appears to be ignored. I just get back the same results as the previous model, where brand is modeled with fixed parameters:

library(logitr)

yogurt$brand <- as.factor(yogurt$brand)
contrasts(yogurt$brand) = contr.sum(4)

model <- logitr(
  data = yogurt, outcome = 'choice', obsID = 'obsID',
  pars = c('price', 'feat', 'brand'),
  randPars = c(brand = 'n')
)

coef(model)
#>     price       feat     brand1     brand2     brand3
#> -0.3665883  0.4913432  0.9055508 -2.8100654  0.2643329

I believe this is probably a pretty small issue in the code. It looks like it might be rooted in the names of the variables changing when using effects coding. I'll look into it.
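A minimal base-R sketch of the naming difference suspected here, under the assumption that the mismatch is between the name given in randPars (`brand`) and the expanded column names. The `rand_pars` vector and the expansion step are hypothetical illustrations, not logitr's actual code.

```r
# Hypothetical factor with the same four levels as the yogurt brands
brand <- factor(c("dannon", "hiland", "weight", "yoplait"))

# Default dummy (treatment) coding: columns are named after the levels
dummy_names <- colnames(model.matrix(~ brand))[-1]

# Effects coding: contr.sum() has no column names, so columns are numbered
contrasts(brand) <- contr.sum(4)
effects_names <- colnames(model.matrix(~ brand))[-1]

dummy_names    # level-based names
effects_names  # numbered names

# Hypothetical expansion of a randPars-style entry to whichever column
# names the coding scheme produced, instead of assuming level-based names
rand_pars <- c(brand = "n")
expanded <- setNames(
  rep(rand_pars[["brand"]], length(effects_names)),
  effects_names
)
expanded
```

Any lookup that rebuilds the expected names from the factor levels would find the dummy-coded columns but miss `brand1`, `brand2`, and `brand3`, which would be consistent with the random parameters being dropped.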

I may also show this as an example in the documentation for those who want to use different coding schemes.
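One way such an example could look is to build the effects-coded columns by hand so that every column has an explicit name. The sketch below uses a small hypothetical data frame rather than the yogurt data, and the `brand_eff*` column names are invented for illustration. Whether listing these columns individually in pars and randPars avoids the problem above is an untested assumption.

```r
# Hypothetical data: one categorical attribute with four levels
df <- data.frame(
  brand = factor(c("dannon", "hiland", "weight", "yoplait", "hiland"))
)

# Effects-code the factor, then expand it to explicit numeric columns
contrasts(df$brand) <- contr.sum(nlevels(df$brand))
eff <- model.matrix(~ brand, data = df)[, -1, drop = FALSE]
colnames(eff) <- paste0("brand_eff", seq_len(ncol(eff)))  # hypothetical names
df <- cbind(df, eff)

df

# The new columns could then be passed by name (assumption, not verified):
# pars     = c('price', 'feat', 'brand_eff1', 'brand_eff2', 'brand_eff3')
# randPars = c(brand_eff1 = 'n', brand_eff2 = 'n', brand_eff3 = 'n')
```

Because the columns are plain numeric variables, no automatic dummy expansion is needed for them.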

KarinOudshoorn commented 1 year ago

I get one sd estimated, but not the others.

It would be great if that were added. I am sending you my code, which runs on my own simulated data (generated for a tutorial we are writing at the moment).

Best wishes,

Karin
