statsmodels / statsmodels

Statsmodels: statistical modeling and econometrics in Python
http://www.statsmodels.org/devel/
BSD 3-Clause "New" or "Revised" License

add conditional logit model #941

Open josef-pkt opened 11 years ago

josef-pkt commented 11 years ago

Description

In contrast to MNLogit, here the variables differ by choice, but the parameters are the same across choices (at least for some variables).

special case: rectangular structure: all choices have the same variables, but different values

Interface

??? see also issue #940 for data handling in general multi-equation models

josef-pkt commented 11 years ago

@AnaMP we can open issues for models that are work in progress, so we can add comments before we have a pull request.

AnaMP commented 11 years ago

Related to both data handling and the new DCM: I'm keeping an eye on an extension of "formula" designed to describe the model to be estimated. It comes from an R package. An mFormula is a formula whose right-hand side (exog part) may contain three parts separated by |: part 0 ~ part 1 | part 2 | part 3, where:

• part 0: endog part (e.g. choice)
• part 1: xij, alternative specific variables with a generic coefficient β (a single coefficient shared by all the alternatives: conditional logit model)
• part 2: zi, individual specific variables with alternative specific coefficients γj (one coefficient for each alternative except one of them: multinomial logit model)
• part 3: wij, alternative specific variables with an alternative specific coefficient δj

choice ~ xij | zi | wij → Vij = αj + βxij + γj zi + δj wij
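For reference, under the standard logit assumption (i.i.d. type I extreme value errors), this utility leads to choice probabilities of the following form. This is a sketch of the textbook result, assuming J alternatives and with αj and γj normalized to zero for the reference alternative:

```latex
V_{ij} = \alpha_j + \beta x_{ij} + \gamma_j z_i + \delta_j w_{ij},
\qquad
P_{ij} = \frac{\exp(V_{ij})}{\sum_{k=1}^{J} \exp(V_{ik})}
```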

We can combine these parts to build one matrix that contains one coefficient column for each variable in the first part, J − 1 columns for each variable in the second part, and J columns for each variable in the third part.
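As a rough illustration of that column layout, here is a minimal pure-Python sketch for a single observation with one variable per part. The function name `build_design_rows` and its arguments are hypothetical and are not part of statsmodels, patsy, or the R package; the sample values follow the car/bus example given later in this thread.

```python
def build_design_rows(alternatives, reference, x, z, w):
    """Build one design row per alternative for a single observation.

    Hypothetical helper, for illustration only.
    alternatives: ordered list of alternative names (length J)
    reference:    name of the reference alternative
    x: dict alt -> part 1 variable (generic coefficient)
    z: part 2 variable (individual specific, J - 1 coefficients)
    w: dict alt -> part 3 variable (J alternative specific coefficients)
    """
    non_ref = [a for a in alternatives if a != reference]
    names = (
        [f"{a}:(intercept)" for a in non_ref]  # J - 1 intercepts
        + ["x"]                                # 1 generic column
        + [f"{a}:z" for a in non_ref]          # J - 1 columns
        + [f"{a}:w" for a in alternatives]     # J columns
    )
    rows = []
    for alt in alternatives:
        row = [1.0 if alt == a else 0.0 for a in non_ref]
        row.append(float(x[alt]))
        row += [float(z) if alt == a else 0.0 for a in non_ref]
        row += [float(w[alt]) if alt == a else 0.0 for a in alternatives]
        rows.append(row)
    return names, rows


names, rows = build_design_rows(
    alternatives=["car", "bus"],
    reference="car",
    x={"car": 2.5, "bus": 2.5},   # e.g. time
    z=88.1484,                    # e.g. price
    w={"car": 1, "bus": 1},       # e.g. comfort
)
```

With J = 2 this gives 1 + 1 + 1 + 2 = 5 columns, and the reference alternative's row has zeros in every column that belongs to another alternative.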

And, since we want to be able to use a different specification for each alternative, we could combine it with one of the approaches from issue #940, maybe (4) dictionary plus data, to deal with multiple-equation models. Should I talk to Skipper Seabold about it?

josef-pkt commented 11 years ago

Best is to send formula questions and proposals to the mailing list and then file an issue with patsy. patsy has an open issue and some discussion on the pipe |. Nathaniel and Skipper did all the formula work.

What is not clear yet to me is how the above parts 1 to 3 map into design matrices. For the formula handling we will need to know what design matrices should be constructed.

Should we get 3 separate design matrices or should they be combined? The second-to-last paragraph sounds like one matrix. Do we need a different design matrix for each choice?

josef-pkt commented 11 years ago

One possible way of going forward is to get the internal structure first, and then see how the formulas can be extended and used to map into it.

Inherit from LikelihoodModel or GenericLikelihoodModel to start with, but don't call super(...).__init__ yet if the arguments in the Model.__init__ don't follow the current pattern. And then figure out a permanent solution to #940 and formulas.

AnaMP commented 11 years ago

OK, I'll finish the internal structure first and then send a proposal to the mailing list. Anyway, I forgot to add the link to the package: http://cran.r-project.org/web/packages/Formula/

We need only one matrix, something like this for choice ~ time | price | comfort (with car as the reference alternative):

| | 2:(intercept) | time | 2:price | 1:comfort | 2:comfort |
|---|---|---|---|---|---|
| 1.car | 0 | 2.500000 | 0.00000 | 1 | 0 |
| 1.bus | 1 | 2.500000 | 88.14840 | 0 | 1 |
| 2.car | 0 | 2.500000 | 0.00000 | 1 | 0 |
| 2.bus | 1 | 2.166667 | 70.51872 | 0 | 1 |
| 3.car | 0 | 1.916667 | 0.00000 | 1 | 0 |
| 3.bus | 1 | 1.916667 | 88.14840 | 0 | 0 |

josef-pkt commented 11 years ago

Ok, that doesn't sound too difficult. This is a long format, where we have several lines for each observation (1. 2. 3.), isn't it?

Are you working also with this kind of matrix internally, or do you reshape or split up?

josef-pkt commented 11 years ago

related: what if someone doesn't have a car?

AnaMP commented 11 years ago

Yes, it is: several lines for each observation. I use it only to read in the data; I reshape and split it up for the calculations.

If someone doesn't have a car, there is a problem. We need at least two alternatives in order to have a choice set, because we compare the utility of choosing one alternative with the utility of choosing the others. To keep an observation with a missing alternative, some analysts give that alternative's variables values that make it unattractive. But an alternative that isn't available is not actually the same as one that is very unappealing.
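One common way to avoid that workaround is to let the choice set vary by observation and normalize the probabilities only over the alternatives that are present. The following is a minimal pure-Python sketch under that assumption, not the implementation discussed in this thread; the function name `clogit_loglike`, the row layout, and the sample data (including the "train" alternative) are hypothetical.

```python
import math
from collections import defaultdict


def clogit_loglike(params, rows):
    """Conditional logit log-likelihood for long-format data.

    rows: iterable of (obs_id, alt, chosen, exog) tuples, one per
          AVAILABLE alternative; chosen is 1 for the selected
          alternative and 0 otherwise. Unavailable alternatives are
          simply left out instead of being made unattractive.
    """
    by_obs = defaultdict(list)
    for obs_id, _alt, chosen, exog in rows:
        utility = sum(b * v for b, v in zip(params, exog))
        by_obs[obs_id].append((chosen, utility))

    total = 0.0
    for obs_id, alts in by_obs.items():
        if len(alts) < 2:
            raise ValueError(f"observation {obs_id!r} has no real choice set")
        utilities = [u for _, u in alts]
        chosen_utils = [u for c, u in alts if c == 1]
        if len(chosen_utils) != 1:
            raise ValueError(f"observation {obs_id!r} needs exactly one choice")
        # log-sum-exp over this observation's own choice set
        u_max = max(utilities)
        log_denom = u_max + math.log(sum(math.exp(u - u_max) for u in utilities))
        total += chosen_utils[0] - log_denom
    return total


# Hypothetical data: observation 2 has no car, so it has two rows only.
rows = [
    (1, "car", 0, [2.5]),
    (1, "bus", 1, [2.0]),
    (1, "train", 0, [3.0]),
    (2, "bus", 1, [1.5]),
    (2, "train", 0, [2.0]),
]
loglike_at_zero = clogit_loglike([0.0], rows)
```

With all parameters at zero every available alternative is equally likely, so the first observation contributes log(1/3) and the second log(1/2). An observation left with a single alternative still raises an error, in line with the point above that a choice set needs at least two alternatives.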

jbrockmendel commented 5 years ago

@kshedden does the existing implementation of conditional_models close this?

josef-pkt commented 5 years ago

I don't think so. This issue has questions related to discrete choice models, which are the subject of an old GSOC PR.