Question: non-normalised outside good's utility

LichaoChen331 commented 3 months ago

Hi Jeff,

Thank you for the fantastic package!

I have a question regarding the specification of the outside good's utility in the pyblp package. Is it possible to set the outside good's utility to be non-zero and a function of the existing data (even though I understand this is uncommon)? If so, how can this be achieved?

Additionally, does the functional form of the outside good's utility need to be pre-determined with all coefficients calibrated, or can we treat the coefficients in the outside good's utility as non-linear parameters and identify them by minimizing the GMM function?

Thank you for your help!

Best regards,

Louis

jeffgortmaker commented 3 months ago

Sure! Including a (random) coefficient on the constant term for inside goods is equivalent to including the negative of that same coefficient on an indicator for the outside good.

If this coefficient is non-random (i.e. just a parameter in beta), by default it'll be concentrated out with the other linear parameters. If it's random (i.e. you include parameters in sigma and/or pi on the constant), these are nonlinear and will need to be optimized over.
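Written out, the equivalence looks roughly like the following. The notation is an assumption for illustration rather than pyblp's own: $\delta_{jt}$ is the rest of mean utility excluding the constant, $\beta_0$ is the linear coefficient on the constant, $\sigma_0$ is the standard deviation of its random part, and $\nu_{it}$ is the individual-level draw.

```latex
% Constant placed on the inside goods, outside good normalised to zero:
u_{ijt} = \delta_{jt} + (\beta_0 + \sigma_0 \nu_{it}) + \varepsilon_{ijt},
\qquad
u_{i0t} = \varepsilon_{i0t}

% Same choices: subtract (\beta_0 + \sigma_0 \nu_{it}) from every alternative
\tilde{u}_{ijt} = \delta_{jt} + \varepsilon_{ijt},
\qquad
\tilde{u}_{i0t} = -(\beta_0 + \sigma_0 \nu_{it}) + \varepsilon_{i0t}

% Both give the same market shares:
s_{jt}
= \int \frac{\exp(\delta_{jt} + \beta_0 + \sigma_0 \nu)}
            {1 + \sum_k \exp(\delta_{kt} + \beta_0 + \sigma_0 \nu)} \, dF(\nu)
= \int \frac{\exp(\delta_{jt})}
            {\exp(-(\beta_0 + \sigma_0 \nu)) + \sum_k \exp(\delta_{kt})} \, dF(\nu)
```

With $\sigma_0 = 0$ only $\beta_0$ remains, which is the linear case; with $\sigma_0 \neq 0$ the integral over $\nu$ is what makes the parameter nonlinear.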

Does that make sense?

LichaoChen331 commented 3 months ago

Thanks, Jeff, for the prompt reply.

I just want to confirm: the equivalence between putting variables in the outside good's utility and adding them to the constant term of the inside goods should only hold for logit or random coefficients logit models, correct? In a nested logit model (for example, one where all inside goods belong to one nest and the outside good belongs to another), I assume the mathematical equivalence would not hold?

In practice, should I add an extra term to the formulation for the variables that affect the outside good's utility, either in a non-random way (included in X1) or in a random way (included in X2)?

Thank you!

jeffgortmaker commented 3 months ago

I'm not sure why it wouldn't hold for the nested logit model. Choices are about differences in utility between alternatives. Adding a constant to the utility of all inside alternatives results in precisely the same choices as subtracting this constant from the utility of the outside alternative.
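To check this numerically, here is a minimal sketch in plain Python (standard library only, not pyblp code). It assumes a nested logit with one nest holding all inside goods, a singleton nest for the outside good, and a nesting parameter `rho`; the function name `nested_logit_shares` and all numbers are hypothetical.

```python
import math


def nested_logit_shares(inside_utils, outside_util, rho):
    """Shares for one inside nest plus a singleton outside nest.

    Returns (list of inside shares, outside share).
    """
    # Within-nest terms are scaled by 1 / (1 - rho).
    scaled = [math.exp(u / (1.0 - rho)) for u in inside_utils]
    nest_sum = sum(scaled)
    # Inclusive-value term for the inside nest.
    inside_term = nest_sum ** (1.0 - rho)
    # A singleton nest collapses to exp(utility).
    outside_term = math.exp(outside_util)
    denom = inside_term + outside_term
    inside_shares = [(e / nest_sum) * (inside_term / denom) for e in scaled]
    return inside_shares, outside_term / denom


deltas = [0.5, -0.3, 1.2]  # made-up inside-good utilities
c = 0.7                    # made-up constant
rho = 0.4                  # made-up nesting parameter

# (a) add c to every inside good, outside good normalised to zero
shares_a, outside_a = nested_logit_shares([d + c for d in deltas], 0.0, rho)
# (b) leave inside goods alone, subtract c from the outside good
shares_b, outside_b = nested_logit_shares(deltas, -c, rho)

# Largest absolute difference between the two sets of shares.
gap = max(abs(a - b) for a, b in zip(shares_a + [outside_a], shares_b + [outside_b]))
print(gap)
```

The two specifications agree up to floating-point error because the shift `c` scales the inside nest's inclusive-value term by `exp(c)`, which is the same as scaling the outside term by `exp(-c)`.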

In practice, it's usually a good idea to include at least a linear coefficient on the constant. Of course, if you have any fixed effects, these will already do so implicitly. If you have the data for it, it's also usually a good idea to try to incorporate a random coefficient on the constant. This is particularly true if your counterfactuals of interest involve inside-outside substitution.
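As a rough illustration of why this matters for inside-outside substitution, the sketch below (plain Python, standard library only, not pyblp code) compares diversion to the outside good when a product is removed. It assumes a two-type discrete mixture standing in for a random coefficient on the constant; the function names and all numbers are hypothetical.

```python
import math


def logit_shares(deltas, constant):
    """Plain logit shares with the outside good's utility fixed at zero."""
    expu = [math.exp(d + constant) for d in deltas]
    denom = 1.0 + sum(expu)
    return [e / denom for e in expu], 1.0 / denom


def mixture_shares(deltas, constants, weights):
    """Average logit shares over consumer types with different constants."""
    inside = [0.0] * len(deltas)
    outside = 0.0
    for c, w in zip(constants, weights):
        s, s0 = logit_shares(deltas, c)
        inside = [a + w * b for a, b in zip(inside, s)]
        outside += w * s0
    return inside, outside


deltas = [0.0, 0.0]       # two made-up inside goods
constants = [2.0, -2.0]   # one type likes inside goods, the other does not
weights = [0.5, 0.5]

before_in, before_out = mixture_shares(deltas, constants, weights)
# Remove the first product and recompute shares.
after_in, after_out = mixture_shares(deltas[1:], constants, weights)

# Share of the removed product's sales that flows to the outside good.
mixture_diversion = (after_out - before_out) / before_in[0]
# What a plain logit fitted to the same aggregate shares would imply.
logit_diversion = before_out / (1.0 - before_in[0])

print(mixture_diversion, logit_diversion)
```

With these numbers, the mixture implies noticeably less diversion to the outside good than the plain logit formula does, because buyers of the removed product are disproportionately the type that prefers inside goods.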

I recommend going through the exercises in the class I recently taught. By the end of them, you'll have included a random coefficient on the constant, gotten an idea of why this matters, and gotten a sense of when variation in aggregate or micro data is sufficient to identify such a random coefficient and when it is not.

LichaoChen331 commented 3 months ago

That's fantastic learning material. Thanks again, Jeff.

jeffgortmaker commented 3 months ago

For sure! I'll close this for now but hope this gets you started.

jeffgortmaker / pyblp

Question: non-normalised outside good's utility #163