First, a huge thanks for the wonderful work you have created! I have two questions from applying the code:
I understand that a nested logit model will usually give a larger own-price elasticity than the simple logit model (in absolute value), and that the closer rho is to 1, the larger the elasticity. However, I usually get unreasonably large price elasticities from the nested logit model. For example, in the first tutorial (Nevo's example), the logit model returns a price elasticity of -3.1, while a nested logit model with mushy as the nest returns -8.5 (using compute_elasticities). This becomes more severe in the automobile example (without micro moments): the BLP model returns a reasonable price elasticity, but with an RCNL model where air defines the nests, I get a price elasticity of -45 with rho = 0.95, using the same starting points. I have noticed that in the literature one does usually get larger price elasticities from nested logit than from simple logit (Grigolon and Verboven (2014), for example), but the differences there seem more moderate than in the tutorial exercises. Is this due to the way pyblp calculates price elasticities? I have not had time to go through the code to check whether the closed-form elasticity is specified differently for the simple logit and nested logit models, since the analytical solutions for the market shares differ. Or is there some other point I am missing when calculating the elasticities?
My second question is about the optimizer "converged" flag in ProblemResults. I have noticed that in the tutorials the optimizer sometimes converges and sometimes does not; for example, in the automobile example (without micro moments), the optimizer does not converge. I am not sure whether such results still count as acceptable, given that they appear in the tutorial. In my own practice, I nearly always get a converged optimizer with scipy.optimize's 'slsqp', roughly a fifty-fifty chance with 'l-bfgs-b' depending on the starting values, and rarely see a "Yes" flag with 'bfgs'. Is this because these methods determine convergence differently? Perhaps 'slsqp', since it solves an approximate quadratic subproblem, converges more easily, while 'bfgs' is a bit harder. Is a result acceptable even without a "Yes" flag if the projected gradient norm is relatively small (and how small, precisely)? And what is the termination rule for returning a result, since even with a "No" flag you still get a result (it just usually takes longer)?
Thank you very much for your help, and I look forward to your reply!
I'm a bit unsure what you're asking. Different models will be more appropriate for different settings. If the nested logit model isn't particularly appropriate (i.e., is particularly misspecified) for the empirical examples we use in our tutorials (after all, the original authors decided not to use it), then it's difficult for me to evaluate whether differences are "moderate" or not. If you think there's a bug in how elasticities are computed, a minimum working example (e.g., comparing with finite differences) would be useful.
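For what it's worth, a finite-difference check of the kind suggested above doesn't require pyblp at all. The sketch below is not pyblp's implementation; it is a plain-NumPy nested logit with hypothetical toy data, which (a) verifies the standard closed-form own-price elasticity against central finite differences, and (b) shows why elasticities explode as rho approaches 1: the closed form carries a 1/(1 - rho) factor, so rho = 0.95 multiplies the logit-like term by 20.

```python
# Minimal nested logit sketch (NOT pyblp's code): check the closed-form
# own-price elasticity against central finite differences, and illustrate
# the 1 / (1 - rho) scaling that inflates elasticities as rho -> 1.
import numpy as np

def shares(delta, nests, rho):
    """Nested logit market shares with the outside good normalized to 0."""
    exp = np.exp(delta / (1 - rho))
    denominators = {g: exp[nests == g].sum() for g in np.unique(nests)}
    total = 1 + sum(d ** (1 - rho) for d in denominators.values())
    s = np.empty_like(delta)
    for j, g in enumerate(nests):
        within = exp[j] / denominators[g]                   # s_{j|g}
        nest_share = denominators[g] ** (1 - rho) / total   # s_g
        s[j] = within * nest_share
    return s

def own_elasticity(delta, nests, rho, alpha, prices, j):
    """Closed-form own-price elasticity for product j (Berry 1994 algebra)."""
    s = shares(delta, nests, rho)
    exp = np.exp(delta / (1 - rho))
    within = exp[j] / exp[nests == nests[j]].sum()
    return -alpha * prices[j] * (1 / (1 - rho) - rho * within / (1 - rho) - s[j])

# Hypothetical toy data: 4 products, 2 nests, delta_j = x_j - alpha * p_j.
alpha, rho = 1.0, 0.95
prices = np.array([1.0, 1.2, 0.9, 1.1])
x = np.array([2.0, 2.5, 1.8, 2.2])
nests = np.array([0, 0, 1, 1])
delta = x - alpha * prices

# Central finite difference of product 0's share with respect to its price.
h = 1e-6
def share_0(p0):
    return shares(x - alpha * np.array([p0, *prices[1:]]), nests, rho)[0]
fd = (share_0(prices[0] + h) - share_0(prices[0] - h)) / (2 * h)
fd_elasticity = fd * prices[0] / shares(delta, nests, rho)[0]

analytic = own_elasticity(delta, nests, rho, alpha, prices, 0)
assert abs(analytic - fd_elasticity) < 1e-5  # closed form matches the FD check
```

With rho = 0.95 the elasticity here is around -20 even though alpha = 1, which suggests the -45 in the RCNL exercise is the mechanical 1/(1 - rho) scaling at work rather than a computational bug.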
Which optimizer is most appropriate will differ by problem. Some optimizers are better at smooth objectives; others, choppy ones. And how you configure the optimizer is just as important: setting a tight gradient-based tolerance is usually a good idea. Personally, I find that SciPy's trust-constr algorithm does a pretty good job across BLP-type problems (of course, you'll still need to configure its tolerances to fit your problem). In general, the "convergence" flag isn't particularly useful. What you should be looking for is that across multiple (i.e., 3-5) random starting values within reasonable bounds, you consistently get the same (up to numerical error) estimates. Near the end of optimization, you should see "marching down the gradient" behavior, where the objective is getting smaller and the gradient norm is consistently approaching zero.
Hi Jeff and Chris,

Thank you very much for your help!

Best,
Lichao