jeffgortmaker / pyblp

BLP Demand Estimation with Python
https://pyblp.readthedocs.io
MIT License

Gaussian Quadrature Integration producing errors and erroneous results. #78

Closed kevindanray closed 3 years ago

kevindanray commented 3 years ago

I have been trying to use the quadrature rules with both my own project data and the cereal pseudo-data (no agent data), and there are always problems.

With my project data, it basically crashes after one iteration. This may be explained by the issues observed with the cereal pseudo-data, where the quadrature and QMC methods differ by many orders of magnitude in the observable performance metrics.

For example, passing sigma = np.ones((4, 4)) for the cereal data problem gives a first iteration as follows:

- MC: 2980 fixed point iterations, 9071 contraction evaluations, 6.5e03 objective value
- MLHS: 2955 fixed point iterations, 8986 contraction evaluations, 6.6e03 objective value
- Quadrature: 136437 fixed point iterations, 409201 contraction evaluations, 8.3e40 objective value

I have tested quadrature "sizes" ranging from 3 to 7, and the problem persists.

Sometimes the quadrature method reports failure to converge, but other times it reports convergence to absurd objective values. For the cereal data, MC, MLHS, and Halton converge to approximately 1.7e02. Quadrature "converges" to 6.0e40 despite reporting fixed point convergence failures and divide-by-zero errors at each of the 4 iterations.
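
Roughly, the setup is the following (a minimal sketch assuming the standard Nevo/cereal example from the pyblp docs; the formulations and optimizer settings are illustrative, and only the integration specification changes between runs):

```python
import numpy as np
import pandas as pd
import pyblp

# Packaged cereal (Nevo) product data; no agent data is supplied.
product_data = pd.read_csv(pyblp.data.NEVO_PRODUCTS_LOCATION)

# Linear part (prices, with product fixed effects absorbed) and four
# nonlinear characteristics that get random coefficients.
product_formulations = (
    pyblp.Formulation('0 + prices', absorb='C(product_ids)'),
    pyblp.Formulation('1 + prices + sugar + mushy'),
)

# Swap this line between runs: 'monte_carlo', 'mlhs', 'halton', or the
# sparse-grid quadrature 'grid'.
integration = pyblp.Integration('grid', size=7)

problem = pyblp.Problem(product_formulations, product_data, integration=integration)

# Fully dense 4 x 4 starting sigma (pyblp uses its lower triangle).
results = problem.solve(
    sigma=np.ones((4, 4)),
    optimization=pyblp.Optimization('l-bfgs-b'),
)
```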

chrisconlon commented 3 years ago

To be clear -- you are estimating the cereal data with:

a. a fully dense 4 x 4 matrix of random coefficients (10 parameters in the lower triangular Cholesky root),
b. no demographic variation or demographic interaction coefficients, and
c. the same packaged instruments as in the original Nevo simulated cereal example.

My initial responses are:

  1. I am not sure there is any reason to believe that such a model is likely to be identified given the set of instruments. The cereal example performs better without the income-squared * price interaction than with it (the objective is pretty flat with the extra term). Adding all of those additional random coefficients seems like asking for trouble.
  2. One challenge with quadrature is that to get the most accurate value for the integral it oversamples points way out in the tails of the distribution. This means you might evaluate a draw for an individual who is 6 SDs from the mean in all four dimensions. Such an individual has a probability of being drawn that is close to zero, and also a good chance of triggering an overflow/underflow error.
  3. In general it is a good idea to keep the random coefficients from taking very large values by placing lower and upper bounds (LB and UB) on them, which prevents these kinds of outliers; see the sketch below.
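
A minimal sketch of point 3, assuming the problem constructed in the reproduction snippet above (sigma_bounds is pyblp's documented solve argument; the +/- 2 bounds are purely illustrative):

```python
import numpy as np
import pyblp

# Illustrative bounds on the lower-triangular Cholesky-root elements so that
# extreme quadrature nodes cannot push utilities into overflow/underflow.
lower = np.tril(np.full((4, 4), -2.0))  # assumed lower bounds, not a recommendation
upper = np.tril(np.full((4, 4), 2.0))   # assumed upper bounds

results = problem.solve(
    sigma=np.ones((4, 4)),
    sigma_bounds=(lower, upper),  # enforced during optimization
    optimization=pyblp.Optimization('l-bfgs-b'),
)
```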

-Chris


kevindanray commented 3 years ago

A, B, and C are all correct. I have confirmed that the method DOES work when a diagonal Cholesky root is used. I am using the L-BFGS-B method with its default bounds, so results could perhaps be improved by tighter bounds.

But to reiterate, when I try it on my proprietary data set, it takes about 10 minutes before the first iteration fails (delta fails to converge), which seems to crash the entire operation. It may just be that the grid method and the triangular Cholesky root are incompatible.
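
For reference, the two starting configurations being compared (illustrative values; pyblp uses the lower triangle of sigma):

```python
import numpy as np

# Works with the 'grid' quadrature in these runs: a diagonal Cholesky root,
# so all off-diagonal elements are fixed at zero.
diagonal_sigma = np.eye(4)

# Triggers the problem: a fully dense starting matrix, i.e. 10 free
# parameters in the lower triangle.
dense_sigma = np.ones((4, 4))
```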

jeffgortmaker commented 3 years ago

Divide-by-zero errors in particular make me think that Chris's point 2 might be what's going on. I'll see if I can reproduce your problem on the cereal data.

jeffgortmaker commented 3 years ago

Ah, I think you're having issues with sparse grid integration (SGI), not all quadrature? E.g. simple product rules work fine with the cereal data, but I can replicate your problem with SGI (our grid option).

Looks like, at least for the cereal example that I can replicate, Chris is right about numerical issues. Specifically, the problem is that SGI often uses negative integration weights. This means that for "bad" parameter guesses (e.g. the initial ones here), we can get negative simulated market shares. This doesn't play nice with the BLP contraction, which requires log(shares), so our "solution" is to just clip shares from below, at 1e-300 by default. I've found that in practice this can work well, both in some SGI problems I've seen and more generally for dealing with underflow.
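
To make the mechanism concrete, here's a toy numerical sketch (plain NumPy with made-up numbers, not pyblp internals) of how negative integration weights can turn strictly positive choice probabilities into a negative simulated share:

```python
import numpy as np

# Hypothetical SGI-style weights: they sum to one, but one of them is negative.
weights = np.array([0.7, 0.5, -0.2])

# Choice probabilities for some product at each node under a "bad" parameter
# guess: the node carrying the negative weight happens to dominate.
probabilities = np.array([1e-9, 1e-9, 0.4])

# The weighted "integral" of positive probabilities comes out negative,
# so log(share) inside the contraction is undefined without clipping.
share = weights @ probabilities
print(share)  # approximately -0.08
```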

But it seems like your data/model (and the possibly non-identified cereal model with a dense sigma and few instruments) are counterexamples. The reason the contraction doesn't converge is that once we clip market shares from below, the mapping presumably ceases to be a contraction, and it may not converge (or may converge slowly -- who knows). Of course at the solution (the mean utility that equates simulated and observed market shares), there is no clipping because observed market shares are all positive. But the problem is getting to that solution in the first place.

You have a few options, if you want to use quadrature:

  1. Use product rules instead. If you have something like > 4 random coefficients, this will likely be more computationally intensive. But weights are never negative.
  2. Try another type of quadrature that doesn't suffer from negative weights. E.g. this approach seems promising, and specifically targets the "negative weights issue," although I haven't tried it out myself. You can specify custom nodes and weights with your own agent_data.
  3. If you really want to stick with SGI for your problem, you can try out different fp_type options ('nonlinear' doesn't require log(shares), so it may be more robust to negative weights), different Iteration methods, or different delta/other parameter starting values. If you can get a configuration that makes the non-contraction converge, then all should be good. (Options 1 and 3 are sketched below.)
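
A rough sketch of options 1 and 3, assuming the product_formulations, product_data, and starting sigma from the reproduction snippet in the first post (the sizes and iteration method here are illustrative, not recommendations):

```python
import numpy as np
import pyblp

# Option 1: a product rule needs more nodes in 4 dimensions, but its weights
# are never negative.
product_problem = pyblp.Problem(
    product_formulations, product_data,
    integration=pyblp.Integration('product', size=7),
)

# Option 3: keep SGI but change the fixed point formulation and iteration routine.
sgi_problem = pyblp.Problem(
    product_formulations, product_data,
    integration=pyblp.Integration('grid', size=7),
)
results = sgi_problem.solve(
    sigma=np.ones((4, 4)),
    fp_type='nonlinear',                   # avoids log(shares) in the fixed point
    iteration=pyblp.Iteration('squarem'),  # or 'simple', 'broyden1', ...
    optimization=pyblp.Optimization('l-bfgs-b'),
)
```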

Hope this helps. On our end, we should probably make error messages a bit more helpful here, since all you see is long-running code and a few division by zero messages. (And hopefully a nonzero "Clipped Shares" value in your optimization output, but that's admittedly nebulous without this explanation.)

jeffgortmaker commented 3 years ago

I'm going to close this for now, but feel free to re-open / keep commenting.