jeffgortmaker / pyblp

BLP Demand Estimation with Python
https://pyblp.readthedocs.io
MIT License

Micromoments for choices within individuals #143

Closed: kuriksha closed this issue 11 months ago

kuriksha commented 12 months ago

Hi Jeff and Chris!

Thank you so much for all your work on PyBLP. I am currently trying to add micromoments to my BLP estimation, and I wanted to discuss my case here. I observe how product choices of particular consumers are correlated over time, and I would like to incorporate this information as additional moments in the estimation. This is similar in spirit to second choices, and I believe it should similarly help with identification of $\Sigma$. However, formally this is different: the correct moment condition should specify relationships between $s_{i,t,j}$ and $s_{i,t',j'}$, so I suspect that the framework from Conlon and Gortmaker (2023) may not be applicable here.

Additionally, based on the nature of my data, I believe it would be simpler to simulate choices and calculate moments from the simulated data, rather than using analytical formulas. I came across this older discussion, which mentions compute_custom and pyblp.CustomMoment. This seems to be what I need, but I couldn't find any further information on how it was implemented in past versions.

I would greatly appreciate it if you could share your thoughts or suggestions regarding my case. Should I consider using the older version of PyBLP where pyblp.CustomMoment was available? Alternatively, could you share some suggestions about how I should approach modifying the code to incorporate moments describing choices within individuals? Thank you in advance!

jeffgortmaker commented 11 months ago

So something like Chintagunta and Dubé (2005)? Right now PyBLP doesn't support anything time dimension-specific. Your best option at the moment is to either (1) fork and modify PyBLP yourself, or (2) implement your desired estimator from scratch. For (1), much of the relevant code is in micro.py, markets/market.py, and economies/problem.py. For (2), this might not be too much work if you essentially just have micro data and want to maximize a likelihood.

I agree that incorporating information about a time dimension in the micro data would provide variation similar to second choices (i.e., variation in choice sets faced by the same consumer). You're also right that our micro framework is essentially for a static setting. To add a time dimension, you'd need to incorporate a product of probabilities over different time periods, much like the framework currently supports a product of first- and second-choice probabilities.
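To make that analogy concrete (and assuming static preferences with no switching costs), the relevant object for a consumer $i$ observed choosing $j_1, \ldots, j_T$ over $T$ periods would be something like the product $\prod_{t=1}^{T} s_{i,t,j_t}$, with each factor evaluated at that period's choice set and prices.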

I've thought about adding a time dimension, but it's not on my to-do list right now for one main reason. My sense is that most models with a time dimension will also have switching costs. And models with switching costs typically need to address the initial conditions problem. Combined, this would be a lot of work to add features that I think are probably needed for a full implementation of (myopic) consumers.

Although I like implementing features in full, I also understand that some researchers may be okay with using a time dimension in the micro data for additional variation, without also incorporating switching costs. So, as you note, some type of custom moment might be a nice middle ground that allows you to add something like this, while not requiring too much work on my end.

I wouldn't recommend using the old version of the package. The experimental CustomMoment you found in older package versions had some issues with statistical inference.

One idea is a new extra_moments argument to Problem.solve. It could accept a tuple (compute_moments, compute_gradient, compute_covariances) of functions that accept preliminary ProblemResults instances, and compute (1) custom moments, (2) their gradients with respect to the parameters, and (3) the covariances of all moments. Leaving (2) unspecified would mean using finite differences, and leaving (3) unspecified would mean that computed standard errors and weighting matrices would be incorrect.
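To be concrete, usage might look something like the following (purely hypothetical: extra_moments does not exist in pyblp, and all of these names are placeholders):

```python
import numpy as np
import pyblp

def compute_moments(results: pyblp.ProblemResults) -> np.ndarray:
    """Return the vector of custom moment values at the current parameters."""
    ...

def compute_gradient(results: pyblp.ProblemResults) -> np.ndarray:
    """Return the Jacobian of the custom moments with respect to theta."""
    ...

# Leaving the third element as None would mean that reported standard errors
# and weighting matrices ignore the custom moments' covariances.
results = problem.solve(
    sigma=initial_sigma,
    extra_moments=(compute_moments, compute_gradient, None),
)
```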

Do you have the sense you'd use something like this? I'm curious what your code would look like, given an interface like this for computing extra moments. My sense is that most users wouldn't specify (2) or (3), which would mean a pretty slow implementation that needs something like a bootstrap for inference, which wouldn't be great. But maybe this seems useful?

kuriksha commented 11 months ago

Hi Jeff,

Thank you so much for your detailed response.

To explain my goal a bit more, I have two different kinds of data. Dataset 1 allows me to observe prices and market shares for each $j$ and $t$, making it suitable for standard BLP. Dataset 2, on the other hand, lets me occasionally observe choices within some consumers. These are reviews, in a setting where people reviewed very frequently; however, the scraping of reviews was random, with small and fluctuating coverage over time.

In Dataset 2, I see patterns such as "people who buy $j$ are also likely to buy $k$ next time", which suggest particular substitution patterns and are central to the project. I want to incorporate these patterns by including product dummies as random coefficients and allowing for covariances between them in $\Sigma$. However, I am not sure I can credibly estimate $\Sigma$ solely from Dataset 1.

There are two issues with Dataset 2. First, scraping was very unstable, so it is a very unbalanced panel, unlike Chintagunta and Dubé (2005). Second, although I observe time and consumer id, I do not observe the consumer's location. Because of these limitations, I still want to use BLP and estimate the price coefficient etc. from Dataset 1, while incorporating information from Dataset 2. I will consider choices for agents in all markets and construct micro moments that effectively capture the correlation between the number of purchases for different products within each consumer (while accounting for missing observations). Also, I indeed do not plan to model switching costs.

> One idea is a new extra_moments argument to Problem.solve. It could accept a tuple (compute_moments, compute_gradient, compute_covariances) of functions that accept preliminary ProblemResults instances. Do you have the sense you'd use something like this? I'm curious what your code would look like, given an interface like this for computing extra moments. My sense is that most users wouldn't specify (2) or (3), which would mean a pretty slow implementation that needs something like a bootstrap for inference, which wouldn't be great. But maybe this seems useful?

Yes, this sounds like a promising approach, and I am definitely interested in giving it a try. Here is how I currently see it:
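(This is only a rough sketch of the kind of moment I have in mind; all names are placeholders, and the weighting for missing observations in Dataset 2 is omitted. The probabilities would come from a ProblemResults at the current theta.)

```python
import numpy as np

def simulated_purchase_covariances(probabilities, T, rng):
    """Draw T choices per simulated consumer and return the J x J covariance
    matrix of purchase counts across products.

    probabilities: an I x J array of within-market choice probabilities
    (rows sum to one across the modeled alternatives).
    """
    I, J = probabilities.shape
    # Inverse-CDF sampling: for each consumer and period, pick the product whose
    # cumulative probability bracket contains the uniform draw.
    cumulative = probabilities.cumsum(axis=1)
    draws = rng.random((I, T))
    choices = (draws[:, :, None] > cumulative[:, None, :]).sum(axis=2)  # (I, T) indices
    choices = np.clip(choices, 0, J - 1)  # guard against floating-point spillover

    # Count how many of the T simulated purchases each consumer made of each product.
    counts = np.zeros((I, J))
    np.add.at(counts, (np.repeat(np.arange(I), T), choices.ravel()), 1)

    # Covariance of purchase counts across products, taken over consumers.
    return np.cov(counts, rowvar=False)

# The micro moments would then be (roughly) the difference between these simulated
# covariances and their analogues from Dataset 2, using the same rng seed at every
# guess of theta.
```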

Please let me know your thoughts about this. Thank you!

jeffgortmaker commented 11 months ago

Great. Based on what I understand about your setting, some type of custom moment interface would be helpful for you. If, in your scraped data, people tend to choose the same type of alternative across different time periods as their choice set changes, this definitely suggests that there could be a large "sigma" on a dummy for this type of alternative (and not a large cost of switching away from this type, if you're ruling that out ex ante).

I'm not very familiar with all the problems you might get when you multiply 30 choice probabilities together, but you should take a look at the literature that estimates demand using ranked choice data (e.g., school choice) to see whether they have recommendations. With 30 time periods, your expressions might blow up. My vague understanding is that simulation can work in theory, but can be frustrating to get right in practice. Matching a small number of key statistics from the simulation will probably work better. And of course you'll want to use the same seed / random draws for each guess of theta.
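One standard precaution, sketched here with a hypothetical period_probabilities array holding one consumer's choice probabilities across observed periods, is to work in logs so that the product of many small probabilities doesn't underflow:

```python
import numpy as np

# Hypothetical: one consumer's choice probabilities, one entry per observed period.
# Summing logs avoids the underflow that np.prod can hit with ~30 small probabilities.
log_joint_probability = np.sum(np.log(period_probabilities))
```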

I'll put something like the above custom moment API on my to-do list. Not sure when I'll get to it -- depends on how much code I'd need to change/add -- but I'll take a look in the next couple days. On gradient computation: yeah, finite differences for just these extra moments would be reasonable.

kuriksha commented 11 months ago

Thank you very much for your thoughts and taking a look at this! I really appreciate it.

jeffgortmaker commented 11 months ago

Hm, so I started to take a look and reminded myself how the relevant code is structured. It turns out it would require a good amount of refactoring to create a new ProblemResults within the code that itself creates the inputs to that instance. So I might leave this on the back burner unless I hear from other people who also really need something like this, sorry.

However, I think we can still hack together something with the existing code, which I think should work pretty well for your purposes in the meantime. I think you should be able to define a custom optimization configuration that wraps whatever optimizer you're currently using (e.g. I like SciPy's trust-constr), but before passing the objective/gradient to the optimizer, adds your new moments' contributions to each.

Specifically, take a look at this tutorial on making a custom optimizer and let me know if this could work. Instead of a grid search, your function would do something like:

  1. Define a function that wraps objective_function. This wrapper takes theta as an argument, and:
     1.1. Gets objective, gradient = objective_function(theta).
     1.2. Calls Problem.solve at the current theta (you'll have to re-shape it into sigma, etc.) using method='1s', optimization=Optimization('return'), and check_optimality='gradient' to just get the bare-bones ProblemResults at the current theta. (You might be able to speed things up by using Simulation instead, or by rewriting some of, e.g., ProblemResults.compute_probabilities yourself, instead of recreating a new ProblemResults each time.)
     1.3. Computes your new moments via simulation or some other method, including their gradient with finite differences if you want.
     1.4. Adds your new moments' contributions to the objective and gradient, for example by taking their sum of squares.
     1.5. Returns the modified objective and gradient in the manner expected by scipy.optimize.minimize.
  2. Pass your objective function wrapper and the initial parameter values to scipy.optimize.minimize, returning its result.
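For concreteness, here's a rough sketch of those steps. It assumes the custom-method interface from the linked tutorial, where objective_function(theta) returns an objective and a gradient and the method returns the final values and a convergence flag; reshape_theta and compute_extra_moments are placeholders you would write yourself.

```python
import numpy as np
import pyblp
import scipy.optimize

def make_wrapped_optimization(problem, reshape_theta, compute_extra_moments):
    """Sketch of a custom optimization configuration implementing steps 1-2 above.

    reshape_theta(theta) should return the sigma/pi/etc. keyword arguments for
    Problem.solve at the current theta, and compute_extra_moments(results)
    should return the extra moments' contributions to the objective and gradient.
    """
    def method(initial, bounds, objective_function, iteration_callback):
        def wrapper(theta):
            # 1.1: PyBLP's own objective and gradient at the current theta.
            objective, gradient = objective_function(theta)

            # 1.2: Bare-bones ProblemResults at the current theta.
            results = problem.solve(
                **reshape_theta(theta),
                method='1s',
                optimization=pyblp.Optimization('return'),
                check_optimality='gradient',
            )

            # 1.3-1.4: Add the extra moments' contributions (e.g., a sum of squares
            # and its gradient, computed analytically or by finite differences).
            extra_objective, extra_gradient = compute_extra_moments(results)

            # 1.5: Return in the form expected by scipy.optimize.minimize with jac=True.
            total_objective = float(objective) + float(extra_objective)
            total_gradient = np.asarray(gradient).flatten() + np.asarray(extra_gradient).flatten()
            return total_objective, total_gradient

        def scipy_callback(*_):
            # Report each major iteration back to pyblp's display/counters.
            iteration_callback()

        # 2: Hand the wrapped objective and initial values to SciPy.
        minimize_results = scipy.optimize.minimize(
            wrapper, initial, jac=True, method='trust-constr', bounds=bounds,
            callback=scipy_callback,
        )
        return minimize_results.x, minimize_results.success

    return pyblp.Optimization(method)
```

You would then pass the result of make_wrapped_optimization(...) as the optimization argument to the outer Problem.solve call (again, just a sketch; the inner solve uses Optimization('return'), so there is no recursion).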

Note that this is pretty much the same as what I was suggesting with the above custom moments. The big difference is that you create ProblemResults yourself, instead of PyBLP doing it for you. There are still issues with how to do inference, but it seems like you're not super concerned about this right now (maybe some type of bootstrap would be appropriate).

Does this make sense/seem like a reasonable thing to try out? If I do end up adding custom moment functionality like above, you'll have already written code that requires a ProblemResults instance, so it should be pretty easy for you to adapt, if you wanted to.

kuriksha commented 11 months ago

Yes, this sounds like a good plan, and thank you so much for your prompt response! I will start working on this. I will let you know if I succeed or get more questions.

jeffgortmaker commented 11 months ago

Sounds good! Looks like this might be a working solution, so I'm going to close this issue for now, but please do keep commenting when you make progress / have more questions!

kuriksha commented 11 months ago

Hi Jeff, I just want to post a quick update on this. The solution that you've suggested works well, thank you so much!

As for the details, I was able to find a more tractable set of moments that have analytical expressions for both values and gradients (the latter also use ProblemResults.xi_by_theta_jacobian). Recreating a new ProblemResults instance at each step takes much less time for me than the calculation of the micro moments, so the approach with a wrapper around objective_function doesn't really make things slower.
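Concretely, for a moment of the form $g(\theta, \xi(\theta))$, the gradient follows the chain rule $\frac{dg}{d\theta} = \frac{\partial g}{\partial \theta} + \frac{\partial g}{\partial \xi}\,\frac{\partial \xi}{\partial \theta}$, with the last factor taken from ProblemResults.xi_by_theta_jacobian (this is just the general structure; the exact expressions are specific to my moments).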

In simulated data, it looks like micro moments indeed help a lot with estimation of $\Sigma$.

Finally, in Conlon and Gortmaker (2023) you suggest using a block-diagonal weighting matrix. I am also using an approximation to the optimal weighting matrix for the micro moments, but I notice that I get substantially better results when I put more weight on the micro moment contributions on top of that.

jeffgortmaker commented 11 months ago

Fantastic, glad it seems to be working!

One thing I forgot to mention is that a couple weeks ago, I added a new OptimizationProgress class, which contains additional information for custom optimization routines, like yours. It's passed along with the current objective and gradient. It's currently just in the dev code (so not a stable pip release), but it might help simplify your code a bit. At a minimum if you want to use the new dev code, you'd have to account for a third object in addition to the objective and gradient.

On the weighting matrix, if you're incorporating moments based only on micro data, the optimal weighting matrix should still be block diagonal, since standard and micro moments are asymptotically uncorrelated. Given unbiased estimates of standard and micro moments' covariance matrices, the optimal way to scale the micro block ends up just being the asymptotic ratio of the micro sample size to the aggregate data sample size. This will depend a lot on your asymptotic thought experiment.
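As a rough sketch of that block-diagonal structure (S_standard, S_micro, n_micro, and n_aggregate are placeholders for estimated moment covariance matrices and sample sizes, and the right scaling depends on your own asymptotic thought experiment):

```python
import numpy as np
from scipy.linalg import block_diag

# Hypothetical: block-diagonal weighting matrix, with the micro block scaled by
# the ratio of the micro sample size to the aggregate sample size.
scale = n_micro / n_aggregate
W = block_diag(np.linalg.inv(S_standard), scale * np.linalg.inv(S_micro))
```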

But of course you may also view your micro moments as a more credible source of identification for sigma than the aggregate sources of identification. (I think this is often the case, given good micro data.) In this case, it could make sense to weight them more than would otherwise be (statistically) optimal. I'm guessing that's what's going on in your case?

kuriksha commented 11 months ago

Thank you!

I will take a close look at how I can use OptimizationProgress. As for weighting, I have only worked with simulated data so far, and I definitely was more generous with the micro data than with the market data, so given your point it makes sense that scaling the micro moments up was helping. And I hope the real-data micro moments will indeed be a more credible source of identification.

I will post here if I get other questions or send you a link to the draft when it's ready :)