yihao-yuan closed this issue 1 year ago.
I think the main reason is that you didn't set a seed when configuring your Monte Carlo integration, so the two problems are estimated with small, different sets of simulation draws. If you use `integration=Integration('monte_carlo', 50, {'seed': 0})`, you should get more similar results.

When checking, I would also start with `method='1s'`, so that small numerical differences from the first step don't bubble up into the second, and `optimization=Optimization('trust-constr')` or some other gradient-based optimizer, which tends to have better properties than the gradient-free simplex method, which sometimes isn't that great.
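Putting these suggestions together, the configuration would look something like the following sketch (the specific values are illustrative; the formulations and data come from your own problem):

```python
import pyblp

# Seeded Monte Carlo draws, so repeated runs use identical simulation draws.
integration = pyblp.Integration('monte_carlo', 50, {'seed': 0})

# While debugging, use one-step GMM and a gradient-based optimizer.
solve_options = {
    'method': '1s',
    'optimization': pyblp.Optimization('trust-constr'),
}
```

These would then be passed as `pyblp.Problem(..., integration=integration)` and `problem.solve(..., **solve_options)`.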
Thank you! This works :)
Hi Jeff, I wonder if you could point me to the file that you use to construct the weighting matrix and instrumental variables after concentrating out all the fixed effects? I'm asking because I'm curious about how the econometrics work here.
Thank you!
Sure!

- Fixed effects are absorbed into X1 and ZD here: https://github.com/jeffgortmaker/pyblp/blob/64e181716854ecf2d3e51e54c41b41afd7937f49/pyblp/economies/problem.py#L1531 Essentially, we just iteratively de-mean the two matrices.
- An initial 2SLS weighting matrix is computed here: https://github.com/jeffgortmaker/pyblp/blob/64e181716854ecf2d3e51e54c41b41afd7937f49/pyblp/economies/problem.py#L562
- It's updated here: https://github.com/jeffgortmaker/pyblp/blob/64e181716854ecf2d3e51e54c41b41afd7937f49/pyblp/results/problem_results.py#L445

I'll also point you to pages 13-14 and Appendix A in our paper, where we discuss fixed effect absorption and concentrating out linear parameters.
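The iterative de-meaning can be illustrated outside of pyblp. Below is a minimal numpy sketch (the function name and the made-up group structure are purely illustrative): it alternates within-group de-meaning over each fixed effect dimension until the matrix stops changing, and checks the result against residuals from an explicit dummy regression.

```python
import numpy as np

def absorb_two_fixed_effects(matrix, ids1, ids2, tol=1e-12, max_iterations=1000):
    """Iteratively de-mean the columns of `matrix` within the groups in
    `ids1` and then `ids2`, repeating until convergence."""
    matrix = np.asarray(matrix, dtype=float).copy()
    for _ in range(max_iterations):
        last = matrix.copy()
        for ids in (ids1, ids2):
            for group in np.unique(ids):
                mask = ids == group
                matrix[mask] -= matrix[mask].mean(axis=0)
        if np.max(np.abs(matrix - last)) < tol:
            break
    return matrix

# Check against explicit dummies on made-up data: absorption should match the
# residuals from projecting each column on the full set of group dummies.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
ids1 = rng.integers(0, 3, size=60)
ids2 = rng.integers(0, 4, size=60)
D = np.column_stack([ids1 == g for g in range(3)] + [ids2 == g for g in range(4)]).astype(float)
residuals = X - D @ np.linalg.lstsq(D, X, rcond=None)[0]
same = np.allclose(absorb_two_fixed_effects(X, ids1, ids2), residuals, atol=1e-8)
```

By the von Neumann alternating projections argument, the iteration converges to the projection onto the orthogonal complement of the span of all the dummies, which is why absorption and dummy variables agree.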
Thank you for your quick response, Jeff! I really appreciate it.
After reading your code, I am still a little confused about what is used to construct the weighting matrices. I understand that you apply the within transformation to the X's and delta's from your RAND paper, but how did you deal with Z? Did you de-mean Z as well?
In addition, I wonder if different methods of controlling for FEs, i.e., the within transformation vs. dummies, can generate different estimates. For example, I think two-step GMM with the optimal weighting matrix in the second step does not guarantee that the xi's sum to zero. However, after the within transformation, the transformed delta's and X's must be mean zero, so the xi's, which are linear combinations of the delta's and X's, must be mean zero as well. Yet using the pyblp package, I found that the two methods generate the same GMM objectives and the same estimates when I apply both to Nevo's fake cereal data. I am trying to understand what's going on here.
Thank you!
Yes, you need to de-mean the instruments ZD as well. Up to the nonlinear parameters, we're just looking at a linear IV regression of delta on X1.

Adding dummies and de-meaning delta, X1, and ZD should be equivalent because you can apply the Frisch-Waugh-Lovell (FWL) theorem to the linear IV case. I recommend looking through different writeups of this; e.g., this one seems ok, but I don't have a best writeup to point to off the top of my head.
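The FWL equivalence for linear IV is easy to verify numerically. Here is a self-contained sketch with synthetic data (nothing below comes from pyblp; all names are made up), comparing 2SLS with explicit dummies against 2SLS after partialling the dummies out of everything:

```python
import numpy as np

def tsls(X, Z, y):
    """2SLS estimate b = (X'Z W Z'X)^{-1} X'Z W Z'y with W = (Z'Z)^{-1}."""
    W = np.linalg.inv(Z.T @ Z)
    A = X.T @ Z @ W @ Z.T
    return np.linalg.solve(A @ X, A @ y)

# Synthetic data: one regressor x, one excluded instrument z, and five
# group dummies D playing the role of fixed effects.
rng = np.random.default_rng(0)
n = 500
groups = rng.integers(0, 5, size=n)
D = (groups[:, None] == np.arange(5)).astype(float)
z = rng.normal(size=(n, 1))
x = z + rng.normal(size=(n, 1))
y = x[:, 0] + D @ rng.normal(size=5) + rng.normal(size=n)

# (i) Include the dummies in both the regressors and the instruments.
b_dummies = tsls(np.column_stack([x, D]), np.column_stack([z, D]), y)

# (ii) Partial the dummies out of x, z, and y first (FWL).
M = np.eye(n) - D @ np.linalg.pinv(D)  # annihilator matrix for D
b_fwl = tsls(M @ x, M @ z, M @ y)

# The coefficient on x is the same across the two approaches.
coefficients_match = np.isclose(b_dummies[0], b_fwl[0])
```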
If you're used to Stata, you could also check out the `absorb` option for `ivreg2` or the `ivreghdfe` package, which also support fixed effect absorption and updating the weighting matrix for linear IV.
Hi Jeff, Thank you again for your great support.
I understand that de-meaning X1, ZD, and Y (the delta's) generates the same coefficients in the linear IV case as using dummies. But I am wondering whether the two could differ in GMM, with a weighting matrix generated from ZD.
First of all, let me make sure that I understand your algorithm here. You take X1, ZD, and the recovered delta's (from guessed nonlinear parameters), de-mean all of them, and use the de-meaned ZD, $\widetilde{ZD}$, to generate the first-step weighting matrix $W = (\widetilde{ZD}' \widetilde{ZD})^{-1}$. In the second step, you use $\widetilde{ZD}$ and the $\xi$'s from the first step to generate the optimal weighting matrix. The linear parameters are then estimated using $W$, $\widetilde{ZD}$, and the de-meaned $X$ through IV-GMM. Is that correct?
Yes, that all sounds right, with the optimal weighting matrix computed like here, depending on your choice of clustering, etc.
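For concreteness, the unclustered heteroskedasticity-robust version of that update looks something like the following numpy sketch (pyblp's actual implementation at the link above also handles clustering and other options):

```python
import numpy as np

def optimal_weighting_matrix(Z, xi):
    """Inverse of the estimated covariance of the moments g_i = xi_i * z_i,
    with the moments centered (the heteroskedasticity-robust case)."""
    g = Z * xi[:, None]
    g = g - g.mean(axis=0)
    S = g.T @ g / len(xi)
    return np.linalg.inv(S)

# On made-up data, W should invert the moment covariance estimate S.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 3))
xi = rng.normal(size=200) * (1 + np.abs(Z[:, 0]))  # heteroskedastic errors
W = optimal_weighting_matrix(Z, xi)
g = Z * xi[:, None]
g = g - g.mean(axis=0)
S = g.T @ g / len(xi)
inverts_correctly = np.allclose(W @ S, np.eye(3), atol=1e-8)
```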
Thank you, Jeff. I used pyblp to run BLP on the car data using (i) dummies and (ii) the within transformation to control for fixed effects. I controlled the seed of the random draws so that there is no numerical difference from integration. The results below show that the two specifications give quite different results. I think this could be because the within-transformed instruments $\widetilde{ZD}$ and the instruments with dummies $ZD$ generate different weighting matrices, especially when the $\xi$'s are heteroskedastic.
Specification 1: Dummies
```python
import numpy as np
import pandas as pd
import pyblp

product_data = pd.read_csv(pyblp.data.BLP_PRODUCTS_LOCATION)
X1_formulation = pyblp.Formulation('1 + prices + C(market_ids)')
X2_formulation = pyblp.Formulation('0 + prices')
product_formulations = (X1_formulation, X2_formulation)
problem = pyblp.Problem(product_formulations, product_data, integration=pyblp.Integration('monte_carlo', 50, {'seed': 0}))
results = problem.solve(
    sigma=np.diag([1]),
    method='2s',
    optimization=pyblp.Optimization('nelder-mead', {'xatol': 1e-6}, compute_gradient=False),
)
results
```
```
Problem Results Summary:
===================================================================================
GMM Objective Gradient Clipped Weighting Matrix Covariance Matrix
Step Value Norm Hessian Shares Condition Number Condition Number
---- --------- --------- --------- ------- ---------------- -----------------
2 +4.18E-04 +1.25E-09 +1.02E-02 0 +3.02E+10 +1.06E+06
===================================================================================
Cumulative Statistics:
===========================================================================
Computation Optimizer Optimization Objective Fixed Point Contraction
Time Converged Iterations Evaluations Iterations Evaluations
----------- --------- ------------ ----------- ----------- -----------
00:00:19 Yes 83 179 46713 142783
===========================================================================
Nonlinear Coefficient Estimates (Robust SEs in Parentheses):
===================
Sigma: prices
------ -----------
prices +1.24E-03
(+1.46E-01)
===================
```
Specification 2: Within transformation
```python
X1_formulation = pyblp.Formulation('1 + prices', absorb='C(market_ids)')
X2_formulation = pyblp.Formulation('0 + prices')
product_formulations = (X1_formulation, X2_formulation)
problem = pyblp.Problem(product_formulations, product_data, integration=pyblp.Integration('monte_carlo', 50, {'seed': 0}))
results = problem.solve(
    sigma=np.diag([1]),
    method='2s',
    optimization=pyblp.Optimization('nelder-mead', {'xatol': 1e-6}, compute_gradient=False),
)
results
```
```
Problem Results Summary:
===================================================================================
GMM Objective Gradient Clipped Weighting Matrix Covariance Matrix
Step Value Norm Hessian Shares Condition Number Condition Number
---- --------- --------- --------- ------- ---------------- -----------------
2 +9.28E+01 +8.89E-04 +3.22E+03 0 +3.94E+17 +4.29E+02
===================================================================================
Cumulative Statistics:
===========================================================================
Computation Optimizer Optimization Objective Fixed Point Contraction
Time Converged Iterations Evaluations Iterations Evaluations
----------- --------- ------------ ----------- ----------- -----------
00:03:11 Yes 57 113 629407 1890374
===========================================================================
Nonlinear Coefficient Estimates (Robust SEs in Parentheses):
===================
Sigma: prices
------ -----------
prices +6.97E-03
(+7.42E-02)
===================
```
However, I ran the two specifications on the fake cereal data, and they generated the same estimates. This makes me wonder if the fake cereal data were generated assuming the $\xi$'s are i.i.d., so that the weighting matrices are not very different under either method?
Maybe check what happens when you specify the same initial `W` for both, to verify this? Also, see what the first-stage `results.first_results` look like?
I think the question of what initial weighting matrix to use by default when fixed effects are absorbed is something I'd thought about at some point, but I don't remember. I'd buy that different initial weighting matrices are giving you different results in-sample, particularly for a small dataset with somewhat weak instruments. Of course, asymptotically the initial weighting matrix doesn't matter, but choosing a good one (I'm not sure what a better one would be when absorbing fixed effects -- happy to hear other suggestions) can matter in practice.
One other thing: fwiw, I don't recommend `nelder-mead` -- with enough simulation draws and a smooth objective, gradient-based approaches with a tight gradient-based tolerance will be more reliable.
```matlab
% Setup: x1: endogenous variables (the last variable is not included in z1);
% x2: month dummies; z1: IVs (not including dummies)

% Apply the FWL theorem
>> x = [x1, x2]; z = [z1, x2];
>> m2 = eye(size(x2, 1)) - x2*inv(x2'*x2)*x2'; % annihilator (residual-maker) matrix for x2
>> z1tilde = m2*z1; x1tilde = m2*x1; ytilde = m2*y;

% FWL works well for OLS
>> b1 = (x1tilde'*x1tilde)\(x1tilde'*ytilde); b2 = (x'*x)\(x'*y); disp([b1, b2(1:end-12)]')
   -0.5661    1.4649    0.4668    0.1623    0.0477
   -0.5661    1.4649    0.4668    0.1623    0.0477

% FWL works well for 2SLS (GMM first step with the inverse of Z'Z as the weighting matrix)
>> omega1 = (z1tilde'*z1tilde)\eye(size(z1tilde, 2)); omega = (z'*z)\eye(size(z, 2));
>> b1 = (x1tilde'*z1tilde*omega1*z1tilde'*x1tilde)\(x1tilde'*z1tilde*omega1*z1tilde'*ytilde); b2 = (x'*z*omega*z'*x)\(x'*z*omega*z'*y); disp([b1, b2(1:end-12)]')
   -0.5645    1.4655    0.4670    0.1629    0.0462
   -0.5645    1.4655    0.4670    0.1629    0.0462

% The objective functions are also the same
>> e1 = ytilde - x1tilde*b1; e2 = y - x*b2; disp(max(abs(e1-e2)))
   1.1857e-13
>> g1 = e1'*z1tilde./size(e1, 1); f1 = g1*omega1*g1'.*size(e1, 1); disp(f1)
    0.0925
>> g2 = e2'*z./size(e2, 1); f2 = g2*omega*g2'.*size(e2, 1); disp(f2)
    0.0925

% Optimal weighting matrix (with centered moments)
>> gstar1 = e1.*z1tilde - mean(e1.*z1tilde, 1);
>> gstar2 = e2.*z - mean(e2.*z, 1);
>> w1 = (gstar1'*gstar1)\eye(size(gstar1, 2)).*size(gstar1, 1);
>> w2 = (gstar2'*gstar2)\eye(size(gstar2, 2)).*size(gstar2, 1);
>> b1 = (x1tilde'*z1tilde*w1*z1tilde'*x1tilde)\(x1tilde'*z1tilde*w1*z1tilde'*ytilde); b2 = (x'*z*w2*z'*x)\(x'*z*w2*z'*y); disp([b1, b2(1:end-12)]')
   -0.2956    1.5169    0.4562    0.1830    0.0536
   -0.2956    1.5169    0.4562    0.1830    0.0536

% The recovered xi's are different, but the objective function values are the same
>> xi1 = ytilde - x1tilde*b1; xi2 = y - x*b2; disp(max(abs(xi1-xi2)));
    0.0880
>> g1 = xi1'*z1tilde./size(e1, 1); f1 = g1*w1*g1'.*size(e1, 1); disp(f1)
   1.5877e+03
>> g2 = xi2'*z./size(e2, 1); f2 = g2*w2*g2'.*size(e2, 1); disp(f2)
   1.5877e+03

% FWL does not work for many other weighting matrices, e.g., the identity matrix
>> b1 = (x1tilde'*z1tilde*z1tilde'*x1tilde)\(x1tilde'*z1tilde*z1tilde'*ytilde); b2 = (x'*z*z'*x)\(x'*z*z'*y); disp([b1, b2(1:end-12)]')
   -0.3291    1.4774    0.3391    0.3929   -0.0281
   -0.4305    1.4665    0.3996    0.2759   -0.0216
```
Thanks for doing this, this is useful, particularly if anyone else gets confused by differences like this in the future! I'm not a matlab user, but one quick thing I noticed: should it be `g1 = e1'*z1tilde./size(e1,1)`? Also, residual differences on the order of 1e-5 seem quite large if we expect dummies and de-meaning to be precisely equal up to that point (up to numerical error from inversions/solving linear systems).
I think when discussing initial weight matrices, there's also a question about what FWL "working" means. Ideally, dummies and de-meaning would give precisely the same estimates (up to numerical error), but if that's just not possible for whatever reason, what I would be more focused on is the finite sample statistical properties of the two estimators.
Evaluating this would require some Monte Carlos, maybe with both well- and mis-specified DGPs if you're particularly interested in the latter case. Real data can be useful for getting a sense of numerical differences, but not as much for statistical ones. I just don't have the time to do this at the moment, but if it's something you're interested in doing, I'd be interested in the results. (I'm also optimistic that this is something someone in the literature has looked at, but I haven't had a chance to look.)
Sorry for the confusion in my last comment. I made some coding mistakes; in fact, the two methods (dummies vs. the within transformation) generate the same estimates in one- and two-step GMM, though not with other weighting matrices. I edited the code I shared above to avoid any future confusion.

That being said, I am still not sure why the results differ when I use the two different methods to control for year FEs with BLP's car data, as I shared above. The same problem does not appear with the fake cereal data, though. I tried many different combinations of FEs, including market FEs, city FEs, quarter FEs, and product FEs, and the two methods generate the same estimates in all of these cases. I'm not sure what was going on with my previous experiment on BLP's car data.
Finally, I wonder if you could point me to the code you use to generate the initial weighting matrix when micro moments are used? Thank you!
Ah okay, no worries. I'm also not sure, but please do share if you find out. You tried different FEs for the BLP problem too?
For micro moments, there did not (to me) seem to be a "canonical" initial weighting matrix like the 2SLS one. So with micro moments, the default is `initial_update=True` -- see its docs here for more. This just computes an estimate of the optimal weighting matrix at the initial parameter values. It isn't actually optimal unless those initial values are consistent estimates of the true parameters, but I expect it to be more reasonable than, say, the identity matrix.
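For concreteness, a sketch of how that default would be invoked explicitly (assuming `problem`, `sigma_initial`, and `micro_moments` have already been constructed for your problem; those names are illustrative):

```python
# With micro moments, the default is equivalent to passing initial_update=True,
# which re-computes delta and the weighting matrix at the initial parameter
# values before the first GMM step.
results = problem.solve(
    sigma=sigma_initial,
    micro_moments=micro_moments,
    initial_update=True,
)
```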
I'm going to close this for now, but feel free to re-open/keep commenting.
Thank you for this wonderful public good! I found some problems using the `absorb` option. Adding fixed effects as variables does not work the same as adding them via the `absorb` option when there are multiple fixed effects. Here is an example using the fake cereal data.

The result from the first model is

From Model 2:

Is there a reason for this discrepancy? Thank you!