yihao-yuan closed this issue 1 year ago.
I think the main reason is that you didn't set a seed when configuring your Monte Carlo integration, so the two problems are estimated with small, different sets of simulation draws. If you use `integration=Integration('monte_carlo', 50, {'seed': 0})`, you should get more similar results.

When checking, I would also start with `method='1s'`, so that small numerical differences from the first step don't bubble up into the second, and `optimization=Optimization('trust-constr')` or some other gradient-based optimizer, which tends to have better properties than the gradient-free simplex method, which sometimes isn't that great.
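Putting these suggestions together, the configuration would look something like the following sketch (the specific values are illustrative; the formulations and data come from your own problem):

```python
import pyblp

# Seeded Monte Carlo draws, so repeated runs use identical simulation draws.
integration = pyblp.Integration('monte_carlo', 50, {'seed': 0})

# While debugging, use one-step GMM and a gradient-based optimizer.
solve_options = {
    'method': '1s',
    'optimization': pyblp.Optimization('trust-constr'),
}
```

These would then be passed as `pyblp.Problem(..., integration=integration)` and `problem.solve(..., **solve_options)`.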
Thank you! This works :)
Hi Jeff, I wonder if you could point me to the file that you use to construct the weighting matrix and instrumental variables after concentrating out all the fixed effects? I'm asking because I'm curious about how the econometrics work here.
Thank you!
Sure!

- Fixed effects are absorbed into X1 and ZD here: https://github.com/jeffgortmaker/pyblp/blob/64e181716854ecf2d3e51e54c41b41afd7937f49/pyblp/economies/problem.py#L1531 Essentially, we just iteratively de-mean the two matrices.
- An initial 2SLS weighting matrix is computed here: https://github.com/jeffgortmaker/pyblp/blob/64e181716854ecf2d3e51e54c41b41afd7937f49/pyblp/economies/problem.py#L562
- It's updated here: https://github.com/jeffgortmaker/pyblp/blob/64e181716854ecf2d3e51e54c41b41afd7937f49/pyblp/results/problem_results.py#L445

I'll also point you to pages 13-14 and Appendix A in our paper, where we discuss fixed effect absorption and concentrating out linear parameters.
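The iterative de-meaning can be illustrated outside of pyblp. Below is a minimal numpy sketch (the function name and the made-up group structure are purely illustrative): it alternates within-group de-meaning over each fixed effect dimension until the matrix stops changing, and checks the result against residuals from an explicit dummy regression.

```python
import numpy as np

def absorb_two_fixed_effects(matrix, ids1, ids2, tol=1e-12, max_iterations=1000):
    """Iteratively de-mean the columns of `matrix` within the groups in
    `ids1` and then `ids2`, repeating until convergence."""
    matrix = np.asarray(matrix, dtype=float).copy()
    for _ in range(max_iterations):
        last = matrix.copy()
        for ids in (ids1, ids2):
            for group in np.unique(ids):
                mask = ids == group
                matrix[mask] -= matrix[mask].mean(axis=0)
        if np.max(np.abs(matrix - last)) < tol:
            break
    return matrix

# Check against explicit dummies on made-up data: absorption should match the
# residuals from projecting each column on the full set of group dummies.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
ids1 = rng.integers(0, 3, size=60)
ids2 = rng.integers(0, 4, size=60)
D = np.column_stack([ids1 == g for g in range(3)] + [ids2 == g for g in range(4)]).astype(float)
residuals = X - D @ np.linalg.lstsq(D, X, rcond=None)[0]
same = np.allclose(absorb_two_fixed_effects(X, ids1, ids2), residuals, atol=1e-8)
```

By the von Neumann alternating projections argument, the iteration converges to the projection onto the orthogonal complement of the span of all the dummies, which is why absorption and dummy variables agree.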
Thank you for your quick response, Jeff! I really appreciate it.
After reading your code, I am still a little confused about what is used to construct the weighting matrices. I understand that you apply the within transformation to the X's and delta's from your RAND paper, but how did you deal with Z? Did you de-mean Z as well?
In addition, I wonder if different methods of controlling for FEs, i.e., the within transformation vs. dummies, can generate different estimates. For example, I think two-step GMM with the optimal weighting matrix in the second step does not guarantee that the xi's sum to zero. However, after the within transformation, the transformed delta's and X's must be mean zero, so the xi's, which are linear combinations of the delta's and X's, must be mean zero as well. Yet using the pyblp package, I found that the two methods generate the same GMM objectives and the same estimates when I apply both to Nevo's fake cereal data. I am trying to understand what's going on here.
Thank you!
Yes, you need to de-mean the instruments ZD as well. Up to the nonlinear parameters, we're just looking at a linear IV regression of delta on X1.

Adding dummies and de-meaning delta, X1, and ZD should be equivalent because you can apply the Frisch-Waugh-Lovell (FWL) theorem to the linear IV case. I recommend looking through different writeups of this; e.g., this one seems ok, but I don't have a best writeup to point to off the top of my head.
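The FWL equivalence for linear IV is easy to verify numerically. Here is a self-contained sketch with synthetic data (nothing below comes from pyblp; all names are made up), comparing 2SLS with explicit dummies against 2SLS after partialling the dummies out of everything:

```python
import numpy as np

def tsls(X, Z, y):
    """2SLS estimate b = (X'Z W Z'X)^{-1} X'Z W Z'y with W = (Z'Z)^{-1}."""
    W = np.linalg.inv(Z.T @ Z)
    A = X.T @ Z @ W @ Z.T
    return np.linalg.solve(A @ X, A @ y)

# Synthetic data: one regressor x, one excluded instrument z, and five
# group dummies D playing the role of fixed effects.
rng = np.random.default_rng(0)
n = 500
groups = rng.integers(0, 5, size=n)
D = (groups[:, None] == np.arange(5)).astype(float)
z = rng.normal(size=(n, 1))
x = z + rng.normal(size=(n, 1))
y = x[:, 0] + D @ rng.normal(size=5) + rng.normal(size=n)

# (i) Include the dummies in both the regressors and the instruments.
b_dummies = tsls(np.column_stack([x, D]), np.column_stack([z, D]), y)

# (ii) Partial the dummies out of x, z, and y first (FWL).
M = np.eye(n) - D @ np.linalg.pinv(D)  # annihilator matrix for D
b_fwl = tsls(M @ x, M @ z, M @ y)

# The coefficient on x is the same across the two approaches.
coefficients_match = np.isclose(b_dummies[0], b_fwl[0])
```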
If you're used to Stata, you could also check out the `absorb` option for `ivreg2` or the `ivreghdfe` package, which also support fixed effect absorption and updating the weighting matrix for linear IV.
Hi Jeff, Thank you again for your great support.
I understand that de-meaning X1, ZD, and Y (the delta's) generates the same coefficients in the linear IV case as using dummies. But I am wondering whether the two could differ in GMM, with a weighting matrix generated from ZD.
First of all, let me make sure that I understand your algorithm here. You take X1, ZD, and the recovered delta's (from guessed nonlinear parameters), de-mean all of them, and use the de-meaned ZD, $\widetilde{ZD}$, to generate the first-step weighting matrix $W = (\widetilde{ZD}' \widetilde{ZD})^{-1}$. In the second step, you use $\widetilde{ZD}$ and the $\xi$'s from the first step to generate the optimal weighting matrix. The linear parameters are then estimated using $W$, $\widetilde{ZD}$, and the de-meaned $X$ through IV-GMM. Is that correct?
Yes, that all sounds right, with the optimal weighting matrix computed like here, depending on your choice of clustering, etc.
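For concreteness, the unclustered heteroskedasticity-robust version of that update looks something like the following numpy sketch (pyblp's actual implementation at the link above also handles clustering and other options):

```python
import numpy as np

def optimal_weighting_matrix(Z, xi):
    """Inverse of the estimated covariance of the moments g_i = xi_i * z_i,
    with the moments centered (the heteroskedasticity-robust case)."""
    g = Z * xi[:, None]
    g = g - g.mean(axis=0)
    S = g.T @ g / len(xi)
    return np.linalg.inv(S)

# On made-up data, W should invert the moment covariance estimate S.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 3))
xi = rng.normal(size=200) * (1 + np.abs(Z[:, 0]))  # heteroskedastic errors
W = optimal_weighting_matrix(Z, xi)
g = Z * xi[:, None]
g = g - g.mean(axis=0)
S = g.T @ g / len(xi)
inverts_correctly = np.allclose(W @ S, np.eye(3), atol=1e-8)
```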
Thank you, Jeff. I used pyblp to run BLP on the car data using (i) dummies and (ii) the within transformation to control for fixed effects. I controlled the seed of the random draws so that there is no numerical difference from integration. The results below show that the two specifications give quite different results. I think this could be because the within-transformed instruments $\widetilde{ZD}$ and the instruments with dummies $ZD$ generate different weighting matrices, especially when the $\xi$'s are heteroskedastic.
Specification 1: Dummies
```python
import numpy as np
import pandas as pd
import pyblp

product_data = pd.read_csv(pyblp.data.BLP_PRODUCTS_LOCATION)
X1_formulation = pyblp.Formulation('1 + prices + C(market_ids)')
X2_formulation = pyblp.Formulation('0 + prices')
product_formulations = (X1_formulation, X2_formulation)
problem = pyblp.Problem(product_formulations, product_data, integration=pyblp.Integration('monte_carlo', 50, {'seed': 0}))
results = problem.solve(
    sigma=np.diag([1]),
    method='2s',
    optimization=pyblp.Optimization('nelder-mead', {'xatol': 1e-6}, compute_gradient=False),
)
results
```
```
Problem Results Summary:
===================================================================================
GMM Objective Gradient Clipped Weighting Matrix Covariance Matrix
Step Value Norm Hessian Shares Condition Number Condition Number
---- --------- --------- --------- ------- ---------------- -----------------
2 +4.18E-04 +1.25E-09 +1.02E-02 0 +3.02E+10 +1.06E+06
===================================================================================
Cumulative Statistics:
===========================================================================
Computation Optimizer Optimization Objective Fixed Point Contraction
Time Converged Iterations Evaluations Iterations Evaluations
----------- --------- ------------ ----------- ----------- -----------
00:00:19 Yes 83 179 46713 142783
===========================================================================
Nonlinear Coefficient Estimates (Robust SEs in Parentheses):
===================
Sigma: prices
------ -----------
prices +1.24E-03
(+1.46E-01)
===================
```
Specification 2: Within transformation
```python
X1_formulation = pyblp.Formulation('1 + prices', absorb='C(market_ids)')
X2_formulation = pyblp.Formulation('0 + prices')
product_formulations = (X1_formulation, X2_formulation)
problem = pyblp.Problem(product_formulations, product_data, integration=pyblp.Integration('monte_carlo', 50, {'seed': 0}))
results = problem.solve(
    sigma=np.diag([1]),
    method='2s',
    optimization=pyblp.Optimization('nelder-mead', {'xatol': 1e-6}, compute_gradient=False),
)
results
```
```
Problem Results Summary:
===================================================================================
GMM Objective Gradient Clipped Weighting Matrix Covariance Matrix
Step Value Norm Hessian Shares Condition Number Condition Number
---- --------- --------- --------- ------- ---------------- -----------------
2 +9.28E+01 +8.89E-04 +3.22E+03 0 +3.94E+17 +4.29E+02
===================================================================================
Cumulative Statistics:
===========================================================================
Computation Optimizer Optimization Objective Fixed Point Contraction
Time Converged Iterations Evaluations Iterations Evaluations
----------- --------- ------------ ----------- ----------- -----------
00:03:11 Yes 57 113 629407 1890374
===========================================================================
Nonlinear Coefficient Estimates (Robust SEs in Parentheses):
===================
Sigma: prices
------ -----------
prices +6.97E-03
(+7.42E-02)
===================
```
However, I ran the two specifications on the fake cereal data, and they generated the same estimates. This makes me wonder if the fake cereal data were generated assuming the $\xi$'s are i.i.d., so that the weighting matrices are not very different under either method?
Maybe check what happens when you specify the same initial `W` for both, to verify this? Also, see what the first-stage `results.first_results` look like?
I think the question of what initial weighting matrix to use by default when fixed effects are absorbed is something I'd thought about at some point, but I don't remember. I'd buy that different initial weighting matrices are giving you different results in-sample, particularly for a small dataset with somewhat weak instruments. Of course, asymptotically the initial weighting matrix doesn't matter, but choosing a good one (I'm not sure what a better one would be when absorbing fixed effects -- happy to hear other suggestions) can matter in practice.
One other thing: fwiw, I don't recommend `nelder-mead` -- with enough simulation draws and a smooth objective, gradient-based approaches with a tight gradient-based tolerance will be more reliable.
```matlab
% Setup: x1: endogenous variables (the last variable is not included in z1);
% x2: month dummies; z1: IVs (not including dummies)

% Apply the FWL theorem
>> x = [x1, x2]; z = [z1, x2];
>> m2 = eye(size(x2, 1)) - x2*inv(x2'*x2)*x2'; % annihilator (residual-maker) matrix for x2
>> z1tilde = m2*z1; x1tilde = m2*x1; ytilde = m2*y;

% FWL works well for OLS
>> b1 = (x1tilde'*x1tilde)\(x1tilde'*ytilde); b2 = (x'*x)\(x'*y); disp([b1, b2(1:end-12)]')
   -0.5661    1.4649    0.4668    0.1623    0.0477
   -0.5661    1.4649    0.4668    0.1623    0.0477

% FWL works well for 2SLS (GMM first step with the inverse of Z'Z as the weighting matrix)
>> omega1 = (z1tilde'*z1tilde)\eye(size(z1tilde, 2)); omega = (z'*z)\eye(size(z, 2));
>> b1 = (x1tilde'*z1tilde*omega1*z1tilde'*x1tilde)\(x1tilde'*z1tilde*omega1*z1tilde'*ytilde); b2 = (x'*z*omega*z'*x)\(x'*z*omega*z'*y); disp([b1, b2(1:end-12)]')
   -0.5645    1.4655    0.4670    0.1629    0.0462
   -0.5645    1.4655    0.4670    0.1629    0.0462

% The objective functions are also the same
>> e1 = ytilde - x1tilde*b1; e2 = y - x*b2; disp(max(abs(e1-e2)))
   1.1857e-13
>> g1 = e1'*z1tilde./size(e1, 1); f1 = g1*omega1*g1'.*size(e1, 1); disp(f1)
    0.0925
>> g2 = e2'*z./size(e2, 1); f2 = g2*omega*g2'.*size(e2, 1); disp(f2)
    0.0925

% Optimal weighting matrix (with centered moments)
>> gstar1 = e1.*z1tilde - mean(e1.*z1tilde, 1);
>> gstar2 = e2.*z - mean(e2.*z, 1);
>> w1 = (gstar1'*gstar1)\eye(size(gstar1, 2)).*size(gstar1, 1);
>> w2 = (gstar2'*gstar2)\eye(size(gstar2, 2)).*size(gstar2, 1);
>> b1 = (x1tilde'*z1tilde*w1*z1tilde'*x1tilde)\(x1tilde'*z1tilde*w1*z1tilde'*ytilde); b2 = (x'*z*w2*z'*x)\(x'*z*w2*z'*y); disp([b1, b2(1:end-12)]')
   -0.2956    1.5169    0.4562    0.1830    0.0536
   -0.2956    1.5169    0.4562    0.1830    0.0536

% The recovered xi's are different, but the objective function values are the same
>> xi1 = ytilde - x1tilde*b1; xi2 = y - x*b2; disp(max(abs(xi1-xi2)));
    0.0880
>> g1 = xi1'*z1tilde./size(e1, 1); f1 = g1*w1*g1'.*size(e1, 1); disp(f1)
   1.5877e+03
>> g2 = xi2'*z./size(e2, 1); f2 = g2*w2*g2'.*size(e2, 1); disp(f2)
   1.5877e+03

% FWL does not work for many other weighting matrices, e.g., the identity matrix
>> b1 = (x1tilde'*z1tilde*z1tilde'*x1tilde)\(x1tilde'*z1tilde*z1tilde'*ytilde); b2 = (x'*z*z'*x)\(x'*z*z'*y); disp([b1, b2(1:end-12)]')
   -0.3291    1.4774    0.3391    0.3929   -0.0281
   -0.4305    1.4665    0.3996    0.2759   -0.0216
```
Thanks for doing this, this is useful, particularly if anyone else gets confused by differences like this in the future! I'm not a matlab user, but one quick thing I noticed: should it be `g1 = e1'*z1tilde./size(e1,1)`? Also, residual differences on the order of 1e-5 seem quite large if we expect dummies and de-meaning to be precisely equal up to that point (up to numerical error from inversions/solving linear systems).
I think when discussing initial weight matrices, there's also a question about what FWL "working" means. Ideally, dummies and de-meaning would give precisely the same estimates (up to numerical error), but if that's just not possible for whatever reason, what I would be more focused on is the finite sample statistical properties of the two estimators.
Evaluating this would require some Monte Carlos, maybe with both well- and mis-specified DGPs if you're particularly interested in the latter case. Real data can be useful for getting a sense of numerical differences, but not as much for statistical ones. I just don't have the time to do this at the moment, but if it's something you're interested in doing, I'd be interested in the results. (I'm also optimistic that this is something someone in the literature has looked at, but I haven't had a chance to look.)
Sorry for the confusion in my last comment. I made some coding mistakes; in fact, the two methods (dummies vs. the within transformation) generate the same estimates in one- and two-step GMM, though not with other weighting matrices. I edited the code I shared above to avoid any future confusion.

That being said, I am still not sure why the results differ when I use the two different methods to control for year FEs with BLP's car data, as I shared above. The same problem does not appear with the fake cereal data, though. I tried many different combinations of FEs, including market FEs, city FEs, quarter FEs, and product FEs, and the two methods generate the same estimates in all of these cases. I'm not sure what was going on with my previous experiment on BLP's car data.
Finally, I wonder if you could point me to the code you use to generate the initial weighting matrix when micro moments are used? Thank you!
Ah okay, no worries. I'm also not sure, but please do share if you find out. You tried different FEs for the BLP problem too?
For micro moments, there did not (to me) seem to be a "canonical" initial weighting matrix like the 2SLS one. So with micro moments, the default is `initial_update=True` -- see its docs here for more. This just computes an estimate of the optimal weighting matrix at the initial parameter values. It isn't actually optimal unless those initial values are consistent estimates of the true parameters, but I expect it to be more reasonable than, say, the identity matrix.
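For concreteness, a sketch of how that default would be invoked explicitly (assuming `problem`, `sigma_initial`, and `micro_moments` have already been constructed for your problem; those names are illustrative):

```python
# With micro moments, the default is equivalent to passing initial_update=True,
# which re-computes delta and the weighting matrix at the initial parameter
# values before the first GMM step.
results = problem.solve(
    sigma=sigma_initial,
    micro_moments=micro_moments,
    initial_update=True,
)
```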
I'm going to close this for now, but feel free to re-open/keep commenting.
Thank you for this wonderful public good! I found some problems using the `absorb` option. Adding fixed effects as variables does not work the same as adding them via the `absorb` option when there are multiple fixed effects. Here is an example using the fake cereal data.

The result from the first model is

From Model 2:

Is there a reason for this discrepancy? Thank you!