jeffgortmaker / pyblp

BLP Demand Estimation with Python
https://pyblp.readthedocs.io
MIT License

Issue with Wald test implementation #112

Closed kevindanray closed 2 years ago

kevindanray commented 2 years ago

I have been working with the Wald test, but I think there is an issue with how it is implemented. Specifically, the restrictions are passed as a column vector, and the Jacobian must have "as many rows as the restrictions." While this is true when the restrictions are written as row vectors, making the restrictions into a column vector then implies that the Jacobian has to be a square matrix. Reviewing textbooks and notes, this looks like a problem, since the Jacobian should be free to have any number of rows and columns.

This also sent me digging through the source code and the documentation, and I discovered one other issue of note: in the problem_results documentation, the parameter_covariances description says "Standard errors are the square root of the diagonal of this matrix divided by 𝑁". However, in order to replicate the reported standard errors, I had to divide by root N rather than N.

jeffgortmaker commented 2 years ago

I'm not sure I understand the first issue. You can have as many restrictions as you like, but since the Jacobian is with respect to theta, it must have as many columns as theta has elements. Maybe an example would help? There's a unit test here that does a Wald test of the hypothesis that all parameters are jointly zero. If this doesn't make sense, it could help to link to the place where you're seeing something contradictory.
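For intuition, here's a minimal numpy sketch of the textbook Wald formula with these shapes: r restrictions stacked as an r-by-1 column vector and an r-by-K Jacobian for K parameters. This is not pyblp's internal code; the dimensions, parameter values, and covariance matrix are all made up for illustration.

```python
import numpy as np

# Hypothetical dimensions: K = 5 parameters, r = 2 restrictions, N observations.
K, r, N = 5, 2, 500
rng = np.random.default_rng(0)

theta = rng.normal(size=(K, 1))      # made-up estimated parameters
A = rng.normal(size=(K, K))
V = A @ A.T + K * np.eye(K)          # made-up positive-definite scaled covariance

# Two linear restrictions: theta[0] = 0 and theta[4] = 0.
J = np.zeros((r, K))                 # r-by-K Jacobian: rows = restrictions, columns = parameters
J[0, 0] = 1.0
J[1, 4] = 1.0
restrictions = J @ theta             # r-by-1 column vector of restriction values

# Wald statistic: W = restr' [J (V/N) J']^{-1} restr, asymptotically chi-squared(r).
middle = J @ (V / N) @ J.T           # r-by-r, so J can be rectangular (any r-by-K)
W = (restrictions.T @ np.linalg.solve(middle, restrictions)).item()
print(W)
```

The key point is that J only needs to be square when r happens to equal K; the middle matrix J (V/N) J' is always r-by-r regardless.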

I think the second issue is just ambiguous language. You can either read it as "Standard errors are the square root of (the diagonal of this matrix divided by 𝑁)" or "Standard errors are the (square root of the diagonal of this matrix) divided by 𝑁". The former is correct, and the latter would be correct if N were replaced with root N.
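A quick numerical illustration of the two readings, using a made-up diagonal rather than pyblp's actual parameter_covariances:

```python
import numpy as np

V_diag = np.array([4.0, 9.0, 16.0])   # made-up diagonal of a scaled covariance matrix
N = 100

se_correct = np.sqrt(V_diag / N)      # "square root of (diagonal / N)"
se_wrong = np.sqrt(V_diag) / N        # "(square root of diagonal) / N"

# The correct reading is the same as dividing the square roots by root N,
# which is why dividing by root N replicated the reported standard errors.
assert np.allclose(se_correct, np.sqrt(V_diag) / np.sqrt(N))
print(se_correct, se_wrong)           # [0.2 0.3 0.4] vs. [0.02 0.03 0.04]
```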

kevindanray commented 2 years ago

I was using the "results3" from the tutorial as my base problem to learn how to use the Wald test. I noted that the beta coefficient for price is -30.2 with a standard error of 1.36, and wanted to use the Wald test and compare it to this t-test (t-stat of -22).

Since the restriction is a single linear restriction, the Jacobian should be the same as the restriction (although looking at the example you linked, I may be formulating the restrictions incorrectly), so I run the following:

restrictions = np.array([0,0,0,0,1])
results3.run_wald_test(restrictions, restrictions)

ValueError                                Traceback (most recent call last)
<ipython-input-15-bbfc0ae2666c> in <module>
----> 1 results3.run_wald_test(restrictions, restrictions)

~\Downloads\pyblp-master\pyblp-master\pyblp\results\problem_results.py in run_wald_test(self, restrictions, restrictions_jacobian)
    817             raise ValueError("restrictions must be a column vector.")
    818         if restrictions_jacobian.shape != (restrictions.shape[0], self.parameter_covariances.shape[0]):
--> 819             raise ValueError(
    820                 f"restrictions_jacobian must be a {restrictions.shape[0]} by {self.parameter_covariances.shape[0]} "
    821                 f"matrix."

ValueError: restrictions_jacobian must be a 5 by 5 matrix.

It looks like the shape check here uses the number of rows in the restrictions column vector (which equals the number of parameters, if I've formatted it correctly) to determine what size the Jacobian should be.

kevindanray commented 2 years ago

Oh, and thank you for the explanation of the variances and division by N, that makes sense.

kevindanray commented 2 years ago

Never mind, I finally figured out I was just not formatting the "restrictions" right. I've got it all working now.


# Compute Wald test
restrictions = results3.parameters[4]
restrictions_jacobian = np.array([[0,0,0,0,1]])
results3.run_wald_test(restrictions, restrictions_jacobian)

490.9819714661171

# Compute t-statistic and square for comparison
(results3.parameters[4]/((results3.parameter_covariances[4,4]/mc_problem.N)**0.5))**2

array([490.98197147])
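For a single linear restriction, the Wald statistic is exactly the squared t-statistic, which is why the two numbers above agree. A self-contained check of that algebra with made-up numbers (not results3's actual estimates):

```python
import numpy as np

# Made-up 5-parameter estimates and scaled covariance matrix standing in for results3's.
rng = np.random.default_rng(1)
A = rng.normal(size=(5, 5))
V = A @ A.T + 5 * np.eye(5)           # positive-definite placeholder for parameter_covariances
theta = rng.normal(size=(5, 1))       # placeholder for the estimated parameter vector
N = 500

# Wald statistic for the single restriction theta[4] = 0, with a 1-by-5 Jacobian.
J = np.array([[0.0, 0.0, 0.0, 0.0, 1.0]])
restriction = J @ theta               # 1-by-1 restriction value
W = (restriction.T @ np.linalg.solve(J @ (V / N) @ J.T, restriction)).item()

# Squared t-statistic for the same parameter.
t_sq = theta[4, 0] ** 2 / (V[4, 4] / N)

print(W, t_sq)                        # identical up to floating-point error
```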

jeffgortmaker commented 2 years ago

Oh great, glad you figured it out.