lbittarello / Microeconometrics.jl

Microeconometric estimation in Julia

Possible Playmate #1

Closed Nosferican closed 5 years ago

Nosferican commented 6 years ago

I am developing UEM which should be registered soon. I found your package and I think we could talk about how to develop the Econometrics sub-ecosystem to have these packages play nice and help develop more tools.

lbittarello commented 6 years ago

I'd be happy to work together. There seems to be little overlap between the two packages at the moment, so we'd gain a lot from integrating them. I couldn't go through your code yet, so I don't know your approach. I'll try and have a look soon!

Nosferican commented 6 years ago

I have been thinking about how to design the ecosystem and I believe the best way would be through the second approach in JuliaLang/METADATA#10909. Basically develop it as Julia Stats did with DataFrames and wait for the package manager to allow for submodules à la Python. I have also brought the issue of developing for a common ecosystem in gragusa/CovarianceMatrices#19. Let me know once you have checked my approach.

lbittarello commented 6 years ago

I had a quick look at your code. We could easily bring our codes together. I don't know if you could check my code, so I'll summarize my approach, since I haven't written any documentation for developers.

The user must first process the data, declaring which variables will perform each role in the estimation. Unlike glm or your uem, the user cannot directly feed a formula to the estimator. This separation seems awkward, but it has advantages:

1. Econometric models are often complicated. For instance, IV regression requires the specification of the response, the treatment, the instrument and the controls. This setup doesn't fit into a Formula. The user could input several formulas (which is also awkward), but we'd have to check that all of them drop the same rows if there are missing data. Pooled arrays could be a headache. Microdata processes all elements at once, so that the resulting design matrix is consistent.

2. If the user wants to fit several models, repeated processing of the data wastes time. This difference is usually negligible, but it can matter if the sample is large or if the user bootstraps.

As part of this preparation, the user must specify the correlation structure of the sample (e.g., heteroscedastic or clustered). Depending on the model, this step can save a lot of time. It also allows greater flexibility, which is useful in the case of two-stage models. We could figure out a way to make it work with CovarianceMatrices.

Point estimation is standard. We both separate the estimation of point coefficients, the hessian and the matrix of score vectors. I don't estimate statistics (R², etc.) though. The user can call the standard functions of StatsBase to obtain them. Most are only available for MLE.

We should probably settle on a common framework for data processing and inference. We could wrap the relevant functions into submodules. We could have a big submodule for estimators or separate submodules for each family (cross-section, panel, etc.).

Nosferican commented 6 years ago

I believe @matthieugomez FixedEffects could also port its covariance estimators. GLM could keep working like all the other packages if they implement the StatsBase methods for StatisticalModel/RegressionModel.

As for the DataFrames/StatsModels Formula: I use two formulas, (response ~ exogenous) and (endogenous ~ additional instruments). It seems to be pretty easy, especially for method dispatch.
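
Roughly, the dispatch looks like this (a sketch using today's StatsModels types and a made-up function name, just to illustrate):

using DataFrames, StatsModels

# one method for exogenous models, another for instrumental variables
fitmodel(fm::FormulaTerm, df::AbstractDataFrame) = "exogenous model"
fitmodel(fm::FormulaTerm, iv::FormulaTerm, df::AbstractDataFrame) = "IV model"

df = DataFrame(y = randn(5), x = randn(5), d = randn(5), z = randn(5))
fitmodel(@formula(y ~ x), df)                   # exogenous model
fitmodel(@formula(y ~ x), @formula(d ~ z), df)  # IV model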

I am at the gym now so will make more comments later, but for covariance estimates I separate these from the model so that users may estimate them with multiple options without having to re-make or re-fit the model. The same is done to cache results, such as in 2SLS or random effects, which need to fit multiple models first.

For CovarianceMatrices, using StatsBase methods should be enough (additional required ones can be requested from StatsBase). If it can use the multi-way clustering algorithm with its variants we should be good (non-clustered is just a special case of each observation being its own cluster).

I might request StatsBase to provide the Wald statistic for regression models, which most packages seem to be implementing too.

Nosferican commented 6 years ago

See @dmbates MixedModels for another alternative to the standard formulae solution.

lbittarello commented 6 years ago

Consider your solution. Suppose that the instrument is missing for observation i of the sample. There are no other missing data. ModelFrame will create a complete model matrix for your first formula, but it will drop observation i for the purposes of your second formula. You'd have to keep track of missing rows across formulas. In certain scenarios, dropping missing data might reduce the number of levels of pooled arrays, creating additional difficulties. Imagine if a model requires three or four formulas...
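
To illustrate (a rough sketch with current DataFrames/StatsModels; the API has changed since, but the problem is the same):

using DataFrames, StatsModels

df  = DataFrame(y = randn(3), x = randn(3), t = randn(3), z = [1.0, missing, 3.0])
mf1 = ModelFrame(@formula(y ~ x + t), df)   # keeps all three rows
mf2 = ModelFrame(@formula(t ~ z), df)       # drops the row where z is missing
size(modelmatrix(mf1), 1), size(modelmatrix(mf2), 1)   # (3, 2): the rows no longer line up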

At first sight, CovarianceMatrices and Microeconometrics seem to use the same approach. Certain names change (e.g., bread vs. jacobian), but disruption should be small. I do think that computing the weight matrix in advance is advantageous though.

Nosferican commented 6 years ago

ModelFrame uses completecases!(df[allvars(fm)]). Before I run ModelFrame I just do

df = df[:, union([PID, TID], allvars(fm), allvars(iv))]
completecases!(df)

As for caching useful values... The most computationally intensive part is the design matrix inversion. That I cache in the model as bread:

bread = inv(cholfact(X' * X))
β = bread * X' * y

I do the same for 2SLS... It can also be done for hat values for leverage-adjusted residuals.

lbittarello commented 6 years ago

Isn't it wasteful to create a new dataframe? What if it's very large? And what if you have a variable number of formulas?

Nosferican commented 6 years ago

ModelFrame already copies the DataFrame. Making a copy with only the variables you need is actually more efficient than calling ModelFrame with the raw data frame (by default, ModelFrame keeps a copy of the whole DataFrame, which includes all the unnecessary variables). I am not familiar with a design with a variable number of formulas... For example, in my package it is either one (for exogenous models) or two (for instrumental variables). Modifying the multipart formula is basically the same, as these are split into multiple formulas for creating the model matrix.

lbittarello commented 6 years ago

Actually, it only makes a copy of the relevant variables (df, msng = na_omit(DataFrame(map(x -> d[x], trms.eterms)))). (I thought it only kept a reference to the original DataFrame.)

Microeconometrics separates data processing from estimation. The processor can handle the variable requirements of different estimators. Otherwise, each estimator will need its own code to process the data, leading to redundancies and inconsistencies.

Nosferican commented 6 years ago

All linear panel models solve the same problem and use the same solver. You just feed it the transformed arguments. It is the same design used by Stata, SAS, EViews, R {plm}, Python linearmodels, mine, etc. For nonlinear estimators you just add a suitable solver, but the same process applies. The core of the implementation is the transformations, not the solvers per se.

lbittarello commented 6 years ago

I don't understand your point.

I don't like the two-formula solution. I prefer the current setup of Microeconometrics: Microdata processes the data into a structure which any estimator will understand.

For OLS, e.g., you'd write: data = Microdata(df, response = "y", control = "1 + x"). For the within estimator, you could declare: data = Microdata(df, response = "y", control = "1 + x", time = "t", id = "i"). Maybe you could even distinguish time-varying from time-invariant controls, so you could use the same Microdata to fit both the pooled and the within estimators. When you want to fit the model, you simply use fit(Within, data) or whatever.

We could wrap all these functionalities into a dedicated submodule...

Nosferican commented 6 years ago

For instrumental variables it would be

data = Microdata(df::DataFrames.DataFrame,
    response::String,
    control::String,
    treatment::String,
    instrument::String)
fit(estimator, data)

rather than

model = fun(estimator::Symbol,
    fm::DataFrames.Formula,
    iv::DataFrames.Formula,
    df::DataFrames.DataFrame;
    PID::Symbol = names(df)[1],
    TID::Symbol = names(df)[2],
    ...)

While I prefer the formula language, even with its basic lhs ~ rhs format, the biggest disadvantage of the string approach is that it does not allow for code reuse, best practices, etc. Julia Data and Julia Stats have put a lot of work into developing the DataFrames / StatsModels implementations for it. While it is still under development, future releases will likely include more advanced features such as polynomial terms, which are vital for marginal effects in GLMs such as logit / probit, and other functions such as multicollinearity tests. It also helps to have a standard common interface rather than each package relying on its own DataFrame-to-ModelMatrix objects.

For example, how do you pass a contrast specification using the strings (e.g., basic functionality such as deciding the order of levels / the base level for categorical variables)?
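
With formulas, that is just a keyword (using the current StatsModels/GLM interface, which postdates parts of this thread):

using DataFrames, GLM, StatsModels

df = DataFrame(y = randn(6), g = repeat(["a", "b", "c"], 2))
# choose the base level of the categorical variable through the contrasts keyword
m  = lm(@formula(y ~ g), df, contrasts = Dict(:g => DummyCoding(base = "b")))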

lbittarello commented 6 years ago

Microdata converts the input into a formula. So it exploits all the existing code. It adds flexibility though. It isn't very different from your uem, but it's much more general and accommodates other estimators. Note that the syntax is similar to that of Gadfly.

I haven't implemented the input of contrasts yet, but it's easy to do. (You don't need it to choose the order of levels or the base level of categorical variables though: you can control them via PooledDataArray.)

nalimilan commented 6 years ago

@Nosferican is right that the main interest of using formulas is to benefit automatically from the support for any generic table source. The idea behind StatsModels.jl is that modeling packages do not have to deal with the construction of model matrices from formulas and data. StatsModels is supposed to handle that for them, which means that any new custom data source will automatically work, not just DataFrame. Also eventually the goal is to avoid making any unnecessary copies of the data. If the current interface isn't flexible enough for you, just file an issue to discuss possible improvements.

lbittarello commented 6 years ago

@Nosferican, any hopes of better integrating our packages?

I've updated the documentation and included a guide for new estimators. It could maybe serve as a starting point. I think we're creating a lot of redundancy – between CovarianceMatrices, you and me, we have three estimators of clustered standard errors...

Nosferican commented 6 years ago

I have been pushing for a JuliaEconometrics organization à la JuliaData or JuliaStats. We could migrate those packages or a version that works with each of those. The idea is to make very light, small packages (Kernels, CovarianceMatrices, EconUtils, etc.). Then packages can call what they need. If that seems like a good solution, we could migrate the packages there. I haven't checked to see how the machine learning packages could work together (maybe depend on the regression stuff?). For GLM, I would like to get the family/link as a separate package as well. I would like to get it all worked out during early Julia 0.7 if possible (my current work for my dissertation is on variance covariance estimators, so I plan on implementing the estimators in Julia).

A first step would be, for example, to get CovarianceMatrices and maybe EconUtils to work with the various packages and bit by bit start depending on a common infrastructure. For playing together, inherit from StatsBase.RegressionModel and use StatsModels.Formula and DataFrames.AbstractDataFrame for building a model. Additional API stuff can be worked out afterwards.

EconUtils currently uses EconFormula(formula::Formula) and parses with the following structure:

response ~ exogenous + (endogenous ~ instruments) + (absorb = fixedeffects) + (cluster = vars)
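
For example, that multipart formula might be decomposed into something like this (a hedged sketch; the variable names are illustrative):

using StatsModels

formulae = Dict(
    :mean       => @formula(y ~ x1 + x2),   # response ~ exogenous
    :firststage => @formula(d ~ z1 + z2))   # endogenous ~ instruments
absorb  = [:fe]   # fixed effects to absorb
cluster = [:id]   # clustering variables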

If that is something that works for Microeconometrics, we could start migrating it to EconUtils... It should require minimal effort for your project. CovarianceMatrices can be made available to any RegressionModel, and Microeconometrics could outsource that as well. JuliaEconometrics/VarianceCovarianceEstimators is a bit different from CovarianceMatrices in the sense that it is meant for other packages to call rather than users. I just need to add HAC and an option for spatial HAC to finish it up.

If you wanna try out migrating to EconUtils let me know and I can help with anything. If there is any functionality not in EconUtils that you use or would like we could add it there.

lbittarello commented 6 years ago

I have reservations about Formula. I don't think it's flexible enough. I dislike the implicit linear representation too. Moreover, the pseudo keywords make the syntax inconsistent. Every time a model needs a new variable set (e.g., for special regressors), we'd need a new parser. I haven't heard a good argument against the approach of Microdata, which is flexible and consistent and doesn't suggest linearity.

I have put quite a bit of effort into covariance estimation. It can handle different weight types and arbitrary correlation structures. (By the way, it already supports SHAC and two-way clustering.) It avoids loops (even for clustering), which improves running time. Moreover, it supports two-stage estimators, allowing users to mix and match models. (Stata is much more limited.) I'd be happy to migrate it so that others have access to it. Like your own package, it's meant for internal use. Because it's modular, it should be easy to separate from the rest of the package.

One advantage of separating the correlation structure from the design matrix, as Microeconometrics does, is that one can reuse it. When it's expensive to compute (e.g., SHAC or even clustered variances), precomputation saves a lot of time. It should also be easy to spin the utilities for CorrStructure into a separate package.

A separate package for kernels would be nice – something along the lines of MLKernels. EconKernels, maybe?

Nosferican commented 6 years ago

Check out the new iteration of StatsModels. Formula has become a lot more flexible compared to before. The EconFormula parses a Formula and stores various Formula objects that can be used with ModelFrame. This could potentially just be a Dict. I don't follow the linear representation argument.

That's great. I implemented the variance covariance estimators in the most general way: multi-way cluster robust, using the outer product of the effective residuals and a Hadamard product with the weighting matrix; it doesn't use any for loops either. The weighting matrix can use any weights by kernel, cluster-robust indicator or SHAC, so it should be easy to port it. You can check it out here. It provides the same results as reghdfe in Stata, including suppressing singletons, when used with EconUtils. For SHAC, I thought one needed a spatial package to calculate the spatial distance (for example, the geosphere package in R). I felt it was better to just allow people to provide the distance matrix. For temporal HAC, I calculate the distance matrix beforehand and provide it to the vcov.

EconUtils computes the Clusters[Dimension[Level[Observations]]] structure, which makes it very easy to build the weighting matrix.
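
Schematically, the construction is something like this for one clustering dimension (an illustrative sketch with made-up data and modern LinearAlgebra names, not the package's actual code):

using LinearAlgebra

n   = 9
X   = hcat(ones(n), randn(n))        # model matrix
e   = randn(n)                       # stand-in for the effective residuals
ids = repeat(1:3, inner = 3)         # cluster assignment in one dimension

W = [ids[i] == ids[j] for i in eachindex(ids), j in eachindex(ids)]  # weighting matrix
bread = inv(cholesky(Symmetric(X' * X)))
V = bread * (X' * ((e * e') .* W) * X) * bread'   # Hadamard product with the weights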

I try to cache the results and reuse them. VarianceCovarianceEstimators uses

modelmatrix(obj)
residuals(obj)

it also has

function bread(obj::RegressionModel)
    mm = modelmatrix(obj)
    output = inv(cholfact!(mm.' * mm))
    return output
end

which packages can override if they cache the inverse of the normal equation,

bread(obj::MyStruct) = getfield(obj, :Bread)

for example.

For weights, I believe it is a good practice to have the modelmatrix return the weighted model matrix, but if there is some argument against it, it could be worked out.

Check out the code and let me know.

Nosferican commented 6 years ago

As for the instrumental or two-stage estimators, the approach I took is to have it generalized so OLS is just a special case of 2SLS where z = zeros(m,0) and Z = zeros(m,0).
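
Schematically (an illustrative sketch, not the actual implementation):

function tsls(y, Xexo, Xendo, Z)
    X    = hcat(Xexo, Xendo)   # all regressors
    W    = hcat(Xexo, Z)       # all instruments (exogenous + additional)
    Xhat = W * (W \ X)         # first stage: project the regressors on the instruments
    return Xhat \ y            # second stage
end

m    = 100
Xexo = hcat(ones(m), randn(m, 2))
y    = randn(m)
tsls(y, Xexo, zeros(m, 0), zeros(m, 0))   # no endogenous block: collapses to OLS of y on Xexo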

lbittarello commented 6 years ago

I don't follow the linear representation argument.

Consider IV. You write: y ~ x + (t ~ z). There's an implicit linear representation here: y = αt + x'β and t = z'γ. It isn't always appropriate. Consider Abadie (2002): you need to regress z on x, construct weights and regress y on t with those weights. You'll need a different formula to convey this procedure, which is silly, as both models use the same components (y, x, t, z). Look at Stata's teffects: it's a mess.

it doesn't use any for loops either

I think you loop twice: once to construct the cluster vector and then to construct the correlation matrix. This second step is repeated every time you estimate the variance. It's better to directly construct and store the correlation matrix in sparse symmetric form, which is equivalent to your first loop.

a spatial package to calculate the spatial distance

I implemented one version of it. We can spin it off.

residuals(obj)

But the variance of nonlinear models doesn't need the residuals... It needs the score vector.

it is a good practice to have the modelmatrix return the weighted model matrix

Frequency weights need special treatment.

lbittarello commented 6 years ago

Why don't you check the code for variance estimation in Microeconometrics and let me know what you think? The two main files are this and this.

nalimilan commented 6 years ago

Consider IV. You write: y ~ x + (t ~ z). There's an implicit linear representation here: y = αt + x'β and t = z'γ. It isn't always appropriate. Consider Abadie (2002): you need to regress z on x, construct weights and regress y on t with those weights. You'll need a different formula to convey this procedure, which is silly, as both models use the same components (y, x, t, z). Look at Stata's teffects: it's a mess.

To be clear: what solution do you use/suggest for this?

Overall I find the Microdata idea interesting, though I would have given it a more general name, since it's not just about data ("microdata" is a term specific to a disciplinary field): it includes information about the model specification (and that's indeed the name of a section of the manual). It would be really great if we could find a unified solution which could be implemented in StatsModels, so that we don't have different interfaces for, e.g., econometric vs. other fields. In general I think we can get a better result if we join forces across all disciplines, as we often use the same kinds of models but with slightly different perspectives and terminologies, yet we have a limited work capacity.

lbittarello commented 6 years ago

what solution to you use/suggest for this?

The user declares model components with keywords (response = y, control = x, etc.), which doesn't impose any relationship between the components. So IV and Abadie need the same input, even though their internals differ. And that input is consistent: instead of y ~ x + (weights = w), you have response = y, control = x, weights = w.
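
Concretely, something along these lines (the weights keyword and the estimator names are illustrative here, not a settled interface):

using DataFrames, Microeconometrics

df   = DataFrame(y = randn(100), x = randn(100), t = rand(0:1, 100),
                 z = rand(0:1, 100), w = rand(100))
data = Microdata(df, response = "y", control = "1 + x",
                 treatment = "t", instrument = "z", weights = "w")
fit(IV, data)       # instrumental variables
fit(Abadie, data)   # Abadie (2002): same components, different internals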

Other approaches would work too. The user could pass a tuple of components in a conventional order (e.g., (y, x, t, z)) and keywords for weights and the like. I suppose it wouldn't be too different from the more flexible implementation of Formula, but it would eliminate implicit structural assumptions and it would be easier to standardize the input across models.

I would have given it a more general name

You're right. I wasn't imaginative enough. Suggestions are welcome. :)

we can get a better result if we join forces across all disciplines

Yes! I'd be happy to work on a more general framework.

nalimilan commented 6 years ago

OK, so that's more or less what we discussed at https://github.com/JuliaStats/StatsModels.jl/issues/21? The main difference being that you used strings for formulas, while I and @kleinschmidt suggested using a macro like @model so that formulas can be written directly. Using keyword arguments sounds indeed clearer than relying on their position.

It would be interesting if you could come up with a possible design similar to Microdata, but using macros to extract formulas (and symbols for single variables), that we could integrate in StatsModels. We should probably continue to discuss this at https://github.com/JuliaStats/StatsModels.jl/issues/21.

Nosferican commented 6 years ago

The linear representation should be flexible enough. For example, models that use the projection of a subset of exogenous variables rather than instruments can be specified with

response ~ exogenous + (endogenous ~ 0)

The same syntax can mean different procedures depending on the keyword arguments. For example which estimator to use. However, if in fact there are some models which really cannot be fitted in that framework we can adapt it to a more flexible approach.

As for the variance covariance estimator: one for loop is used in the residuals for leverage values of sub-matrices à la McCaffrey and Bell (2006), and another is used to generate the weights for HAC (it basically gets the values based on distance and kernel). For the actual computation of the filling there is no for loop. In practice it is actually used as

function fit!(model::RegressionModel)
    #some code
    V = vcov(model)
    setfield!(model, :vcov, V)
    ...
end
vcov(model) = getfield(model, :vcov)

so it doesn't need to be computed every time, just once.

It would be great to have the spatial distance package separate!

I haven't implemented vcov for non-linear models. We might need to implement some different methods in StatsBase first (you could open an issue or PR there to get that going), but it should be made available eventually.

See if you can improve the StatsBase weights section a bit to make it good enough for our needs. Frequency weights could be a keyword argument to enable WLS.

lbittarello commented 6 years ago

you can improve the StatsBase weights section a bit to make it good enough for our needs

They are already good enough.

lbittarello commented 6 years ago

Here's a proposition for the way forward:

Nosferican commented 6 years ago

I will play around with the Formulae to capture all formula specifications with keyword arguments and see if that comes out nicely. I would probably just parse them to something like

mutable struct EconFormula
    Representation::String
    Formulae::Dict{Symbol,Formula}
    # constructor
end

Packages could get their model calling something like,

df, varlist, assign, y, X, z, Z, D, G = modelframe(formula::EconFormula, df::AbstractDataFrame;
    contrast::Dict{Symbol, Dict{Symbol, Contrasts}} = Dict{Symbol, Dict{Symbol, Contrasts}}())

For variance covariance estimators,

I prefer to have them all in the general form, so all models would be WLS (OLS is a special case with weights = 1) and two-stage (one stage is a special case with no endogenous variables, for example). Let's work first on linear models and then tackle generalized linear models (mlogit, ologit, Poisson, negative binomial, etc.).
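
In code, the general form is roughly (an illustrative sketch):

# w are the weights; with w .== 1 this is plain OLS
wls(y, X, w) = (X .* sqrt.(w)) \ (y .* sqrt.(w))
ols(y, X)    = wls(y, X, ones(length(y)))

X, y, w = hcat(ones(5), randn(5)), randn(5), rand(5)
wls(y, X, ones(5)) ≈ ols(y, X)   # true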

lbittarello commented 6 years ago

I will play around with the Formulae to capture all formula specifications with keyword arguments

But I just said I would do that. :/

EconFormula

We were just talking about neutral terminology...

distance matrix

I think it's better to loop, so that we can exploit symmetry. It saves time for large datasets. Using Distances seems to be a good idea.

we should accept the weights

We should accept weights. Again: different weights require different treatment.

Let's work first on linear models and then tackle generalized linear models

But I'm already working on nonlinear models... And I have laid out all the framework...

lbittarello commented 6 years ago

Are you okay with spinning off CorrStructures? Covariance estimation should require little extra work, given all our existing code.

Nosferican commented 6 years ago

I meant for my packages, but if that is the preferred solution at StatsModels level (regression models rather than Econometrics models) then even better.

EconFormula would be the approach for now; if the framework works for regression models in general, then it can live at a higher level. I don't want to impose our solutions on fields that have other needs (MixedModels, for example).

Looping or mapping is the same. Both can exploit the symmetry.

It's not a question of accepting weights or not. Most models can be WLS (a few don't generalize though). The question is whether to recalculate the weighted object every time or just apply the weights once and cache it.

What I mean for linear models is to port it in stages: test it, get it stable and then add new features. Sandwich estimators are not valid for many nonlinear estimators. Those may require different implementations (the delta method for marginal effects, which requires mapping coefficients to marginal effects and correctly handling interactions; bootstrapping; etc.). I would prefer to have sandwich estimators first and then add other methods. For nonlinear models which use a sandwich estimator, it would be fine to include them at this stage.

lbittarello commented 6 years ago

Most models can be WLS (a few don’t generalize though)

Really? Even nonlinear and two-stage models?

to re calculate the weighted object every time or just apply the weights once and cache it

Good question. I think that an object isn't worth caching if it's only used twice (estimation of β and V, say).

Sandwich estimators are not valid for many nonlinear estimators

Sandwich estimators hold for all GMM estimators – i.e. pretty much every parametric estimator in econometrics (and many semiparametric estimators at that). In its most general form, the variance is ψ Ω ψ', where ψ is the influence function and Ω is the correlation structure. In the case of GMM, ψ = J⁻¹ s, where J is the Jacobian of the score vector and s is the score vector. Microeconometrics exploits this general structure to build a flexible variance estimator. Developers only need to properly build the score and the Jacobian. In some cases, automation helps this step too: the score of two-stage estimators is a simple function of their constituent parts. The delta method ultimately falls into this framework too. The CorrStructure provides Ω.
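
Numerically, the recipe boils down to something like this (a sketch with stand-in inputs; none of these names come from the actual code):

using LinearAlgebra

n, k = 50, 3
s = randn(n, k)                 # n × k matrix of scores (one row per observation)
J = s' * s / n                  # stand-in k × k Jacobian of the score
Ω = Matrix{Float64}(I, n, n)    # correlation structure (here: independent observations)

ψ = s / J'                      # influence functions stacked as rows: ψᵢ' = sᵢ' (J')⁻¹
V = ψ' * Ω * ψ                  # the sandwich ψ Ω ψ', with ψ stacked observation-wise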

marginal effects

You're right that marginal effects will require some thought.

Nosferican commented 6 years ago

For non-linear models or estimators I had in mind something like Lasso or elastic net (Ridge is fine), for which the variance covariance doesn't have a closed-form solution. For generalized linear models it is the same framework, and obtaining Ω is a matter of choosing weights for the n(n + 1)/2 parameters to get the most efficient yet consistent estimator (using sample statistics like residuals for the error terms, or unbiased estimators based on those; under homoscedasticity, also an unbiased estimator for the diagonal) and a combination of a distance matrix and kernel for handling autocorrelation.

Do you envision something like:

function CoolPkg.vcov(model::StatsBase.RegressionModel)
    dist = distance(model)
    knl = kernel(model)
    et = samplestats(model)
    HAC = hac(dist, knl)
    brd    = bread(model)
    γ      = fsa(model, HAC)
    output = γ * brd * (et * et.' .* HAC) * brd.'
end

The model <: StatsBase.RegressionModel would have to define a few accessors for the package to get the chosen estimator, the information for the distance matrix, which kernel to use (data and optimizer or chosen solution), and the bread.

lbittarello commented 6 years ago

choosing weights for n (n + 1) / 2 parameters to get the most efficient, yet consistent estimator

Are you talking about GLS? I'm thinking about sampling weights!

Do you envision something like

It's all been implemented. Please read my code:

Nosferican commented 6 years ago

I read the current design. My question is related to how packages would interact. For example,

function jacobian(obj::OLS, w::UnitWeights)
    x = getmatrix(obj, :control)
    return x' * x
end

must be defined in the package or overloaded. My question refers to establishing those by relying on the inheritance. For example, rather than

x = getmatrix(obj, :control)

use

x = StatsBase.modelmatrix(obj::StatsBase.RegressionModel)

that way the API can be used by any package that implements the StatsBase.RegressionModel methods instead of having to overload them to specify score and Jacobian. If so, best to get those methods to StatsBase first so they can be called when specifying it in the package.

Btw, if you are swinging by ASSA next month we can catch up in person and discuss the progress.

lbittarello commented 6 years ago

We can transfer the utilities to other packages. StatsModels could have the equivalent of getvector and co. once we define the generalization of Formula. If we write a package for covariance estimation, it could define score and jacobian. Packages could define new methods for each model. What do you think?

if you are swinging by ASSA next month we can catch up in person and discuss the progress

I'm afraid I won't make it, but I'd be glad to talk in some other way. :)

Nosferican commented 6 years ago

Are you talking about GLS? I'm thinking about sampling weights!

I am referring to HAC (including CRVE), which gives the weights that determine the trade-off between efficiency and consistency for the variance covariance estimator. I can send you a manuscript I am working on with that framework.

From the TODO:

Nosferican commented 6 years ago

Sandwich estimators hold for all GMM estimators – i.e. pretty much every parametric estimator in econometrics (and many semiparametric estimators at that).

Here (https://davegiles.blogspot.com/2013/05/robust-standard-errors-for-nonlinear.html) is an argument in favor of not allowing robust sandwich estimators for nonlinear models.

gragusa commented 6 years ago

Unfortunately these arguments are terribly wrong.

Nosferican commented 6 years ago

"Some packages also compute Huber-White standard errors as an option for probit and logit analysis, using the sandwich form (13.71). While the robust variance matrix is consistent, using it in equation (15.20) means we must think that the binary response model is incorrectly specified. Unlike with nonlinear regression, in a binary response model it is not possible to correctly specify E(y|x) but to misspecified Var(y|x). Once we have specified Pr(y = 1| x), we have specified all conditional moments of y given x. Nevertheless, as we discussed in Section 13.11, it may be prudent to act as if all models are merely approximations to the truth, in which case inference should be based on the sandwich estimator in equation (13.71). (The sandwich estimator that uses the expected Hessian, as in equation (12.49), makes no sense in the binary response context because the expected Hessian cannot be computed without assumption (15.8).)"

Wooldridge, Jeffrey M. Econometric Analysis of Cross Section and Panel Data (MIT Press) (Page 569). The MIT Press. Kindle Edition.

Here is a Stata post on why they offer robust variance covariance estimators for nonlinear models.

Another issue would be the "semi-robust" for the case of a GLM which doesn't use a canonical link (i.e., EIM and OIM differ).

gragusa commented 6 years ago

That is in fact correct.

But the blog post is a sum of wrong statements. A student would not pass his first-year PhD econometrics course if he had written that stuff.

The key is the conditional mean. If the conditional mean is correctly specified, then MLE is consistent. If everything else is also correctly specified (variance, third moments, etc., meaning you are using the right pdf in the likelihood), then by the matrix information inequality the sandwich (the sandwich is the Hessian inverse times the squared gradient times the Hessian inverse) reduces to the squared gradient of the log-likelihood.

Probit/logit are particular because the only parameter of a Bernoulli is the success probability, which is the conditional probability. So logit is either right or wrong. No middle ground. But this is specific to the Bernoulli/multinomial.

Nosferican commented 6 years ago

For models for which it makes sense, such as Poisson (using the canonical link), I think the estimator is valid and should be offered. For the case of generalized linear models using the multinomial distribution or probit, I think it is best to offer only the spherical error estimator.

lbittarello commented 6 years ago

The post makes an interesting point, but it's ultimately misleading in oversimplifying theory.

As @gragusa points out, the sandwich form is valid whenever the conditional mean is correct. More generally, it is valid whenever the moment conditions are correct.

The linear exponential family is a case in point. The moment conditions hold even if the density is incorrect. This family includes such distributions as the Normal, the Poisson and the Bernoulli. (Robust standard errors are perfectly fine for Logit and Probit! As a matter of fact, a simplification occurs and the White estimator has the same limit as the outer product of gradients or the information matrix.) See 5.7.3 in Cameron and Trivedi (2005).

The sandwich form is also valid for nonlinear least squares and GMM. Between LEF, NLLS and GMM, we have the bulk of econometric estimators.

I don't think we should enforce restrictions on which estimators take which variances. We can't possibly check all possible consistency conditions for each model. Moreover, we've adopted a modular approach, allowing users to mix and match. It's up to the user to know what they're doing. Let their reviewers point out if they make a mistake.

lbittarello commented 6 years ago

Follow-up:

As a matter of fact, sandwich standard errors are valid for all MLE!

The sandwich variance is just Hessian⁻¹ × OPG × Hessian⁻¹. If the distribution is correctly specified, the OPG equals the (negative) Hessian (the information matrix equality), so the formula simplifies.

So there's really no argument for restricting the combinations of estimators and variances. If you need robust standard errors, your model may be incorrect and point estimates will be inconsistent – but that's a different problem.

gragusa commented 6 years ago

The sandwich version is the asymptotic variance: it does simplify under correct specification; that is White's information inequality result.
