lbittarello / Microeconometrics.jl

Microeconometric estimation in Julia

Use the AbstractWeights system #2

Closed nalimilan closed 6 years ago

nalimilan commented 6 years ago

I've just discovered this package. Very interesting!

I have a suggestion regarding the handling of weights: it would make sense to use the AbstractWeights types defined in StatsBase rather than the custom normalize argument. That would help make the ecosystem consistent and make it clearer which kind of weights is meant. We've used the same terminology as Stata so that people can more easily find references about them.
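For readers unfamiliar with the StatsBase weight types, here is a minimal sketch of how they look in use. It assumes StatsBase is installed; the data and the `describe_weights` helper are made up for illustration and are not part of Microeconometrics.jl.

```julia
using Statistics, StatsBase

x = [1.0, 2.0, 4.0]

# Each constructor wraps the same kind of numbers in a different AbstractWeights subtype.
fw = fweights([1, 2, 1])        # frequency weights: row i stands for w[i] identical rows
aw = aweights([1.0, 2.0, 1.0])  # analytic weights: relative importance of each row
pw = pweights([1.0, 2.0, 1.0])  # probability (sampling) weights

# Hypothetical helper: dispatch on the weight type instead of passing a `normalize` flag.
describe_weights(::FrequencyWeights)   = "frequency"
describe_weights(::AnalyticWeights)    = "analytic"
describe_weights(::ProbabilityWeights) = "probability"

# Weighted mean: (1*1 + 2*2 + 1*4) / (1 + 2 + 1)
weighted_mean = mean(x, fw)
```

Because the kind of weight is carried by the type, a function that needs a type-specific correction can select it through dispatch rather than through an extra argument.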

lbittarello commented 6 years ago

Great idea!

I've been thinking about the best way to handle missing weight data. Microdata should maybe accept special keywords aweights, fweights and pweights, which take a string and create the corresponding weight vector from non-missing data in the DataFrame. Does it sound reasonable?

Would it be a good idea to make weight types a parameter? All estimation commands currently check if the Microdata has weights, like so:

r2(obj::Micromodel) = (checkweight(obj) ? _r2(obj, getvector(obj, :weight)) : _r2(obj))

Parametrizing Microdata would facilitate dispatch. I'm not sure which approach is more efficient.
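The two approaches in question can be compared in a small sketch. `SketchData`, `Unweighted`, and the `total_*` functions are hypothetical stand-ins, not the package's actual types or functions.

```julia
# Hypothetical stand-ins for the package's types; names and fields are illustrative only.
struct Unweighted end

struct SketchData{W}
    y::Vector{Float64}
    weights::W          # either `Unweighted()` or a vector of weights
end

# Approach 1: branch at run time, as in the `checkweight` pattern.
hasweights(d::SketchData) = !(d.weights isa Unweighted)
total_branch(d::SketchData) = hasweights(d) ? sum(d.y .* d.weights) : sum(d.y)

# Approach 2: let dispatch on the type parameter pick the method.
total_dispatch(d::SketchData{Unweighted}) = sum(d.y)
total_dispatch(d::SketchData{<:AbstractVector}) = sum(d.y .* d.weights)

unweighted = SketchData([1.0, 2.0, 3.0], Unweighted())
weighted   = SketchData([1.0, 2.0, 3.0], [2.0, 0.0, 1.0])
```

When the weight type is a concrete type parameter, the compiler can usually resolve the `isa` check at compile time as well, so the difference between the two is likely to be more about code organization than speed.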

nalimilan commented 6 years ago

It would probably be better to have a single weights argument, and choose the kind of weights depending on the kind of AbstractVector subtype you get. That way, people can choose the type once and for all when creating a dataset, and don't need to repeat it.

Regarding dispatch, a possible trick is to use a UnitWeights pseudo-vector type internally, which would return 1 for all observations (see https://github.com/JuliaStats/StatsBase.jl/issues/135). That way you can handle the unweighted case just like the weighted cases, without any special code.
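A rough sketch of that trick, using only Base Julia. `UnitWeightsSketch` and `wmean_sketch` are hypothetical names chosen for this example; they are not the type proposed in the linked issue.

```julia
# Hypothetical pseudo-vector: behaves like a vector of ones without storing them.
struct UnitWeightsSketch <: AbstractVector{Int}
    len::Int
end

# Minimal AbstractArray interface: size, indexing style and element access.
Base.size(w::UnitWeightsSketch) = (w.len,)
Base.IndexStyle(::Type{UnitWeightsSketch}) = IndexLinear()
function Base.getindex(w::UnitWeightsSketch, i::Int)
    @boundscheck checkbounds(w, i)
    return 1
end

# One code path serves both the weighted and the unweighted case.
wmean_sketch(y, w) = sum(y .* w) / sum(w)

y = [1.0, 2.0, 6.0]
plain    = wmean_sketch(y, UnitWeightsSketch(length(y)))  # ordinary mean
weighted = wmean_sketch(y, [1.0, 1.0, 2.0])               # weighted mean
```

The unweighted call goes through exactly the same function as the weighted one, so no `checkweight`-style branch is needed.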

lbittarello commented 6 years ago

> It would probably be better to have a single weights argument, and choose the kind of weights depending on the kind of AbstractVector subtype you get.

So a user would pass weights = fweights(DF[:weight])? What if there are missing observations in DF[:weight]?

nalimilan commented 6 years ago

IIRC, missing values are not allowed in weight vectors; they should be set to 0 instead. Do you know of cases where it's legitimate to have missing weights?

lbittarello commented 6 years ago

As far as I understand, we should drop observations with missing weight data (or, equivalently, give them zero weight).

My point is: If the user must pass weights = fweights(DF[:weight]), they will have to check DF[:weight] and replace missing weights before creating the Microdata. If the user must pass fweights = "weight" or fweights = :weight, we can internally check DF[:weight] for them and drop offending observations before creating the ModelFrame.
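The two ways of handling missing weights discussed here can be sketched with plain vectors. The columns below are made-up stand-ins for DF[:y] and DF[:weight], and the snippet uses only Base Julia.

```julia
# Toy columns standing in for DF[:y] and DF[:weight]; values are made up.
y = [10.0, 20.0, 30.0, 40.0]
w = [1, missing, 2, 1]

# Option 1: give missing weights a weight of zero before wrapping them.
w_zero = coalesce.(w, 0)

# Option 2: drop the offending observations, as the keyword approach would do internally.
keep   = .!ismissing.(w)
y_kept = y[keep]
w_kept = Int.(w[keep])   # narrow the element type now that no `missing` remains

# Both routes give the same weighted mean.
mean_zero = sum(y .* w_zero) / sum(w_zero)
mean_drop = sum(y_kept .* w_kept) / sum(w_kept)
```

The equivalence shown here is for the weighted mean only; whether it carries over to other statistics, such as variance estimates that depend on the number of observations, would need to be checked case by case.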

nalimilan commented 6 years ago

If in your experience missing weights are that common, I guess we could allow them in the *weights functions. Do you have cases in mind? In the kind of databases I use, weights are never missing.

lbittarello commented 6 years ago

I don't think that missing weights are common. We can leave it as it is.

I've updated the package with improved weight management based on StatsBase. I'll soon update the documentation.

nalimilan commented 6 years ago

Cool!