Open davidskalinder opened 9 months ago
can be considered!
i don't expect the object size to be a large issue here if the `lm` and `OaxacaBlinderDecomp` are cleverly stripped from overlapping data. in retrospect, it also helps debugging
i am not sure about how this relates to #5 - which is an issue that has to be solved regardless of including model fits, right?
Sounds good. I don't think this is a super-high priority for me just yet, but I guess if it becomes so I can try to code this in and send a PR?
> cleverly stripped from overlapping data

What do you mean by this? (Though frankly, your point that object size might not be worth worrying about could be right regardless.)
> i am not sure about how this relates to #5 - which is an issue that has to be solved regardless of including model fits, right?

Yes, in that issue the group sizes are calculated wrong because `summary()` queries the original input data and handles the `NA`s wrong. But if the `lm` objects were in the `OaxacaBlinderDecomp` object, as this issue suggests, then I think it'd be easy to fix #5 by just having `summary()` query the model frame from the `OaxacaBlinderDecomp`'s `lm` objects (in which the `NA`s are already properly handled).
> Sounds good. I don't think this is a super-high priority for me just yet, but I guess if it becomes so I can try to code this in and send a PR?

Sure!
> > cleverly stripped from overlapping data
>
> What do you mean by this? (Though frankly, your point that object size might not be worth worrying about could be right regardless.)

Fitted models to be included in these cases are: model for group A, model for group B and model for group A + B. Each `lm` object contains the source data. In this case, we could choose to remove the source data and only include one copy of the input dataset. Tbh, I don't expect this to be a big problem nonetheless. If you feel like starting such a feature, feel free to include the 3 models as-is including all data.
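To make the size trade-off concrete, here is a minimal base-R sketch, assuming toy data with made-up column names (`wage`, `educ`, `group`). It only illustrates that `lm()` can be told not to store its model frame; it is not how the package currently builds its fits, and other components (residuals, the QR decomposition) still scale with the number of rows.

```r
# Toy data; the column names are invented for illustration only.
set.seed(1)
df <- data.frame(
  wage  = rnorm(200),
  educ  = rnorm(200),
  group = rep(c("A", "B"), each = 100)
)

# By default lm() keeps a copy of the model frame in fit$model.
fit_full <- lm(wage ~ educ, data = df, subset = group == "A")

# model = FALSE asks lm() not to store that copy.
fit_slim <- lm(wage ~ educ, data = df, subset = group == "A", model = FALSE)

is.null(fit_slim$model)   # the stored model frame is gone
object.size(fit_full)     # compare the two footprints
object.size(fit_slim)
```

The coefficients are unaffected, but anything that later needs the model frame would have to rebuild it from the original data, so stripping only pays off if a single shared copy of the input is kept alongside the fits.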
> Yes, in that issue the group sizes are calculated wrong because `summary()` queries the original input data and handles the `NA`s wrong. But if the `lm` objects were in the `OaxacaBlinderDecomp` object, as this issue suggests, then I think it'd be easy to fix #5 by just having `summary()` query the model frame from the `OaxacaBlinderDecomp`'s `lm` objects (in which the `NA`s are already properly handled).

Noted, if models are included, the count can be extracted from the model objects.
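As a rough illustration of that point, the sketch below uses a small invented data frame (the names `y`, `x`, and `group` are assumptions) to compare group sizes counted from the raw input with the number of rows each fitted model actually used. It is a base-R demonstration of the idea, not the package's `summary()` code.

```r
# Toy data with one missing value per group; all names are illustrative.
df <- data.frame(
  y     = c(1.0, 2.1, NA, 3.9, 5.2, 6.1, 6.8, 8.3),
  x     = c(1, 2, 3, 4, 5, NA, 7, 8),
  group = rep(c("A", "B"), each = 4)
)

fit_a <- lm(y ~ x, data = df, subset = group == "A")
fit_b <- lm(y ~ x, data = df, subset = group == "B")

# Counting from the raw input ignores the rows lm() dropped for NAs.
n_naive <- table(df$group)

# Counting from the fits reflects the rows that entered each regression.
n_used <- c(A = nobs(fit_a), B = nobs(fit_b))

n_naive
n_used
nrow(model.frame(fit_a))  # same count, taken from the stored model frame
```

Because `lm()` applies its NA handling before fitting, the model frame and `nobs()` agree with each other and can differ from the raw group counts.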
Now depends on #18, I believe.
As I mentioned in https://github.com/sinanpl/OaxacaBlinder/issues/5#issuecomment-1981561085, it might be a good idea to include the full output of each `lm()` call in the `OaxacaBlinderDecomp` results object.

Some pros and cons I can think of:
Pro
- Allows `summary()` and `coef()` to query more stuff from the fitted models
- Leaves to `lm()` the decisions about what info to include in results (aside from the stuff that's unique to Blinder-Oaxaca)
- Allows use of `broom` tidiers

Con
- Makes `OaxacaBlinderDecomp` objects bigger (unless the current `meta$data` is removed)
- `OaxacaBlinder` users might start to depend on the extra results, which could make it hard to take them away later if we change our minds

FWIW, `oaxaca::oaxaca()`'s output does seem to include the full fit objects, although its results are pretty maximal in general so I'm not sure that's something to emulate.

I'm on the fence about this one: I think I lean toward including the full `lm` fits eventually; but right now this only really affects #5, so maybe it's worth being more conservative and not including them until we have more actual use cases?

#5 does depend on this though, so @sinanpl let me know what you think.
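For discussion, a minimal sketch of what keeping the full fits could look like. `fit_group_models()` is a hypothetical helper, not part of OaxacaBlinder, and the built-in `mtcars` data with `am` as the grouping variable is used only as a stand-in.

```r
# Hypothetical helper (not part of OaxacaBlinder): fit one model per group
# plus a pooled model, and keep the full lm objects in a named list.
fit_group_models <- function(formula, data, group_var) {
  groups <- split(data, data[[group_var]])
  fits <- lapply(groups, function(d) lm(formula, data = d))
  fits$pooled <- lm(formula, data = data)
  fits
}

fits <- fit_group_models(mpg ~ wt, mtcars, "am")

names(fits)         # one entry per group level, plus "pooled"
sapply(fits, coef)  # coefficients, one column per model
sapply(fits, nobs)  # observations used by each model
```

With the fits stored like this, downstream methods could call the standard `lm` accessors directly instead of re-deriving quantities from the input data.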