Setting RegressionTables methods and fields instead of StatsAPI ones

jsimons8 commented 5 months ago

While adapting my own package to play nice with RegressionTables, I noticed that a formula is mandatory in the new version. However, a formula per se makes little sense because the data would not have columns of hidden factors.

A workaround was to construct a formula and pass it along:

f = @formula(Incidence ~ Level)

That formula can be passed (in place of formula_schema). However, I already have a vector of parameter names, which are not really column names but a different type of parameter, related to the model itself rather than to any data set. I suppose I could convert a vector of strings to a formula. Is there a simple workaround that avoids the formula object altogether?

junder873 commented 5 months ago

No, it is not necessary to have a formula object. It just needs _coefnames to be defined for your special type:

RegressionTables._coefnames(rr::MyModel) = RegressionTables.get_coefname($vector_of_strings_from_model)
# or
RegressionTables._coefnames(rr::MyModel) = string.($vector_of_strings_from_model)

# If you avoid the formula_schema object, then you would also probably want to define _responsename:
RegressionTables._responsename(rr::MyModel) = RegressionTables.get_coefname($lhs)
# or 
RegressionTables._responsename(rr::MyModel) = string($lhs)

Having @formula defined is useful for picking up interaction terms, but if that is not relevant in your situation, you do not need it.

See also #144
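
As a concrete instance of the two definitions above, here is a minimal sketch. The type HiddenFactorModel and its fields param_names and response are hypothetical names chosen for illustration; only RegressionTables._coefnames and RegressionTables._responsename come from this thread.

```julia
using StatsAPI
using RegressionTables

# Hypothetical model type: its parameters belong to the model itself,
# not to columns of a data set, so there is no formula to attach.
struct HiddenFactorModel <: StatsAPI.RegressionModel
    param_names::Vector{String}  # hypothetical field: names of the model parameters
    response::String             # hypothetical field: label for the left-hand side
end

# Tell RegressionTables which labels to use, bypassing the formula entirely.
RegressionTables._coefnames(rr::HiddenFactorModel) = string.(rr.param_names)
RegressionTables._responsename(rr::HiddenFactorModel) = string(rr.response)

m = HiddenFactorModel(["Level", "Slope"], "Incidence")
```

This only covers the labels; the coefficient values and other statistics still have to be supplied through the usual accessors.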

jsimons8 commented 5 months ago

Thank you for your response. That indeed does the trick.

I noticed the lines that did this previously have become obsolete?

StatsAPI.coefnames(m::MyModel) = m.coefnames;
StatsAPI.responsename(m::MyModel) = m.responsename

Does that mean more generally that the StatsAPI technology is obsolete and I should define RegressionTables methods instead? I found this page with references.

I guess there seem to be three different APIs I can think of:

1. StatsAPI
2. StatsModels
3. RegressionTables's own.

Is it correct that anything I can find in 1. should be replaced by RegressionTables's own?

That theory would work based on the following example:

RegressionTables.BIC(m::MyModel) = m.BIC

produces a BIC reading in the table but

StatsAPI.bic(m::MyModel) = m.BIC

does not. So here I needed RegressionTables to have its own accessor and not StatsAPI's. Is there a way to know which one in advance? I have been going by trial and error. If there is more documentation I must read, then I am happy to do that. Another such example is the line StatsAPI.coef(m::Attempt.FixedVarianceLocalLevel) = m.coef still works although the reference RegressionTables._coef suggests StatsModels, which I thought was completely unrelated? I am sorry but I am a little confused here.

A similar issue obtains for calculating confidence intervals in a non-standard way.

I tried to mimic the FE model code here

function StatsAPI.confint(m::FixedEffectModel; level::Real = 0.95)
    scale = tdistinvcdf(StatsAPI.dof_residual(m), 1 - (1 - level) / 2)
    se = stderror(m)
    hcat(m.coef -  scale * se, m.coef + scale * se)
end

by defining a dummy interval

function StatsAPI.confint(m::MyModel; level::Real = 0.95)
    hcat([1],[3])
end

but it was not picked up and the standard normal approximation was used. I added the line:

RegressionTables.confint(m::MyModel; level) = StatsAPI.confint(m::MyModel; level)

but to no avail. Redefining like so

function RegressionTables.confint(m::MyModel; level::Real = 0.95)
    hcat([1],[3])
end

did not work, either. What am I doing wrong? I am sorry to be asking so many questions and thank you for your help!

junder873 commented 5 months ago

The goal of RegressionTables is that if you fully implement StatsAPI on your custom type, then RegressionTables should "just work". You point out at least one bug and an oversight in that, so thank you. There also might be a few exceptions to that, such as not relying directly on StatsAPI.coefnames, but there should be a good reason (allowing labels on individual parts of interaction terms and changing how interactions are displayed). Even for these exceptions, RegressionTables should ideally still rely on some other API in some way (those functions rely on StatsModels.formula).

More generally, StatsAPI is the standard throughout Julia and allows interoperability between packages. For example, if you had some other package that calculated some result on your custom regression model, it would probably rely on StatsAPI to do so. Then you would only need to implement one API and both the statistics package and RegressionTables would work. If you only want to support RegressionTables, it is not as necessary to implement StatsAPI, but the goal is that you would just need to do it once.

The bug and the oversight:

Defining StatsAPI.bic should work; the RegressionTables BIC function is using StatsAPI.aaic for some reason there. I will fix this.
The default should be to use StatsAPI.confint, not create a custom implementation. I will need to make sure there was not some reason this was implemented differently. For now, you need to define ConfInt(rr::MyStatsModel, k::Int; level=0.95, standardize=false, vargs...) (line 507 of this file) to make confidence intervals work.

Finally, there are a few places where the documentation refers to the StatsModels API. This is sort of fuzzy because the StatsModels API generally implements StatsAPI (with some exceptions, formula is a StatsModels item). I am happy to take suggestions on making the documentation clearer/more precise.
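
To illustrate the "implement StatsAPI once" idea, here is a minimal sketch of a custom type with a handful of StatsAPI accessors defined. The type ToyModel and its fields are hypothetical, and this is not a statement of exactly which methods RegressionTables requires; that depends on the statistics requested in the table.

```julia
using StatsAPI

# Hypothetical model type that simply stores precomputed results.
struct ToyModel <: StatsAPI.RegressionModel
    coef::Vector{Float64}
    se::Vector{Float64}
    names::Vector{String}
    response::String
    n::Int
    bic::Float64
end

# Each accessor extends the generic function owned by StatsAPI, so any
# package that calls StatsAPI.coef, StatsAPI.bic, etc. will see these methods.
StatsAPI.coef(m::ToyModel) = m.coef
StatsAPI.stderror(m::ToyModel) = m.se
StatsAPI.coefnames(m::ToyModel) = m.names
StatsAPI.responsename(m::ToyModel) = m.response
StatsAPI.nobs(m::ToyModel) = m.n
StatsAPI.dof_residual(m::ToyModel) = m.n - length(m.coef)
StatsAPI.bic(m::ToyModel) = m.bic

m = ToyModel([0.5, 1.2], [0.1, 0.3], ["Level", "Slope"], "Incidence", 100, 42.0)
```

StatsAPI does not export its functions, which is why every call and definition here is qualified with the module name.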

jsimons8 commented 5 months ago

Great, thank you so much! Once I get this done, I will write a little help file for people to use this based on my experiences and it would be good if that could be integrated into the documentation. It's a great package and it would be nice to see more statisticians use it.

Re: confidence intervals, I got them to work with this line:

function RegressionTables.ConfInt(m::Attempt.FixedVarianceLocalLevel, k::Int; level=0.95, standardize=false, vargs...)
    RegressionTables.ConfInt((1,3))
end

where the tuple for now is a placeholder of course.

junder873 commented 5 months ago

I definitely appreciate any help with the documentation!

junder873 commented 5 months ago

Both of those issues you pointed out should be fixed on the latest version.

jsimons8 commented 5 months ago

Ok, so I can use the confidence interval API

function StatsAPI.confint(m::MyModel; level::Real = 0.95)
    hcat([1],[3])
end

where the function must return a $k \times 2$ matrix, for $k$ the number of parameters? And

StatsAPI.bic

for the Bayesian information criterion?

junder873 commented 5 months ago

On v0.7.5, it should (please let me know if it doesn't).

Just as a clarification though, the StatsAPI.confint is expecting a matrix with nrow = length(coef), so

function StatsAPI.confint(m::MyModel; level::Real=0.95)
    hcat(fill(1, length(coef(m))), fill(3, length(coef(m))))
end

will not cause errors.
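
For intervals computed in a non-standard way, a sketch along the following lines may be closer to real use than constant placeholders. It assumes the model stores precomputed lower and upper bounds; the type IntervalModel and its fields are hypothetical.

```julia
using StatsAPI

# Hypothetical model type holding interval bounds that were computed
# elsewhere (for example by a method specific to the model).
struct IntervalModel <: StatsAPI.RegressionModel
    coef::Vector{Float64}
    lower::Vector{Float64}
    upper::Vector{Float64}
    level::Float64  # confidence level at which the bounds were computed
end

StatsAPI.coef(m::IntervalModel) = m.coef

# Return one row per coefficient: column 1 is the lower bound, column 2 the upper.
function StatsAPI.confint(m::IntervalModel; level::Real = 0.95)
    # The stored bounds are only valid for the level they were computed at.
    isapprox(level, m.level) ||
        throw(ArgumentError("bounds are only available at level $(m.level)"))
    return hcat(m.lower, m.upper)
end

m = IntervalModel([0.5, 1.2], [0.3, 0.6], [0.7, 1.8], 0.95)
ci = StatsAPI.confint(m)
```

Here ci has length(coef(m)) rows and 2 columns, matching the shape described above. Requesting a different level raises an error in this sketch rather than silently returning the wrong interval.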

jsimons8 commented 5 months ago

I can confirm that using the StatsAPI.confint works now.

jmboehm / RegressionTables.jl

Setting RegressionTables methods and fields instead of StatsAPI ones #153