Issue (open): opened by jsimons8 5 months ago
No, it is not necessary to have a formula object. It just needs `_coefnames` to be defined for your custom type:

```julia
# Assumes MyModel stores its names in fields `coefnames` and `responsename`
RegressionTables._coefnames(rr::MyModel) = RegressionTables.get_coefname(rr.coefnames)
# or
RegressionTables._coefnames(rr::MyModel) = string.(rr.coefnames)
# If you avoid the formula_schema object, you will probably also want to define _responsename:
RegressionTables._responsename(rr::MyModel) = RegressionTables.get_coefname(rr.responsename)
# or
RegressionTables._responsename(rr::MyModel) = string(rr.responsename)
```

Having `@formula` defined is useful for picking up interaction terms, but if that is not relevant in your situation, you do not need it.
See also #144
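For comparison, the same names can also be exposed through StatsAPI with no formula involved. This does not replace the `_coefnames` method above: per the maintainer's later reply, RegressionTables reads coefficient names via `StatsModels.formula` rather than `StatsAPI.coefnames`. The sketch below is a minimal example that assumes a hypothetical `ParamNamesModel` type. Its parameters belong to the model itself, not to any data columns, and the field names are placeholders.

```julia
using StatsAPI

# Hypothetical model type: its parameters (e.g. variances of hidden
# states) are properties of the model, not columns of a data set.
struct ParamNamesModel <: StatsAPI.RegressionModel
    params::Vector{Symbol}   # parameter names, stored by the model itself
    response::Symbol         # name of the observed series
    coef::Vector{Float64}    # one estimate per parameter name
end

# Names are returned as strings, one per entry of coef(m).
StatsAPI.coefnames(m::ParamNamesModel) = string.(m.params)
StatsAPI.responsename(m::ParamNamesModel) = string(m.response)
StatsAPI.coef(m::ParamNamesModel) = m.coef

m = ParamNamesModel([:sigma2_eps, :sigma2_eta], :y, [0.8, 0.1])
StatsAPI.coefnames(m)      # returns ["sigma2_eps", "sigma2_eta"]
StatsAPI.responsename(m)   # returns "y"
```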
Thank you for your response. That indeed does the trick.
I noticed that the lines that previously did this seem to have become obsolete:

```julia
StatsAPI.coefnames(m::MyModel) = m.coefnames
StatsAPI.responsename(m::MyModel) = m.responsename
```
Does that mean, more generally, that the StatsAPI interface is obsolete and I should define RegressionTables methods instead? I found this page with references.
I guess there are three different APIs I can think of:
Is it correct that anything I can find in 1. should be replaced by RegressionTables' own?
That theory seems to be supported by the following example:

```julia
RegressionTables.BIC(m::MyModel) = m.BIC
```

produces a BIC entry in the table, but

```julia
StatsAPI.bic(m::MyModel) = m.BIC
```

does not. So here I needed RegressionTables' own accessor rather than StatsAPI's. Is there a way to know in advance which one to use? I have been going by trial and error. If there is more documentation I should read, I am happy to do that. Another such example is the line

```julia
StatsAPI.coef(m::Attempt.FixedVarianceLocalLevel) = m.coef
```

which still works, although the reference for `RegressionTables._coef` mentions StatsModels, which I thought was completely unrelated. I am sorry, but I am a little confused here.
A similar issue arises when calculating confidence intervals in a non-standard way. I tried to mimic the FixedEffectModels code here:

```julia
# From FixedEffectModels.jl; tdistinvcdf comes from StatsFuns
function StatsAPI.confint(m::FixedEffectModel; level::Real = 0.95)
    scale = tdistinvcdf(StatsAPI.dof_residual(m), 1 - (1 - level) / 2)
    se = stderror(m)
    hcat(m.coef - scale * se, m.coef + scale * se)
end
```

by defining a dummy interval:

```julia
function StatsAPI.confint(m::MyModel; level::Real = 0.95)
    hcat([1], [3])
end
```

However, it was not picked up, and the standard normal approximation was used instead. I added the line

```julia
RegressionTables.confint(m::MyModel; level) = StatsAPI.confint(m::MyModel; level)
```

but to no avail. Redefining it like so

```julia
function RegressionTables.confint(m::MyModel; level::Real = 0.95)
    hcat([1], [3])
end
```

did not work either. What am I doing wrong? I am sorry to be asking so many questions, and thank you for your help!
The goal of RegressionTables is that if you fully implement StatsAPI on your custom type, RegressionTables should "just work". You point out at least one bug and an oversight in that, so thank you. There might also be a few exceptions, such as not relying directly on `StatsAPI.coefnames`, but there should be a good reason for each (here: allowing labels on individual parts of interaction terms and changing how interactions are displayed). Even for these exceptions, RegressionTables should ideally still rely on some other API (those functions rely on `StatsModels.formula`).
More generally, StatsAPI is the standard throughout Julia and allows interoperability between packages. For example, if some other package calculated a result from your custom regression model, it would probably rely on StatsAPI to do so. Then you would only need to implement one API, and both that package and RegressionTables would work. If you only care about RegressionTables, implementing StatsAPI is less necessary, but the goal is that you would only need to do it once.
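The interoperability point can be made concrete with a minimal sketch. Assume a hypothetical `SketchModel` type with made-up numbers. The type implements a handful of StatsAPI functions. The hypothetical `wald_z` function stands in for "some other package": it calls only StatsAPI generics, so it works on any type that implements them.

```julia
using StatsAPI
using LinearAlgebra: diag

# Hypothetical custom model; the fields are illustrative only.
struct SketchModel <: StatsAPI.RegressionModel
    coefnames::Vector{String}
    coef::Vector{Float64}
    vcov::Matrix{Float64}
    nobs::Int
end

StatsAPI.coefnames(m::SketchModel) = m.coefnames
StatsAPI.coef(m::SketchModel) = m.coef
StatsAPI.vcov(m::SketchModel) = m.vcov
StatsAPI.stderror(m::SketchModel) = sqrt.(diag(StatsAPI.vcov(m)))
StatsAPI.nobs(m::SketchModel) = m.nobs
StatsAPI.dof(m::SketchModel) = length(m.coef)
StatsAPI.dof_residual(m::SketchModel) = m.nobs - length(m.coef)

# Stand-in for "some other package": it knows nothing about SketchModel
# and uses only the StatsAPI generic functions.
wald_z(m::StatsAPI.RegressionModel) = StatsAPI.coef(m) ./ StatsAPI.stderror(m)

m = SketchModel(["a", "b"], [2.0, -1.0], [0.25 0.0; 0.0 0.0625], 100)
wald_z(m)   # returns [4.0, -4.0]
```

The same set of methods is what RegressionTables aims to consume, so implementing them once serves both consumers.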
The bug and the oversight:

1. `StatsAPI.bic` should work; the RegressionTables `BIC` function is using `StatsAPI.aic` there for some reason. I will fix this.
2. The confidence interval code should use `StatsAPI.confint`, not create a custom implementation. I will need to make sure there was not some reason this was implemented differently. For now, you need to define `ConfInt(rr::MyStatsModel, k::Int; level=0.95, standardize=false, vargs...)` (line 507 of this file) to make confidence intervals work.

Finally, there are a few places where the documentation refers to the StatsModels API. This is somewhat fuzzy because StatsModels generally implements StatsAPI (with some exceptions; `formula` is a StatsModels item). I am happy to take suggestions on making the documentation clearer and more precise.
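On the BIC point: StatsAPI ships generic fallbacks for any `StatisticalModel`, namely `aic(m) = -2loglikelihood(m) + 2dof(m)` and `bic(m) = -2loglikelihood(m) + dof(m)*log(nobs(m))`. The minimal sketch below uses a hypothetical `LikModel` type with made-up numbers and relies on those fallbacks, implementing only `loglikelihood`, `dof` and `nobs`. If your model computes its BIC differently (as with the stored `m.BIC` earlier in the thread), define `StatsAPI.bic` directly instead.

```julia
using StatsAPI

# Hypothetical likelihood-based model; only three StatsAPI functions are defined.
struct LikModel <: StatsAPI.StatisticalModel
    loglik::Float64
    nparams::Int
    nobs::Int
end

StatsAPI.loglikelihood(m::LikModel) = m.loglik
StatsAPI.dof(m::LikModel) = m.nparams
StatsAPI.nobs(m::LikModel) = m.nobs

m = LikModel(-100.0, 3, 50)
StatsAPI.aic(m)   # 206.0, from the StatsAPI fallback
StatsAPI.bic(m)   # 200 + 3 * log(50), about 211.74, from the StatsAPI fallback
```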
Great, thank you so much! Once I get this done, I will write a little help file for people to use this based on my experiences and it would be good if that could be integrated into the documentation. It's a great package and it would be nice to see more statisticians use it.
Re: confidence intervals, I got them to work with this:

```julia
function RegressionTables.ConfInt(m::Attempt.FixedVarianceLocalLevel, k::Int; level=0.95, standardize=false, vargs...)
    RegressionTables.ConfInt((1, 3))   # placeholder bounds
end
```

where the tuple is, of course, a placeholder for now.
I definitely appreciate any help with the documentation!
Both of those issues you pointed out should be fixed on the latest version.
OK, so I can now use the confidence interval API

```julia
function StatsAPI.confint(m::MyModel; level::Real = 0.95)
    hcat([1], [3])
end
```

where the function must return a $k \times 2$ matrix, with $k$ the number of parameters? And `StatsAPI.bic` for the Bayesian information criterion?
On v0.7.5, it should (please let me know if it doesn't).
Just as a clarification though, `StatsAPI.confint` is expecting a matrix with `nrow = length(coef(m))`, so

```julia
function StatsAPI.confint(m::MyModel; level::Real = 0.95)
    # One row per coefficient: column 1 = lower bound, column 2 = upper bound
    k = length(StatsAPI.coef(m))
    hcat(fill(1, k), fill(3, k))
end
```

will not cause errors.
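For intervals computed in a non-standard way (for example by simulation or profile likelihood), one option is to compute the bounds at fitting time. `confint` can then return them as the `length(coef(m))` × 2 matrix described above. Here is a minimal sketch that assumes a hypothetical `IntervalModel` type storing bounds for a single `level`. Raising an error when a different level is requested is a design choice, not something StatsAPI or RegressionTables requires; a model that can recompute its bounds would do that instead.

```julia
using StatsAPI

# Hypothetical model that stores pre-computed interval bounds.
struct IntervalModel <: StatsAPI.RegressionModel
    coef::Vector{Float64}
    lower::Vector{Float64}   # lower bound for each coefficient
    upper::Vector{Float64}   # upper bound for each coefficient
    level::Float64           # coverage the bounds were computed for
end

StatsAPI.coef(m::IntervalModel) = m.coef

# Returns a k × 2 matrix: row i holds (lower, upper) for coefficient i.
function StatsAPI.confint(m::IntervalModel; level::Real = 0.95)
    level ≈ m.level ||
        throw(ArgumentError("bounds were computed for level = $(m.level), got $level"))
    return hcat(m.lower, m.upper)
end

m = IntervalModel([2.0, 0.5], [1.0, 0.1], [3.0, 0.9], 0.95)
ci = StatsAPI.confint(m)                     # [1.0 3.0; 0.1 0.9]
size(ci) == (length(StatsAPI.coef(m)), 2)    # true
```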
I can confirm that using `StatsAPI.confint` works now.
While adapting my own package to play nicely with RegressionTables, I noticed that a formula is mandatory in the new version. However, a formula per se makes little sense here, because the data would not have columns for hidden factors. A workaround was to construct a formula and pass it along:
That formula can be passed (in place of `formula_schema`). However, I already have a vector of parameter names. These are not really column names but a different type of parameter, related not to any data set but to the model itself. I suppose I could convert a vector of strings to a formula. Is there a simple workaround that avoids the formula object altogether?
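If a formula object is wanted after all (for example to pass in place of `formula_schema`), StatsModels can build one from a vector of names with its programmatic `term` and `~` constructors. The sketch below uses hypothetical parameter names. It only builds the `FormulaTerm`; it does not apply a schema or touch any data.

```julia
using StatsModels: term, FormulaTerm

# Hypothetical parameter names stored by the model, not data columns.
param_names = ["sigma2_eps", "sigma2_eta", "mu"]
response = "y"

# term(:x) builds a single Term; summing Terms gives the right-hand side,
# and `~` combines left- and right-hand sides into a FormulaTerm.
rhs = sum(term.(Symbol.(param_names)))
f = term(Symbol(response)) ~ rhs

f isa FormulaTerm   # true
```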