simon-hirsch closed this 3 days ago.
Implemented a first version that follows the API. The changes are mostly cosmetic, i.e. internally most things work as before. Notable changes:

- … `__init__()` … `sklearn` …
- … in the `fit_intercept` parameter in the init.
- … `X` directly.
- … `forget_deviance`? For now I've used `self.forget[0]`, which is the forget factor of the mean.
- The `equation` parameter is designed to support `pandas` and `polars` already, but the scaler will fail.
- … `make_lags` method already, but it is not implemented in the API yet.

Need to do now: …
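Code snippet to play around with after building from source on the branch:

```python
import rolch
import numpy as np
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)

online_gamlss_new = rolch.OnlineGamlss(
    distribution=rolch.DistributionT(),
    # Regression equation per distribution parameter:
    # parameter 0 uses all columns, parameter 1 uses columns 4 to 7
    equation={0: "all", 1: np.arange(4, 8)},
    fit_intercept=True,
    method="lasso",
    scale_inputs=True,
)
# Initial batch fit
online_gamlss_new.fit(X, y)
# Online update with one observation (here simply the last row again);
# X stays 2-D, y stays 1-D
online_gamlss_new.update(X[[-1], :], y[[-1]])
print(online_gamlss_new.betas)
```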
Overall, I think this is a great improvement to the interface!
Thanks!
My main comment, or question, is about how you are handling `pandas` and `polars`. Is all that happens just a conversion to `numpy`? In that case, you may want to make the conversions external to the estimator. `skpro` also has default conversions implemented, if you want to use the boilerplate or just the `datatypes` module, though you probably do not want to take the dependency. The "nice" part would be if all internal calculations were native, though this may be out of scope.
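The suggestion to keep conversions outside the estimator could look like the minimal sketch below. It assumes duck typing on the `to_numpy()` method and `columns` attribute, which both `pandas.DataFrame` and `polars.DataFrame` provide. `to_numpy_with_names` is a hypothetical helper, not part of `rolch` or `skpro`. Neither library is imported, so neither becomes a dependency.

```python
import numpy as np


def to_numpy_with_names(X):
    """Hypothetical helper: return (2-D float array, column names or None).

    Accepts NumPy arrays and, via duck typing, any frame-like object that
    exposes ``to_numpy()`` and ``columns`` (e.g. pandas or polars
    DataFrames), without importing either library.
    """
    if isinstance(X, np.ndarray):
        names = None
        arr = X
    elif hasattr(X, "to_numpy") and hasattr(X, "columns"):
        # pandas exposes columns as an Index, polars as a list of str
        names = list(X.columns)
        arr = X.to_numpy()
    else:
        raise TypeError(f"Unsupported input type: {type(X).__name__}")
    arr = np.asarray(arr, dtype=float)
    if arr.ndim != 2:
        raise ValueError("X must be two-dimensional")
    return arr, names
```

Calling such a helper once at the top of `fit()` and `update()` would mean everything downstream only ever sees NumPy arrays, while the column names stay available for resolving name-based identifiers.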
Yes, it will all be handled by subsetting and conversion to `numpy`. We have quite a bit of `numba`-supported code in the deeper workings of the package, and I don't see a huge benefit in rewriting e.g. coordinate descent in `pandas` or `polars`. Another reason is that this keeps the dependency on these libraries rather light, while the main `numpy` API is rather stable, which makes maintenance easier :)
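The subsetting step can be sketched as follows. It assumes that `equation` maps each distribution parameter to the string `"all"`, the string `"intercept"` (read here as an intercept-only equation, i.e. no covariate columns), or a list of column indices or names, as in the PR goals. `resolve_equation` is a hypothetical helper. It turns every entry into an integer index array, so each design matrix is a plain NumPy slice that `numba` code can consume.

```python
import numpy as np


def resolve_equation(equation, n_features, column_names=None):
    """Hypothetical helper: map each distribution parameter to column indices.

    Entries may be "all", "intercept" (assumed to mean no covariates) or a
    sequence of integer indices and/or column names.
    """
    resolved = {}
    for param, spec in equation.items():
        if isinstance(spec, str):
            if spec == "all":
                idx = np.arange(n_features)
            elif spec == "intercept":
                idx = np.array([], dtype=int)
            else:
                raise ValueError(f"Unknown equation string {spec!r} for parameter {param}")
        else:
            idx = []
            for col in spec:
                if isinstance(col, (int, np.integer)):
                    idx.append(int(col))  # positional index
                elif column_names is not None and col in column_names:
                    idx.append(list(column_names).index(col))  # name lookup
                else:
                    raise KeyError(f"Unknown column {col!r} for parameter {param}")
            idx = np.asarray(idx, dtype=int)
        # Reject indices that do not exist in X
        if idx.size and (idx.min() < 0 or idx.max() >= n_features):
            raise IndexError(f"Column index out of range for parameter {param}")
        resolved[param] = idx
    return resolved


# Usage with the equation from the snippet above, plus an intercept-only parameter
X = np.arange(80, dtype=float).reshape(10, 8)
eq = resolve_equation({0: "all", 1: np.arange(4, 8), 2: "intercept"}, n_features=8)
X_scale = X[:, eq[1]]  # design matrix for parameter 1: columns 4 to 7
```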
Starting a PR to refactor the public API for the Estimator object.

Goals:

- An `equation` dictionary that specifies the regression equation for each distribution parameter by passing a list of column identifiers (column indices, names) or the string parameters `intercept` and `all`.
- … `X` and `y` in `Estimator.fit(X, y)` …
- … `pandas` and `polars` …
- … `int`, `bool`, `float` or as dictionary of `{parameter: value}`.

Discussions in #23 and #24.
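The last goal, accepting a hyperparameter either as a single `int`, `bool` or `float` or as a dictionary of `{parameter: value}`, could be normalized once in the estimator, as in the minimal sketch below. `to_param_dict` is a hypothetical helper. It assumes that a dictionary must cover every distribution parameter exactly, with no silent defaults for missing keys.

```python
from numbers import Number


def to_param_dict(value, n_params, name="value"):
    """Hypothetical helper: expand a scalar to {param: scalar} or validate a dict.

    ``n_params`` is the number of distribution parameters (e.g. 3 for a
    Student-t with location, scale and degrees of freedom).
    """
    if isinstance(value, dict):
        expected = set(range(n_params))
        missing = expected - set(value)
        extra = set(value) - expected
        if missing or extra:
            raise ValueError(
                f"{name}: keys must be exactly {sorted(expected)}, "
                f"missing={sorted(missing)}, unexpected={sorted(extra, key=str)}"
            )
        # Return the entries in parameter order
        return {p: value[p] for p in range(n_params)}
    if isinstance(value, Number):  # covers int, float and bool
        return {p: value for p in range(n_params)}
    raise TypeError(f"{name} must be an int, bool, float or dict, got {type(value).__name__}")


forget = to_param_dict(0.001, n_params=3, name="forget")
# {0: 0.001, 1: 0.001, 2: 0.001}
```

Normalizing at construction time means the rest of the code can always index per parameter, e.g. `forget[0]` for the location.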