Thanks for the question!
Scaling is done by other Greykite code before the X matrix is passed to fit() in Greykite. (Specifically, the function normalize_df in greykite.common.features.normalize is called.)
When using the library, the method can be specified with normalize_method under the custom parameters. (Please refer to the documentation on how to use this parameter.) The current options are "zero_to_one", "statistical", "minus_half_to_half", and "zero_at_origin". If None is passed in, no normalization is performed. Please refer to the descriptions of normalize_df for how each method works.
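For a concrete example, here is a minimal sketch of how this could look with the standard forecast configuration (the choice of "ridge" as the fit algorithm is just an illustration, not a requirement):

```python
from greykite.framework.templates.autogen.forecast_config import (
    ForecastConfig,
    ModelComponentsParam,
)

# Minimal sketch: normalize_method is set under the `custom` parameters.
# "ridge" here is only an illustrative choice of fit algorithm.
model_components = ModelComponentsParam(
    custom={
        "fit_algorithm_dict": {"fit_algorithm": "ridge"},
        # one of "zero_to_one", "statistical", "minus_half_to_half",
        # "zero_at_origin", or None (no normalization)
        "normalize_method": "zero_to_one",
    }
)
config = ForecastConfig(model_components_param=model_components)
```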
Thank you!
It looks like the normalize_method under the custom parameters was introduced in Greykite v0.4, as seen in greykite/docs/0.3.0, and is not available in Greykite v0.3, as seen in greykite/docs/0.1.0.
We will need to update our Greykite version to 0.4 then, thanks.
But let me add a couple more comments:
1. The zero_to_one method is the default used by normalize_df, but it is called min_max in docs/0.3.0; I suppose the documentation needs to be updated.
2. The statistical method (StandardScaler) should be applied to the continuous features, while no method, or at most the zero_to_one method, should be applied to categorical variables.

Hi! Thanks so much for the comments!
For 1:
Sorry, I appended the wrong version above; the most current documentation is 0.4.0. It does not yet include an introduction to the "zero_at_origin" method, but it should have the name change fixed. We are actively working on keeping all documents updated.
I just fixed the link in the comment above. Thanks for pointing it out and sorry for the confusion!
For 2:
In Greykite, all categorical variables use one-hot encoding, so the columns contain only 0 and 1 (before fitting). We agree that it's best practice to keep all categorical variables in this format, and that is why we set the default to zero_to_one.
However, if you prefer to use statistical, you can go ahead with it. Functionally, it should not affect the prediction results much, since these variables still contain only two values. It does make the coefficients harder to read, though. Hope this helps!
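A quick way to see this, using plain scikit-learn rather than Greykite internals (illustrative sketch only):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

onehot = np.array([[0.0], [1.0], [1.0], [0.0], [1.0]])  # a one-hot column

# zero_to_one (min-max): a 0/1 column is left unchanged
print(MinMaxScaler().fit_transform(onehot).ravel())
# [0. 1. 1. 0. 1.]

# statistical (standardize): still exactly two values, but no longer 0/1,
# so the fitted coefficient is harder to interpret
print(StandardScaler().fit_transform(onehot).ravel())
# approx [-1.22  0.82  0.82 -1.22  0.82]
```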
Hi! Thank you for the detailed explanation!
For 1: all clear.
For 2:
Let's say some external regressors are in binary form (0 and 1); then zero_to_one is the best practice. But some others may be numerical, and time-series features are also numerical; for numerical features, best practice would be statistical. When fitting, X may contain both types of features, so which normalize_method would be best practice?
I can also see that the option to scale external regressors is still available via input__regressors_numeric__normalize__normalize_algorithm. Does this mean that external regressors may be subjected to double scaling, once when applying a non-None input__regressors_numeric__normalize__normalize_algorithm method, and a second time when applying a non-None normalize_method?
Hi! Thanks for the follow-up!
In the case you mention, we would suggest fitting with both methods and seeing how it goes with your dataset. For categorical variables, zero_to_one would be the best practice, but statistical would still work. For numerical features, zero_to_one and statistical usually have similar effects, but statistical might be more robust to outliers. If outliers are removed, zero_to_one should perform similarly for numerical variables.
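To make the outlier point concrete, a small sketch with plain scikit-learn (not Greykite code):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

x = np.array([[1.0], [2.0], [3.0], [100.0]])  # one large outlier

# min-max scaling: the outlier squashes the regular values near 0
print(MinMaxScaler().fit_transform(x).ravel())
# [0.     0.0101 0.0202 1.    ]

# standardization: the regular values keep some spread
print(StandardScaler().fit_transform(x).ravel())
# approx [-0.60 -0.58 -0.55  1.73]
```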
Generally we expect the two methods to yield similar performance. Let us know if there is anything else to consider, and please feel encouraged to let us know if either works out for you. Thanks!
Regarding the question on the option to scale external regressors: you are correct that external regressors may be scaled twice if these variables are set and normalize_method has a non-None value. Generally we encourage leaving input__regressors_numeric__normalize__normalize_algorithm as None and using normalize_method instead.
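For example (a sketch assuming the hyperparameter_override mechanism of ModelComponentsParam, with the key names from this discussion):

```python
from greykite.framework.templates.autogen.forecast_config import ModelComponentsParam

# Sketch: rely on normalize_method for scaling and keep the per-regressor
# scaler off, so external regressors are not scaled twice.
model_components = ModelComponentsParam(
    custom={"normalize_method": "statistical"},
    hyperparameter_override={
        "input__regressors_numeric__normalize__normalize_algorithm": None,
    },
)
```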
Thanks and hope this answer helps!
Thanks, all clear now!
statistical would be recommended when the features can be assumed to have a Gaussian distribution; otherwise zero_to_one would be a safer choice. Outliers can always be treated at the preprocessing stage anyway, using the input__response__outlier__ (for time-series features) and input__regressors_numeric__outlier__ (for numeric external regressors) options.
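For example, something along these lines (a sketch; the z_cutoff parameter name is my assumption based on the pipeline's Z-score outlier transformer, so please check it against your Greykite version):

```python
from greykite.framework.templates.autogen.forecast_config import ModelComponentsParam

# Sketch: clip outliers at the preprocessing stage, then use zero_to_one.
# The z_cutoff names are assumed; verify against your Greykite version.
model_components = ModelComponentsParam(
    custom={"normalize_method": "zero_to_one"},
    hyperparameter_override={
        "input__response__outlier__z_cutoff": 4.0,            # time-series features
        "input__regressors_numeric__outlier__z_cutoff": 4.0,  # numeric external regressors
    },
)
```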
Feel free to close this issue whenever you like, thank you again!
I would like to know how Greykite handles the X matrix of features with regularized linear models (Ridge, Lasso, Elastic Net). In the literature we can read that scaling the features is necessary for regularization to work properly.
Then, say time-series features like the trend term, sine and cosine terms, autoregressive features, etc. make up the X matrix, possibly together with some external regressor columns. At this point the time-series features are not scaled, i.e., we may have an estimated trend feature of magnitude 1, some sin() and/or cos() terms of magnitude 10, some others of magnitude 0.1, etc.
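To illustrate why the magnitudes matter, here is a small standalone sketch with plain scikit-learn (nothing Greykite-specific), where two columns carry equally strong signals but on very different scales:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
t = np.arange(200)
s1 = np.sin(2 * np.pi * t / 7)
s2 = np.cos(2 * np.pi * t / 7)

# Both signals contribute equally to y, but the second column is
# stored at 1/100 of its natural scale.
X = np.column_stack([s1, 0.01 * s2])
y = s1 + s2 + rng.normal(scale=0.1, size=len(t))

# The true coefficients w.r.t. these columns are [1, 100].
# Ridge penalizes raw coefficient size, so it recovers the first signal
# but shrinks the second signal's contribution almost entirely.
print(Ridge(alpha=1.0).fit(X, y).coef_)  # roughly [1.0, 1.0], not [1, 100]
```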
Is the X matrix at this point passed to the sklearn fit() function like this:

```python
clf = RidgeCV(alphas=[1e-3, 1e-2, 1e-1, 1]).fit(X, y)
```

or is it scaled in Greykite code using StandardScaler() or MinMaxScaler() or something else, before passing it to fit()?
(sklearn documentation:
sklearn.linear_model.RidgeCV(alphas=(0.1, 1.0, 10.0), *, fit_intercept=True, **normalize=False**, scoring=None, cv=None, gcv_mode=None, store_cv_values=False, alpha_per_target=False)
normalize: bool, default=False. This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use StandardScaler before calling fit on an estimator with normalize=False.)