wilsonrljr / sysidentpy

A Python Package For System Identification Using NARMAX Models

https://sysidentpy.org

BSD 3-Clause "New" or "Revised" License

390 stars 78 forks

Add uniform regularized regression #116

Closed dj-gauthier closed 11 months ago

dj-gauthier commented 1 year ago

Modify forward_regression_orthogonal_least_squares.py and estimators.py to enable ridge regression. You need to set ridge_param when calling the FROLS class. ridge_param is a model metaparameter that needs to be optimized using, for example, grid search, Bayesian optimization, etc. Typical usage for a multiple input, single output model:

model = FROLS(
    order_selection=True,
    n_terms=None,  # how many terms you want in the model; None results in a model that meets the criteria
    extended_least_squares=False,
    ylag=2,
    xlag=[[1, 2], [1, 2]],
    info_criteria="aic",
    ridge_param=1.e-6,
    estimator="ridge_regression",  # "least_squares",
    basis_function=basis_function,
)
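For readers unfamiliar with ridge regression, here is a minimal NumPy sketch of the standard closed-form estimate, theta = (Psi^T Psi + ridge_param * I)^-1 Psi^T y. It is not the code from this pull request: the function name `ridge_fit`, the random stand-in regressor matrix `psi`, and the two penalty values are assumptions chosen only for illustration.

```python
import numpy as np


def ridge_fit(psi, y, ridge_param):
    """Closed-form ridge estimate: (psi^T psi + ridge_param * I)^-1 psi^T y."""
    n_features = psi.shape[1]
    gram = psi.T @ psi + ridge_param * np.eye(n_features)
    return np.linalg.solve(gram, psi.T @ y)


rng = np.random.default_rng(0)
psi = rng.normal(size=(200, 3))  # stand-in regressor matrix (assumption)
true_theta = np.array([[1.5], [-0.7], [0.2]])
y = psi @ true_theta + 1e-3 * rng.normal(size=(200, 1))

theta_small = ridge_fit(psi, y, ridge_param=1e-6)  # nearly ordinary least squares
theta_large = ridge_fit(psi, y, ridge_param=1e3)   # coefficients shrunk toward zero
```

A small penalty leaves the estimate close to ordinary least squares, while a large one shrinks the coefficients, which is why the value has to be tuned (for example by grid search on validation data).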

wilsonrljr commented 1 year ago

Hey @dj-gauthier , thanks for your contribution. I'll take a look at your pull request as soon as possible.

I released a new version last weekend, so, if you can, please rebase your branch onto master.

dj-gauthier commented 1 year ago

Thank you Wilson! I will try to do as you say and issue a new pull request. Dan


wilsonrljr commented 1 year ago

Thanks! You can just update your master branch in your fork, rebase your branch and commit in this pull request. No need to open a new one

dj-gauthier commented 1 year ago

Dear Wilson,

Sorry for not replying right away. I think I have found a bug in the "predict" routine when doing one-step-ahead prediction for a MISO problem (polynomial features), and I have been trying to track that down first. It seems to be predicting backwards in time rather than forward. I am traveling over the weekend and won't be able to work on it until next week. I will write more once I have new information.

Dan


wilsonrljr commented 1 year ago

No problem! Take your time and enjoy your trip.

Are you sure the issue you found is related to one_step_ahead prediction? It's basically a matrix multiplication (X*b), where X is the regressor matrix and b is the vector of coefficients. When you are back, let's check that! Thanks
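The matrix-product view of one-step-ahead prediction can be sketched as follows. This is a toy illustration, not SysIdentPy's internal code: the model structure, the coefficient vector `b`, and the data are all made up for the example.

```python
import numpy as np

# Toy series and a hypothetical, already-fitted linear model:
#   y(k) = 0.5*y(k-1) - 0.2*y(k-2) + 0.8*x(k-1)
y = np.array([0.0, 1.0, 0.7, 0.9, 0.4])
x = np.array([1.0, 0.5, 1.0, 0.0, 2.0])
b = np.array([0.5, -0.2, 0.8])

# One row per time step k = 2..4, built only from measured past values
k = np.arange(2, len(y))
X = np.column_stack([y[k - 1], y[k - 2], x[k - 1]])

y_hat = X @ b  # one-step-ahead predictions for k = 2, 3, 4
```

Each row of `X` uses measured outputs rather than earlier predictions, which is what distinguishes one-step-ahead prediction from free-run simulation.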

dj-gauthier commented 1 year ago

Dear Wilson,

I think there is an error in the regressor matrix, either during training or in testing. But it is the same routine building the regressor matrix, so it may be in both.

The problem is that the documentation does not go into detail on how X and y need to be supplied to the routines. During training, you need to supply it with y(k) as well as lagged values such as y(k-1), y(k-2), x1(k-1), x1(k-2), etc. The documentation does not give information about the positioning of these elements in the arrays sent to “fit”.

For testing (“predict”), let’s say I just want to perform a single one-step ahead prediction. I need to give it the lagged states, but not y(k). When I try this, it gives a prediction for y(k) that is identical to one of the lagged values of y.

I will have time to do a dive into the code on Wednesday. I am also happy to get on a video call with you if you like and have time. I am in Columbus, Ohio on the East Coast of the US - we probably only have a time zone shift of one or two hours. The evenings and most weekends can work for me if that is when you can work on this project.

Dan


wilsonrljr commented 1 year ago

Yeah, let's talk about it! I'll give you more information about how predict works.

In the predict method, you need to pass the initial conditions based on the max lag of the fitted model. However, we return the initial conditions as the first values of the prediction array. So, if you pass two values as initial conditions, the first predicted value is actually the third value of the array (the first two will be the same as the values passed as initial conditions).
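The output layout described here can be mimicked with a toy autoregressive predictor. This is a sketch of the convention only, not the library's predict method: `one_step_ahead`, the coefficients `b`, and the data are hypothetical.

```python
import numpy as np


def one_step_ahead(y, b, max_lag):
    """Toy AR predictor: returns an array as long as y whose first max_lag
    entries are the initial conditions, copied through unchanged."""
    y = np.asarray(y, dtype=float)
    y_hat = np.empty_like(y)
    y_hat[:max_lag] = y[:max_lag]          # initial conditions echoed back
    for k in range(max_lag, len(y)):
        past = y[k - max_lag:k][::-1]      # [y(k-1), y(k-2), ...]
        y_hat[k] = past @ b
    return y_hat


b = np.array([0.5, -0.2])  # hypothetical coefficients for y(k-1), y(k-2)

three_rows = one_step_ahead([1.0, 2.0, 0.0], b, max_lag=2)
two_rows = one_step_ahead([1.0, 2.0], b, max_lag=2)
```

With three rows the last entry of `three_rows` is a genuine prediction; with only two rows, `two_rows` contains nothing but the initial conditions, because no row is left over to hold a predicted value.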

I don't know if that is the case here, but let's talk so we can check it together. Thursday works for me.

dj-gauthier commented 1 year ago

Dear Wilson,

I am available (all US East Coast times): 9 am - 3 pm and after 7 pm. I can send a Zoom link (or Teams - what do you prefer?) once we select a time.

I do see in the "predict" code where you concatenate the initial condition with the prediction. However, what is passed to the output is only a scalar. So it appears as if something is being stripped, including the prediction. I will look more into this on Wednesday.

Best, Dan


dj-gauthier commented 1 year ago

Dear Wilson,

I've looked at my code, which calls SysIdentPy, to make sure I am not doing any type casting that would mess with what is passed back from "fit." The code is not complicated - I am trying to do one-step-ahead forecasting of the Lorenz63 chaotic attractor (just the "x" variable for now - will generalize later to multiple input multiple output). A scalar is definitely returned by fit. There is a lot of the code that does not matter, such as defining the number of points, etc. So I will just put the core code that matters below. I suspect you are familiar with the Lorenz63 system - it is in the graphic on the SysIdentPy page ...

This can be part of what we discuss on Thursday unless you can easily see what I am doing wrong.

More tomorrow when I have time to start to debug what is going on within SysIdentPy.

Dan

# integrate the Lorenz equations
lorenz_soln = solve_ivp(
    lorenz,
    (0, maxtime),
    [17.67715816276679, 12.931379185960404, 43.91404334248268],
    t_eval=t_eval,
    method="DOP853",
    rtol=1.e-8,
    atol=1.e-7,
)

# perform system identification for Lorenz x variable - I am using "least
# squares" for now and have commented out anything related to my new code to
# regularize the fit
basis_function = Polynomial(degree=2)

model_x = FROLS(
    order_selection=True,
    n_terms=8,  # how many terms you want in the model; None results in a model that meets the criteria
    extended_least_squares=False,
    ylag=2,
    xlag=[[1, 2], [1, 2]],
    info_criteria="aic",
    # ridge_param=1.e-6,
    estimator="least_squares",
    basis_function=basis_function,
)

# fit Lorenz x-variable
model_x.fit(
    X=lorenz_soln.y[[1, 2], warmup_pts - 1:warmtrain_pts - 1].T,
    y=lorenz_soln.y[0, warmup_pts:warmtrain_pts].reshape(-1, 1),
)

r_x = pd.DataFrame(
    results(
        model_x.final_model,
        model_x.theta,
        model_x.err,
        model_x.n_terms,
        err_precision=8,
        dtype="sci",
    ),
    columns=["Regressors", "Parameters", "ERR"],
)
print("lorenz x-variable \n", r_x)

x_train[0, :] = model_x.predict(
    X=lorenz_soln.y[[1, 2], warmup_pts - 1:warmtrain_pts - 1].T,
    y=lorenz_soln.y[0, warmup_pts:warmtrain_pts].reshape(-1, 1),
    steps_ahead=1,
)[:, 0]

# here is where I try to do a single one-step-ahead prediction where I
# give it the initial conditions for your "y" (the x variable of Lorenz) and
# the two-input "X" variable (containing the y and z Lorenz variables).
# x_dum has a single element.
x_dum = model_x.predict(
    X=lorenz_soln.y[[1, 2], warmtrain_pts - 2:warmtrain_pts].T,
    y=lorenz_soln.y[0, warmtrain_pts - 2:warmtrain_pts].reshape(-1, 1),
    steps_ahead=1,
)[0]

Here is the output:

       Regressors   Parameters             ERR
0          y(k-1)   1.0741E+01  9.79071766E-01
1   x2(k-2)y(k-2)   3.4989E-04  2.01315845E-02
2          y(k-2)  -8.3047E+00  2.18473234E-04
3         x1(k-1)  -1.9823E+00  5.66176531E-04
4   x2(k-1)y(k-2)   3.2036E-02  8.46278885E-06
5   x2(k-1)y(k-1)  -1.6753E-02  3.11730917E-06
6  x2(k-2)x1(k-2)  -1.5872E-03  1.46269353E-07
7  x2(k-1)x1(k-1)   6.1849E-03  1.98607814E-07

C:\Users\gauthier.51\AppData\Local\anaconda3\Lib\site-packages\sysidentpy\utils\deprecation.py:37: FutureWarning: Passing a string to define the estimator will rise an error in v0.4.0. You'll have to use FROLS(estimator=LeastSquares()) instead. The only change is that you'll have to define the estimator first instead of passing a string like 'least_squares'. This change will make easier to implement new estimators and it'll improve code readability.
  warnings.warn(message, FutureWarning)

In[3]: x_dum.shape
Out[3]: (1,)

x_dum has one of the lagged values (one of the initial conditions), which suggests that the returned scalar only has a slice of the returned predicted state.


dj-gauthier commented 1 year ago

Dear Wilson,

I found one bug in my code - I had a [0] at the end of the call to predict. I removed it, but there is still a problem.
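The effect of a trailing `[0]` on a column-shaped array can be checked in isolation with plain NumPy. The array below is only a stand-in with shape (n_samples, 1); the variable names are hypothetical.

```python
import numpy as np

# Stand-in for a prediction array of shape (n_samples, 1)
pred = np.array([[0.58704797], [1.34433146]])

first_row = pred[0]       # shape (1,): only the first row survives
column = pred[:, 0]       # shape (2,): every row, flattened to 1-D
last_value = pred[-1, 0]  # scalar: the final entry
```

Indexing with `[0]` keeps just the first row, so the result has shape `(1,)` regardless of how many rows the array contained.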

Here is what I am giving to the routine regarding the initial conditions for y:

y = lorenz_soln.y[0,warmtrain_pts-2:warmtrain_pts]

this is equal to array([0.58704797, 1.34433146])

x_dum returns

In[5]: x_dum
Out[5]: array([[0.58704797],
       [1.34433146]])

Which is just the initial conditions given to it. Based on what you say, it should give back three elements - the prediction and the initial conditions. But it seems that the prediction is stripped out and not returned by predict.

Thanks, Dan


wilsonrljr commented 1 year ago

Great! I'll take a look at this. Can you talk tomorrow at 8 pm (your timezone)? Either Zoom or Teams works for me, even Discord.

wilsonrljr commented 1 year ago

I'm trying to reproduce your case using a basic one step ahead prediction scenario, but I'm not getting any issues. I'll keep going with other cases, but here is what I did for now:

import numpy as np
import pandas as pd
from sysidentpy.model_structure_selection import FROLS
from sysidentpy.basis_function._basis_function import Polynomial
from sysidentpy.metrics import root_relative_squared_error
from sysidentpy.utils.display_results import results
from sysidentpy.utils.generate_data import get_miso_data

# generating some random data based on simulated models
x_train, x_valid, y_train, y_valid = get_miso_data(
    n=1000, colored_noise=False, sigma=0.001, train_percentage=90
)

basis_function = Polynomial(degree=2)

model = FROLS(
    order_selection=True,
    n_terms=4,
    extended_least_squares=False,
    ylag=2,
    xlag=[[1, 2], [1, 2]],
    info_criteria="aic",
    estimator="least_squares",
    basis_function=basis_function,
)

model.fit(X=x_train, y=y_train)
# which gives me the following model

#        Regressors   Parameters             ERR
# 0         x2(k-1)   6.0004E-01  9.12999079E-01
# 1  x2(k-2)x1(k-1)  -2.9956E-01  4.53231002E-02
# 2        y(k-1)^2   3.9996E-01  4.13865100E-02
# 3   x1(k-1)y(k-1)   9.9260E-02  2.84280632E-04

Now, let's do the one-step-ahead prediction. I'll predict only one value to demonstrate. In this case, the maximum lag of the fitted model is 2 (based on the regressor x2(k-2)x1(k-1)), so we have to pass at least two initial conditions. So y_valid should be y_valid[:3, :] and x_valid should be x_valid[:3, :], because the first two values only initialize the model and the third is related to the state you want to predict. So, in that case, I did as follows:

model.predict(X=x_valid[:3, :], y=y_valid[:3, :], steps_ahead=1)

which returned

array([[0.5059659 ],
       [0.16046676],
       [0.01766819]])

The first two values are only the initial conditions, as expected, but the third value is the one step ahead prediction we want.

Is the length of the X you pass to predict the same as the maximum lag of the model? If that's the case, you will get only the initial conditions back, without any prediction.

dj-gauthier commented 1 year ago

Dear Wilson,

I will have to think about what you say carefully today, but I should only have to give it 2 for x, not 3 to properly implement the NARMAX forecast. It seems that you are using x_valid[k] to infer y_valid[k], which is an inference task, not a forecasting task. For the multi input, multi output forecasting task, I don’t know x_valid[k].
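The distinction between forecasting and inference can be illustrated with a toy model in which every regressor has a lag of at least 1. The forecast for step k then needs only rows k-2 and k-1. This is a hand-written sketch, not SysIdentPy's predict; `predict_next`, `theta`, and the data are hypothetical.

```python
import numpy as np


def predict_next(y_past, x_past, theta):
    """Hypothetical model y(k) = t0*y(k-1) + t1*x1(k-1) + t2*x2(k-2).
    y_past and x_past hold rows k-2 and k-1 only; row k is never passed in."""
    return (
        theta[0] * y_past[-1]        # y(k-1)
        + theta[1] * x_past[-1, 0]   # x1(k-1)
        + theta[2] * x_past[-2, 1]   # x2(k-2)
    )


theta = np.array([0.6, -0.3, 0.1])
y_past = np.array([0.2, 0.5])                # y(k-2), y(k-1)
x_past = np.array([[1.0, 2.0], [3.0, 4.0]])  # rows k-2 and k-1 of [x1, x2]

y_next = predict_next(y_past, x_past, theta)
```

Because no regressor has lag 0, the inputs at step k are not required to compute `y_next`.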

Let me think more about this today and I can write up some notes. I will also send a zoom link.

Thanks, Dan


dj-gauthier commented 1 year ago

Dan Gauthier is inviting you to a scheduled Zoom meeting.

Topic: Dan Gauthier/Wilson discuss SysIdentPy Time: Oct 5, 2023 08:00 PM Eastern Time (US and Canada)

Join Zoom Meeting https://osu.zoom.us/j/95022163934?pwd=bjNZQkFsNHkxT0hETnlCblVDeGZkUT09

Meeting ID: 950 2216 3934 Password: 635940 One tap mobile +16468769923,,95022163934#,,,,0#,,635940# US (New York) +16513728299,,95022163934#,,,,0#,,635940# US (Minnesota)



dj-gauthier commented 1 year ago

Dear Wilson,

I think I am starting to see the problem. I refer to your paper published in the Journal of Open Source Software (attached for easy reference).

Equation (1) is consistent with the notation in your code. However, this equation is not consistent with papers from Billings' group – they do not have a delay parameter d. See the attached paper by Billings, their Eq.

In Billings' paper, his u is your x. As you can see, his terms in the argument of the function go from u(k-1) to u(k-n_u). For your default value of d, you consider terms from x(k-1) to x(k-d-n_x). Thus, n_u = d + n_x. Using your notation suggests that I should take xlag = 1 (because the d=1 will actually give me two lags) and hence this should include features like x(k-1) and x(k-2). But it does not. It only includes terms up to x(k-1). So this is one error in your code.
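The two conventions being compared can be written side by side. This is a sketch based only on the description in this comment; the noise terms and the symbol F follow the usual NARMAX form and are assumptions here, since neither paper is reproduced in the thread.

```latex
\[
\begin{aligned}
% Form with an explicit input delay d, as described for Eq. (1)
y_k &= F\left[\,y_{k-1}, \dots, y_{k-n_y},\; x_{k-d}, x_{k-d-1}, \dots, x_{k-d-n_x},\; e_{k-1}, \dots, e_{k-n_e}\right] + e_k \\
% Form without an explicit delay, as described for the Billings paper
y(k) &= F\left[\,y(k-1), \dots, y(k-n_y),\; u(k-1), \dots, u(k-n_u),\; e(k-1), \dots, e(k-n_e)\right] + e(k) \\
% Matching the oldest input term of the two forms
k - d - n_x &= k - n_u \;\Longrightarrow\; n_u = d + n_x
\end{aligned}
\]
```

With d = 1 the first form starts at x(k-1), as the second does, and the oldest input term is x(k-1-n_x), which is where the relation n_u = d + n_x comes from.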

Further adding to the confusion is Eq. (2) of your paper. [I believe there is a typo in Eq. (2): the symbol u should be x (the hint is that the upper limit of the product is n_x, not n_u).] Here, you take j=0 to n_x, but this is inconsistent with your Eq. (1) and with Billings' past published work. In a forecasting situation, you do not have access to x_n; you only have access to past states. Thus, j should never be equal to

This is another error in your code.

Best, Dan

On Wed, Oct 4, 2023 at 10:31 AM Daniel Gauthier @.***> wrote:

Dan Gauthier is inviting you to a scheduled Zoom meeting.

Topic: Dan Gauthier/Wilson discuss SysIdentPy Time: Oct 5, 2023 08:00 PM Eastern Time (US and Canada)

Join Zoom Meeting https://osu.zoom.us/j/95022163934?pwd=bjNZQkFsNHkxT0hETnlCblVDeGZkUT09

Meeting ID: 950 2216 3934 Password: 635940 One tap mobile +16468769923,,95022163934#,,,,0#,,635940# US (New York) +16513728299,,95022163934#,,,,0#,,635940# US (Minnesota)

Dial by your location +1 646 876 9923 US (New York) +1 651 372 8299 US (Minnesota) +1 301 715 8592 US (Washington DC) +1 312 626 6799 US (Chicago) +1 408 638 0968 US (San Jose) +1 669 900 6833 US (San Jose) +1 253 215 8782 US (Tacoma) +1 346 248 7799 US (Houston) Meeting ID: 950 2216 3934 Password: 635940 Find your local number: https://osu.zoom.us/u/abvVZ5skZ5

Join by SIP @.***

Join by H.323 162.255.37.11 (US West) 162.255.36.11 (US East) 115.114.131.7 (India Mumbai) 115.114.115.7 (India Hyderabad) 213.19.144.110 (Amsterdam Netherlands) 213.244.140.110 (Germany) 103.122.166.55 (Australia Sydney) 103.122.167.55 (Australia Melbourne) 64.211.144.160 (Brazil) 69.174.57.160 (Canada Toronto) 65.39.152.160 (Canada Vancouver) 207.226.132.110 (Japan Tokyo) 149.137.24.110 (Japan Osaka) Meeting ID: 950 2216 3934 Password: 635940

The Ohio State University

Please direct question about this meeting to the meeting organizer.

CarmenZoom is a service provided by the Office of Technology and Digital Innovation (IT.osu.edu).

go.osu.edu/SystemStatus 614-688-4357 (HELP) @.***

If you have a disability and have trouble accessing this content, please call the Accessibility Help Line 614-292-5000.

Privacy: go.osu.edu/privacy Digital Accessibility: accessibility.osu.edu Nondiscrimination Notice (PDF): go.osu.edu/NonDiscrimination-Notice

On Wed, Oct 4, 2023 at 10:18 AM Daniel Gauthier < @.***> wrote:

Dear Wilson,

I will have to think about what you say carefully today, but I should only have to give it 2 for x, not 3 to properly implement the NARMAX forecast. It seems that you are using x_valid[k] to infer y_valid[k], which is an inference task, not a forecasting task. For the multi input, multi output forecasting task, I don’t know x_valid[k].

Let me think more about this today and I can write up some notes. I will also send a zoom link.

Thanks, Dan

On Wed, Oct 4, 2023 at 8:38 AM Wilson Rocha @.***> wrote:

I'm trying to reproduce your case using a basic one step ahead prediction scenario, but I'm not getting any issues. I'll keep going with other cases, but here is what I did for now:

import numpy as np
import pandas as pd
from sysidentpy.model_structure_selection import FROLS
from sysidentpy.basis_function._basis_function import Polynomial
from sysidentpy.metrics import root_relative_squared_error
from sysidentpy.utils.display_results import results
from sysidentpy.utils.generate_data import get_miso_data

# generating some random data based on simulated models
x_train, x_valid, y_train, y_valid = get_miso_data(
    n=1000, colored_noise=False, sigma=0.001, train_percentage=90
)

basis_function = Polynomial(degree=2)

model = FROLS(
    order_selection=True,
    n_terms=4,
    extended_least_squares=False,
    ylag=2,
    xlag=[[1, 2], [1, 2]],
    info_criteria="aic",
    estimator="least_squares",
    basis_function=basis_function,
)

model.fit(X=x_train, y=y_train)

which gives me the following model

   Regressors        Parameters    ERR
0  x2(k-1)            6.0004E-01   9.12999079E-01
1  x2(k-2)x1(k-1)    -2.9956E-01   4.53231002E-02
2  y(k-1)^2           3.9996E-01   4.13865100E-02
3  x1(k-1)y(k-1)      9.9260E-02   2.84280632E-04
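
Written out as a single equation, the four regressors and parameters in the table above correspond to the following one-step-ahead model. The coefficients are copied from the table; the hat on the predicted output is notation assumed here for readability, not taken from the thread.

```latex
\hat{y}(k) = 0.60004\,x_2(k-1) \;-\; 0.29956\,x_2(k-2)\,x_1(k-1) \;+\; 0.39996\,y(k-1)^2 \;+\; 0.09926\,x_1(k-1)\,y(k-1)
```

Every term on the right-hand side uses only values at k-1 or k-2, so the maximum lag of this model is 2.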

Now, let's do the one-step-ahead prediction. I'll predict only one value to demonstrate. In this case, the maximum lag of the fitted model is 2 (based on the regressor x2(k-2)x1(k-1)), so we have to provide at least two initial conditions. So, my y_valid should be y_valid[:2, :], but my x_valid should be x_valid[:3, :] because the first two values only initialize the model and the third is related to the state you want to predict. So, in that case, I did as follows

model.predict(X=x_valid[:3, :], y=y_valid[:2, :])

which returned

array([[0.5059659 ], [0.16046676], [0.01766819]])

The first two values are only the initial conditions, as expected, but the third value is the one step ahead prediction we want.

Is the length of the X used in the prediction the same as the maximum lag of the model? If that's the case, you will have only the initial conditions returned without any prediction.


dj-gauthier commented 1 year ago

Dear Wilson,

A related comment. In your paper, Eq. (3), x_k does not appear on the right-hand side of the equation, which is inconsistent with Eq. (2). Also, to make a prediction, you only have to provide it with y_{k-1}, y_{k-2}, x_{k-1}, and x_{k-2}. Thus, you only have to provide it with 2 values of x, not the 3 you used in the example you sent me yesterday.

Thanks, Dan

On Wed, Oct 4, 2023 at 12:25 PM Daniel Gauthier @.***> wrote:

Dear Wilson,

I think I am starting to see the problem. I refer to your paper published in the Open Source Software journal (attached for easy reference).

Equation (1) is consistent with the notation in your code. However, this equation is not consistent with papers from Billings’ group; they do not have a delay parameter d. See the attached paper by Billings, their Eq.

In Billings’ paper, his u is your x. As you can see, the terms in the argument of his function go from u(k-1) to u(k-n_u). For your default value of d, you consider terms from x(k-1) to x(k-d-n_x). Thus, n_u = d + n_x. Using your notation suggests that I should take xlag = 1 (because d=1 will actually give me two lags), and hence this should include features like x(k-1) and x(k-2). But it does not. It only includes terms up to x(k-1). So this is one error in your code.

Further adding to the confusion is Eq. (2) of your paper. [I believe there is a typo in Eq. (2): the symbol u should be x (the hint is that the upper limit of the product is n_x, not n_u).] Here, you take j=0 to n_x, but this is inconsistent with your Eq. (1) and with Billings’ past published work. In a forecasting situation, you do not have access to x_n; you only have access to past states. Thus, j should never be equal to 0. This is another error in your code.

Best, Dan


wilsonrljr commented 1 year ago

@dj-gauthier , sorry for the last answer, it was actually wrong because I did an infinity-steps-ahead prediction. I've edited the last comment with minor changes to show the actual one-step-ahead prediction. In the one-step-ahead case, if our model has a maximum lag of 2, we have to enter 3 values for both x_valid and y_valid, of which the first two will be used as initial conditions, as follows. Suppose we have a model with maximum lag = 2 and we want to make a one-step-ahead prediction:

x1[k] = [0.36, -0.85, 0.52]
x2[k] = [0.16, -0.03, 0.78]
y[k] = [0.50, 0.16, 0.17]

We have to pass the 3 values because x1[k-1] = -0.85 and x1[k-2] = 0.36, and the same idea applies to the other arrays. So the third value is not used in the prediction, as you correctly said. In that case, you will have 3 values in the prediction array: the first two are equal to the first two values of y, and the last one is the predicted value based on the lagged x and y values.
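
The claim that the third entry only marks the time step being predicted can be checked with a small NumPy sketch. It is not SysIdentPy's `predict`; `one_step_ahead` is a hypothetical helper that hard-codes the four regressors and (rounded) parameters of the fitted model reported earlier in this thread, and the replacement values 99.0, -99.0, and 123.0 are arbitrary.

```python
import numpy as np


def one_step_ahead(x1, x2, y, max_lag=2):
    """Hypothetical helper: one-step-ahead prediction for the 4-term model.

    The first `max_lag` outputs are copied from y (initial conditions);
    every later output is computed from *measured* lagged values only.
    """
    x1, x2, y = (np.asarray(a, dtype=float) for a in (x1, x2, y))
    yhat = y.copy()
    for k in range(max_lag, len(y)):
        yhat[k] = (
            0.60004 * x2[k - 1]
            - 0.29956 * x2[k - 2] * x1[k - 1]
            + 0.39996 * y[k - 1] ** 2
            + 0.09926 * x1[k - 1] * y[k - 1]
        )
    return yhat


# Values from the example above.
pred = one_step_ahead([0.36, -0.85, 0.52], [0.16, -0.03, 0.78], [0.50, 0.16, 0.17])

# Same initial conditions, but arbitrary values in the third position.
pred_changed = one_step_ahead(
    [0.36, -0.85, 99.0], [0.16, -0.03, -99.0], [0.50, 0.16, 123.0]
)

print(pred)
print(pred_changed)
```

Both calls return the same third element, because only the entries at k-1 and k-2 enter the regressors; the first two elements are simply the initial conditions passed in y.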

dj-gauthier commented 1 year ago

OK. Let me try that out. If I understand correctly, the first element in x1, x2, and y doesn't matter: the first values of x1 and x2 are not used, and hence they can be any value. Similarly, the first value in y will be overwritten with the actual prediction?

I must say that I would never have guessed this based on any of the documentation. If I can get this to work, I will volunteer to write some additional documentation to make this clearer.

Thanks, Dan


wilsonrljr commented 1 year ago

In the example above, the values 0.52, 0.78, and 0.17 are not actually used, but you have to pass them as a reference. Let me try to explain using date-related data. Suppose you are using months (Jan=1, Feb=2, March=3) as the input variable in your model and you want to predict some value in March. If the model has a maximum lag of two, we should pass

x = [1, 2, 3]

so x[k] = 3 (March), the month we want to predict, x[k-1] = 2 (Feb), and x[k-2] = 1 (Jan). The regressor matrix will not contain the value 3 (March). So, in that respect, it doesn't matter, as you said. If you change 3 to 100, for instance, there is no effect on the prediction.

And you are completely right. I think it's not clear in the docs at all.

Edit:

It's worth noting that those values don't matter only in cases where we want to predict a single value. If you pass, say, 10 values and want to perform one-step-ahead prediction, changing that value will give you totally different results.

Example:

If we want to execute one-step-ahead prediction from March to May, x = [1, 2, 3, 4, 5]

To predict March, x[k-1] = 2 (Feb) and x[k-2] = 1 (Jan); then, to predict April, x[k-1] = 3 (March) and x[k-2] = 2 (Feb), and so on.

So if you change the 3 to 100, the prediction will be very different in the months that have March as x[k-1] or x[k-2].
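
The month example can be made concrete by building the lagged input rows by hand. This is a minimal sketch, not SysIdentPy's internal regressor construction; `lagged_inputs` is a hypothetical helper, and the layout (one row per predicted month, columns x[k-1] and x[k-2]) is an assumption chosen to mirror the explanation above.

```python
import numpy as np


def lagged_inputs(x, max_lag=2):
    """Hypothetical helper: the row for time k holds [x[k-1], ..., x[k-max_lag]]."""
    x = np.asarray(x, dtype=float)
    return np.array(
        [[x[k - lag] for lag in range(1, max_lag + 1)] for k in range(max_lag, len(x))]
    )


original = lagged_inputs([1, 2, 3, 4, 5])   # rows for March, April, May
changed = lagged_inputs([1, 2, 100, 4, 5])  # March's own value replaced by 100

print(original)
print(changed)
```

The March row is [2, 1] in both cases, so March's prediction is unaffected, while the April and May rows change to [100, 2] and [4, 100], which is why the later predictions differ.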

dj-gauthier commented 1 year ago

Dear Wilson,

Based on your input, I think I have everything working correctly with the one-step-ahead prediction. I would not have ever figured this out based on the documentation provided. I have made a few slides to explain this for the specific example I am considering so it is probably not correct for all the general cases for SysIdentPy. I hope you can use this directly or adapt it to the format for the web site. See attached.

I will continue my testing. If all looks good, I will turn back to the ridge regression routines and will try my best to repost to GitHub.

My plan is to complete a study and write this up for publication as a follow on to my recent work on "next-generation reservoir computing." This paper has attracted a lot of attention (over 72k downloads of the paper in the last two years). I believe that the new study, where I will use SysIdentPy, will similarly attract a lot of attention. I will cite your work and I hope you can improve the documentation a bit before my publication (I will add an appendix to the paper to also explain this) and that you can add the option of regularization (once I test and you inspect). That way, people that read my paper will be able to use the regularized routine if they want to adapt or extend what I write up.

D. J. Gauthier, E. Bollt, A. Griffith, W. A. S. Barbosa, ‘Next generation reservoir computing,’ Nat. Commun. 12, 5564 (2021). Paper: https://www.nature.com/articles/s41467-021-25801-2 (PDF: https://u.osu.edu/quantinfo/files/2021/12/NatComm12-5564-2021.pdf, preprint: https://arxiv.org/abs/2106.07688). OSU press release: https://news.osu.edu/a-new-way-to-solve-the-hardest-of-the-hard-computer-problems/ and many other news articles: https://www.nature.com/articles/s41467-021-25801-2/metrics. Editor Selected for the Nature Communications Focus on AI: https://www.nature.com/collections/ceiajcdbeb. Software available: https://zenodo.org/badge/latestdoi/370742927

Thank you for all your help in figuring this out! I think there is no need for us to meet tomorrow night and I will cancel the meeting.

Best, Dan


dj-gauthier commented 1 year ago

Our emails crossed paths. What I just sent should be consistent with what you just wrote.


wilsonrljr commented 1 year ago

That's great! I'm very happy that you are considering SysIdentPy in your paper and I'll be glad to help you with anything related to SysIdentPy.

I can't find the attached file (maybe because the mail is connected to the pull request in GitHub, I don't know). Could you send it directly to my email ( wilsonrljr@outlook.com)?

OK then. Let's cancel our meeting tomorrow, but if you need anything just let me know.

dj-gauthier commented 1 year ago

Dear Wilson,

I just sent to your outlook account any of my emails that have attachments. Dan


dj-gauthier commented 1 year ago

Dear Wilson,

I have performed several tests to see that the regularization is working. You need to take the ridge_param a bit higher than I am used to when not using the forward regression method. If it is too small, then autonomous prediction is unstable (trajectory goes to infinity). It is easy to check this and I usually change ridge_param by a factor of ten (starting at, say, 1e-6). For a noise-free system (the Lorenz63 attractor), I had to take ridge_param around 1.e-2. You have to do a search (e.g., grid search or a Bayesian method) to get the best performance.
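
The factor-of-ten search over ridge_param described above can be sketched with plain NumPy. This is not the code from this pull request: `ridge_fit` is a hypothetical helper using the standard closed-form ridge solution, the data are synthetic, and the selection criterion here is validation mean squared error rather than the stability of an autonomous prediction.

```python
import numpy as np


def ridge_fit(psi, y, ridge_param):
    """Hypothetical helper: closed-form ridge solution
    theta = (psi^T psi + ridge_param * I)^-1 psi^T y."""
    n_terms = psi.shape[1]
    return np.linalg.solve(psi.T @ psi + ridge_param * np.eye(n_terms), psi.T @ y)


# Synthetic regressor matrices and targets (assumed data, for illustration only).
rng = np.random.default_rng(0)
theta_true = np.array([0.6, -0.3, 0.4, 0.1])
psi_train = rng.normal(size=(200, 4))
psi_valid = rng.normal(size=(50, 4))
y_train = psi_train @ theta_true + 0.01 * rng.normal(size=200)
y_valid = psi_valid @ theta_true + 0.01 * rng.normal(size=50)

# Grid in steps of a factor of ten, starting at 1e-6.
grid = [10.0**p for p in range(-6, 0)]
errors = {}
for ridge_param in grid:
    theta = ridge_fit(psi_train, y_train, ridge_param)
    errors[ridge_param] = float(np.mean((psi_valid @ theta - y_valid) ** 2))

best = min(errors, key=errors.get)
print(best, errors[best])
```

For an autonomous (free-run) forecast, the same loop could score each candidate by whether the simulated trajectory stays bounded instead of by validation error; a Bayesian optimizer could replace the grid without changing `ridge_fit`.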

I am a GitHub newbie, but I think that I properly fetched the upstream master branch, rebased, and recommitted my changes to estimators.py and forward_regression_orthogonal_least_squares.py (these are the only two files that needed modifications). You can tell that the modified files are there if you see a new parameter that is passed to these routines called ridge_param. Let me know if I need to do something different. Dan

wilsonrljr commented 1 year ago

Thank you very much, @dj-gauthier. I'm updating the docs with your suggestion as well. I'm preparing the new release and I'll include your contribution in it. Everything looks fine. I'll take a look in the next few days and let you know if I need anything.

dj-gauthier commented 1 year ago

Wilson, I am traveling now and want to take one more look at what I pushed. Will do this tomorrow and let you know if I see a problem.

Dan


dj-gauthier commented 1 year ago

@wilsonrljr , I confirm that the version of the code I pushed is the most recent version and runs properly. Thanks, Dan


wilsonjr commented 1 year ago

@dj-gauthier I think you marked the wrong Wilson. @wilsonrljr

dj-gauthier commented 1 year ago

@dj-gauthier I think you marked the wrong Wilson. @wilsonrljr

Sorry! Should be fixed!