thibmonsel / torchdde

Neural network compatible DDEs
https://thibmonsel.github.io/torchdde/
Apache License 2.0

Splitting history and forward trajectory #30

Closed. ReHoss closed this issue 3 months ago

ReHoss commented 4 months ago

Consider the common case where t - tau == t0: since ys_interpolation equals None at the first call, _integrate_dde returns an error. The operator >= at line 205 could be replaced by >. Moreover, according to Wikipedia, the history function is defined with $t_0$ included, i.e. on $[t_0 - \tau, t_0]$.

https://github.com/thibmonsel/torchdde/blob/afa698899af93bdee53942f6e1aef9bad9ce23ae/torchdde/integrate.py#L194-L208

thibmonsel commented 4 months ago

Taking just Equation (1) from the paper here: the DDE's history function and the equation itself both share $t_0$, so using t - tau >= t0 or t - tau > t0 is essentially the same here (moreover, the solution is continuous, so ys_interpolation and history_func give the same output at that point).
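For reference, the constant single-delay DDE I have in mind (a sketch of the standard form, written in my notation rather than copied verbatim from the paper) is

$y'(t) = f(t, y(t), y(t - \tau))$ for $t \ge t_0$, with history $y(t) = \psi(t)$ for $t \in [t_0 - \tau, t_0]$,

so at $t = t_0$ the history $\psi$ and the solution $y$ coincide, and the choice of >= versus > only decides which callable is evaluated at that single point.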

ys_interpolation being equal to None might cause an issue if our first integration step is larger than max(tau), but in practice this is very rare. Have you run into this problem? (I know that in some edge cases I ran into ys_interpolation=None.)
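To make the dispatch concrete, here is a minimal standalone sketch (not the library code itself; the names history_func, ys_interpolation and t0 simply follow this thread) of how the delayed state is looked up and why the very first step is special while ys_interpolation is still None:

import torch

t0 = torch.tensor(0.0)
tau = torch.tensor(0.05)

def history_func(t):
    # history defined on [t0 - tau, t0]
    return torch.zeros(1)

ys_interpolation = None  # nothing has been integrated yet at the very first step

def delayed_state(t):
    # with ">=", the boundary case t - tau == t0 would be routed to ys_interpolation (None here);
    # with ">", it goes to history_func, which is always defined
    if t - tau > t0:
        return ys_interpolation(t - tau)
    return history_func(t - tau)

print(delayed_state(torch.tensor(0.05)))  # t - tau == t0 -> history_func, no error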

ReHoss commented 4 months ago

import torch
import torchdde

controller = torchdde.step_size_controller.constant.ConstantStepSizeController()

ts = torch.arange(0, 0.2, 0.05)
idx_truncation = 3

batch_size = 7
tensor_time_points_history = ts[:idx_truncation]
dim_state = 4
tensor_trajectory_history = torch.zeros(
    batch_size, len(tensor_time_points_history), dim_state
)

history_interpolator = torchdde.TorchLinearInterpolator(
    tensor_time_points_history, tensor_trajectory_history
)

ts_forward = ts[idx_truncation - 1:]
solver = torchdde.RK4()
def history_func(t: torch.Tensor) -> torch.Tensor:
    return history_interpolator(t)

def forward(t: torch.Tensor, x: torch.Tensor, args, history) -> torch.Tensor:
    return x

torchdde.integrate(
    forward,
    solver,
    ts_forward[0],
    ts_forward[-1],
    ts_forward,
    history_func,
    None,
    dt0=torch.tensor(0.05),
    delays=torch.nn.Parameter(torch.tensor([0.05])),
    stepsize_controller=controller,
    discretize_then_optimize=True,
)

This returns TypeError: 'NoneType' object is not callable and should be solved by using a strict inequality so that history_func is always called in this situation; the problem is that here ys_interpolation equals None. Note that dt0 == dt is True and equals 0.05.

Likewise, in the DDETrainer class, I think you want a non-strict inequality, otherwise you shrink the training window too much:


ts = torch.tensor([0.0000, 0.0500, 0.1000, 0.1500])
self.model.delays = torch.nn.Parameter(torch.tensor([0.05]))

idx = (ts > max(self.model.delays) + (ts[1] - ts[0])).nonzero().flatten()[0] + 1
ts_history_train, ts_train = (
    ts[:idx],
    ts[idx - 1 : idx + init_ts_length],
)
ys_history, ys = (
    data[:, :idx],
    data[:, idx - 1 : idx + init_ts_length],
)

This will give you:

ts_history_train == {Tensor: (4,)} tensor([0.0000, 0.0500, 0.1000, 0.1500])
ts_train == {Tensor: (1,)} tensor([0.1500])

while I think you want at least ts_train == {Tensor: (2,)} tensor([0.1000, 0.1500]).
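A self-contained sketch of what I mean (the variable names mirror the snippet above; init_ts_length = 1 is an assumed value for illustration, and the >= comparison is my proposed change):

import torch

ts = torch.tensor([0.0000, 0.0500, 0.1000, 0.1500])
delays = torch.tensor([0.05])
init_ts_length = 1  # assumed value, only for illustration
dt = ts[1] - ts[0]

# current strict comparison -> ts_train ends up with a single point
idx = (ts > max(delays) + dt).nonzero().flatten()[0] + 1
print(ts[:idx], ts[idx - 1 : idx + init_ts_length])
# expected: tensor([0.0000, 0.0500, 0.1000, 0.1500]) tensor([0.1500])

# non-strict comparison -> ts_train keeps the previous point as well
idx = (ts >= max(delays) + dt).nonzero().flatten()[0] + 1
print(ts[:idx], ts[idx - 1 : idx + init_ts_length])
# expected: tensor([0.0000, 0.0500, 0.1000]) tensor([0.1000, 0.1500])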

thibmonsel commented 4 months ago

Adding > should be less error-prone in all situations, so I'll change that. However, your error also comes from another source: we can't have tnext - tprev < max(tau). One thing that can be done is to raise an exception if we run into this issue.

If you change the delays to 0.04 and pull the new master, the code runs well.

NB: There is no DDETrainer anymore; you might be on an old commit.

Regarding your code that splits the data, torchdde is agnostic to this. It's up to the user's discretion to do it correctly.

ReHoss commented 4 months ago

Adding > should be less error-prone in all situations, so I'll change that. However, your error also comes from another source: we can't have max(tau) >= dt. One thing that can be done is to raise an exception if we run into this issue.

If you change the delays to 0.04 and pull the new master, the code runs well.

NB: There is no DDETrainer anymore; you might be on an old commit.

Why can't we have max(tau) == dt? In this case I don't want to change the delay...

thibmonsel commented 4 months ago

You're right, this shouldn't be an issue.

NB: If you change the solver to Euler, the code works fine. The error resides in the combination of the RK4 stage

k4 = func(t + dt, y + dt * k3, args)

and the fact that you are on the first step, so ys_interpolation=None and is not a Callable yet. In this particular case

k4 = func(t + dt, y + dt * k3, args) -> delayed lookup ys_interpolation(t + dt - tau) = None(t + dt - tau)

This comes from TorchLinearInterpolator, which isn't the "perfect" implementation. One option to work around this is to start with a dt smaller than min(delays).

ReHoss commented 4 months ago

the fact that you are on the first step, so ys_interpolation=None and ys_interpolation is not a Callable yet. In this particular case

Here I suggest simply calling history_func. Indeed, no bug is observed with Euler (why?).

thibmonsel commented 4 months ago

Please re-read the comment above.

ReHoss commented 4 months ago

Please re-read the comment above.

It does not explicitly state the difference between Euler and RK4. Does Euler not need any interpolation at the very beginning?

ReHoss commented 4 months ago

I encountered another RK4-specific bug. Here dt0 < dt:

import torch
import torchdde

controller = torchdde.step_size_controller.constant.ConstantStepSizeController()

dt = 0.05
dt0 = 0.05 * 0.8
ts = torch.arange(0, 0.4 + dt, dt)

idx_truncation = 3

batch_size = 7
tensor_time_points_history = ts[:idx_truncation]
dim_state = 4
tensor_trajectory_history = torch.zeros(
  batch_size, len(tensor_time_points_history), dim_state
)

history_interpolator = torchdde.TorchLinearInterpolator(
  tensor_time_points_history, tensor_trajectory_history
)

ts_forward = ts[idx_truncation - 1 :]
solver = torchdde.RK4()

def history_func(t: torch.Tensor) -> torch.Tensor:
  return history_interpolator(t)

def forward(t: torch.Tensor, x: torch.Tensor, args, history) -> torch.Tensor:
  return x

torchdde.integrate(
  forward,
  solver,
  ts_forward[0],
  ts_forward[-1],
  ts_forward,
  history_func,
  None,
  dt0=torch.tensor(dt0),
  delays=torch.nn.Parameter(torch.tensor([dt])),
  stepsize_controller=controller,
  discretize_then_optimize=True,
)

This returns:

{ValueError}ValueError('Interpolation point is outside data range. ie t=0.28999996185302734 > ts[-1]=0.25999999046325684 or t < ts[0]=0.10000000149011612')

Note again that max(delays) == dt

thibmonsel commented 4 months ago

The bug doesn't come from RK4 itself but from the interpolators and the DDE stepping combined.

Side note: please give your whole Python file (i.e. including imports) for easier reproducibility.

There are implicit rules for DDE integration so that everything runs smoothly:

  • the stepsize dt = tnext - tprev < min(delays)

In your file, if you add an assert dt > dt0, it should be good.

ReHoss commented 4 months ago

The bug doesn't come from RK4 itself but from the interpolators and the DDE stepping combined.

Side note: please give your whole Python file (i.e. including imports) for easier reproducibility.

There are implicit rules for DDE integration so that everything runs smoothly:

  • the stepsize dt = tnext - tprev < min(delays)

In your file, if you add an assert dt > dt0, it should be good.

As I said, dt0 < dt already holds and the bug still exists. dt == min(delays) should be supported, as it is a very common case: a unique delay $\tau$ such that $dt = \tau$.

thibmonsel commented 4 months ago

This works


import torchdde 
import torch

controller = torchdde.step_size_controller.constant.ConstantStepSizeController()

tau = 0.03
dt0 = 0.05
assert tau > dt0
ts = torch.arange(0, 0.4 + dt0, dt0)

idx_truncation = 3

batch_size = 7
tensor_time_points_history = ts[:idx_truncation]
dim_state = 4
tensor_trajectory_history = torch.zeros(
  batch_size, len(tensor_time_points_history), dim_state
)

history_interpolator = torchdde.TorchLinearInterpolator(
  tensor_time_points_history, tensor_trajectory_history
)

ts_forward = ts[idx_truncation - 1 :]
solver = torchdde.RK4()

def history_func(t: torch.Tensor) -> torch.Tensor:
  return history_interpolator(t)

def forward(t: torch.Tensor, x: torch.Tensor, args, history) -> torch.Tensor:
  return x

torchdde.integrate(
  forward,
  solver,
  ts_forward[0],
  ts_forward[-1],
  ts_forward,
  history_func,
  None,
  dt0=torch.tensor(dt0),
  delays=torch.nn.Parameter(torch.tensor([tau])),
  stepsize_controller=controller,
  discretize_then_optimize=True,
)

ReHoss commented 4 months ago

Here you are not solving the bug but rather changing the values. As I said, $\tau = dt$ should work, as it is a very simple and standard case.

ReHoss commented 4 months ago

This works

import torchdde 
import torch

controller = torchdde.step_size_controller.constant.ConstantStepSizeController()

tau = 0.03
dt0 = 0.05
assert tau > dt0
ts = torch.arange(0, 0.4 + dt0, dt0)

idx_truncation = 3

batch_size = 7
tensor_time_points_history = ts[:idx_truncation]
dim_state = 4
tensor_trajectory_history = torch.zeros(
  batch_size, len(tensor_time_points_history), dim_state
)

history_interpolator = torchdde.TorchLinearInterpolator(
  tensor_time_points_history, tensor_trajectory_history
)

ts_forward = ts[idx_truncation - 1 :]
solver = torchdde.RK4()

def history_func(t: torch.Tensor) -> torch.Tensor:
  return history_interpolator(t)

def forward(t: torch.Tensor, x: torch.Tensor, args, history) -> torch.Tensor:
  return x

torchdde.integrate(
  forward,
  solver,
  ts_forward[0],
  ts_forward[-1],
  ts_forward,
  history_func,
  None,
  dt0=torch.tensor(dt0),
  delays=torch.nn.Parameter(torch.tensor([tau])),
  stepsize_controller=controller,
  discretize_then_optimize=True,
)

The assertion fails here. By the way, what is $dt_0$ concretely for Euler and RK4?

Does $dt_0$ fix the value of $dt$ used by the Euler integrator?

thibmonsel commented 4 months ago

The last intermediate stage of RK4 fails because

k4 = func(t + dt, y + dt * k3, args)

while Euler is simply

y_new = y + dt * f(t,y,args)

Evaluating f at time t - tau is fine for Euler, but in the case of RK4's k4 stage the solver doesn't know how to interpolate.
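To make this concrete, here is a small standalone sketch (plain Python, not torchdde internals) of where the delayed evaluations land during the first step for the two solvers:

# Delayed evaluation times hit during the first integration step [t0, t0 + dt].
# Anything beyond t0 needs ys_interpolation, which is still None at that point;
# t - tau == t0 is the boundary case where ">=" versus ">" decides which callable is used.
t0, dt = 0.0, 0.05

for tau in (0.05, 0.04):  # tau == dt and tau < dt
    euler_delayed = t0 - tau        # Euler: y_new = y + dt * f(t0, y, args), only needs y(t0 - tau)
    rk4_k4_delayed = t0 + dt - tau  # RK4: k4 = func(t0 + dt, ...), needs y(t0 + dt - tau)
    status = "beyond t0, not in the history" if rk4_k4_delayed > t0 else "still on/inside the history"
    print(f"tau={tau}: Euler needs y({euler_delayed:+.2f}), RK4 k4 needs y({rk4_k4_delayed:+.2f}) ({status})")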

thibmonsel commented 4 months ago

For the assertion, I put tau=0.08, sorry. And dt0 is the initial step size used for the integration; for a constant step size solver it remains the same. (There is documentation associated with the package.) Regarding the case $\tau = dt$: it isn't a common case, in normal scenarios $\tau \gg dt$.

ReHoss commented 4 months ago

For the assertion, I put tau=0.08, sorry. And dt0 is the initial step size used for the integration; for a constant step size solver it remains the same. Regarding the case τ = dt: it isn't a common case, in normal scenarios τ >> dt

The case $\tau = dt$ is the classic simple case where both the instantaneous and the previous observation are considered. It is a very standard case that I naturally wanted to treat.

Cases like $\tau = 1.2 dt$ also break. IMO, this part of the code needs to be made more robust so that, at the very least, nothing breaks down to $\tau = dt$; right now it is unclear at which threshold RK4 starts to break, which leads to unexpected errors.

thibmonsel commented 4 months ago

In order to fix this, TorchLinearInterpolator needs to be reworked. This is not high on my priority list. If testing cases $\tau \approx dt$ is just for exploration purposes, then I would suggest doing a PR for TorchLinearInterpolator. Otherwise, start with the supported case of $\tau \gg dt$.

Self-note: the issue here is that the points aren't added correctly on the first step if TorchLinearInterpolator is not None.

thibmonsel commented 3 months ago

I'm working on this in the branch https://github.com/thibmonsel/torchdde/tree/dense_interp. The issue was that the condition t-tau <= t0 was failing when t-tau and t0 were arbitrarily close.


def ode_func(
    t: Float[torch.Tensor, ""],
    y: Float[torch.Tensor, "batch ..."],
    args: Any,
):
    # applies the function func to the current time t, the state y and the history;
    # we have to make sure that t - tau > t0, otherwise we would be making a
    # prediction with an unknown ys_interpolation ...
    def cond(t, tau):
        in_history_domain = (t - tau).item() <= t0
        very_close_to_t0 = torch.allclose(t - tau, t0)
        return in_history_domain or very_close_to_t0

    history = [
        history_func(t - tau) if cond(t, tau) else ys_interpolation(t - tau)  # type: ignore
        for tau in delays
    ]
    return func(t, y, args, history=history)

If you clone this branch and re-run your first example, it should work!

ReHoss commented 3 months ago
     history = [ 
         (ys_interpolation(t - tau) if t - tau > t0 else history_func(t - tau))  # type: ignore 
         for tau in delays 
     ] 

IMO the above (strict inequality) solved the problem (calling history_func at $t_0$ instead of ys_interpolation); why not stick to that? @thibmonsel

EDIT: it looks like the new cond function implements that. However, why is very_close_to_t0 needed?

thibmonsel commented 3 months ago

At t0, history_func and ys_interpolation are the same, so evaluating one or the other won't change much. The issue is that t-tau < t0 might or might not be True up to float precision (i.e. $t - \tau - t_0 = 0^+$ or $0^-$).

The issue was that the condition t-tau <= t0 was failing when t-tau and t0 were arbitrarily close.
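A quick float32 illustration, built on the same kind of grid as the examples above (the exact rounding, and hence the outcome of the comparison, is not guaranteed, which is exactly the problem):

import torch

ts = torch.arange(0, 0.4 + 0.05, 0.05)  # float32 grid, as in the scripts above
t0 = ts[2]                               # start of the forward integration in those scripts
t = ts[3]
tau = torch.tensor(0.05)

print((t - tau).item(), t0.item())        # the two values differ by at most a rounding error
print(bool(t - tau <= t0))                # may be True or False depending on rounding
print(bool(torch.allclose(t - tau, t0)))  # True: this is what the very_close_to_t0 check catches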

ReHoss commented 3 months ago

This returns TypeError: 'NoneType' object is not callable and should be solved by using a strict inequality so that history_func is always called in this situation; the problem is that here ys_interpolation equals None. Note that dt0 == dt is True and equals 0.05.

As I said here, there was a critical flaw when ys_interpolation was called while not existing yet (being None). This happened at the very first iteration because at $t_0$ the interpolator ys_interpolation was called although it did not exist yet, whereas calling history_func instead was enough to avoid this bug.

thibmonsel commented 3 months ago

I have no such error with your script, i.e. the following one. Are you sure you are on the specific branch mentioned previously?


import torch 
import torchdde

controller = torchdde.step_size_controller.constant.ConstantStepSizeController()

ts = torch.arange(0, 0.2, 0.05)
idx_truncation = 3

batch_size = 7
tensor_time_points_history = ts[:idx_truncation]
dim_state = 4
tensor_trajectory_history = torch.zeros(
    batch_size, len(tensor_time_points_history), dim_state
)

history_interpolator = torchdde.TorchLinearInterpolator(
    tensor_time_points_history, tensor_trajectory_history
)

ts_forward = ts[idx_truncation - 1:]
solver = torchdde.RK4()
def history_func(t: torch.Tensor) -> torch.Tensor:
    return history_interpolator(t)

def forward(t: torch.Tensor, x: torch.Tensor, args, history) -> torch.Tensor:
    return x

torchdde.integrate(
    forward,
    solver,
    ts_forward[0],
    ts_forward[-1],
    ts_forward,
    history_func,
    None,
    dt0=torch.tensor(0.05),
    delays=torch.nn.Parameter(torch.tensor([0.05])),
    stepsize_controller=controller,
    discretize_then_optimize=True,
)

ReHoss commented 3 months ago

I have no such error with your script, i.e. the following one. Are you sure you are on the specific branch mentioned previously?

import torch 
import torchdde

controller = torchdde.step_size_controller.constant.ConstantStepSizeController()

ts = torch.arange(0, 0.2, 0.05)
idx_truncation = 3

batch_size = 7
tensor_time_points_history = ts[:idx_truncation]
dim_state = 4
tensor_trajectory_history = torch.zeros(
    batch_size, len(tensor_time_points_history), dim_state
)

history_interpolator = torchdde.TorchLinearInterpolator(
    tensor_time_points_history, tensor_trajectory_history
)

ts_forward = ts[idx_truncation - 1:]
solver = torchdde.RK4()
def history_func(t: torch.Tensor) -> torch.Tensor:
    return history_interpolator(t)

def forward(t: torch.Tensor, x: torch.Tensor, args, history) -> torch.Tensor:
    return x

torchdde.integrate(
    forward,
    solver,
    ts_forward[0],
    ts_forward[-1],
    ts_forward,
    history_func,
    None,
    dt0=torch.tensor(0.05),
    delays=torch.nn.Parameter(torch.tensor([0.05])),
    stepsize_controller=controller,
    discretize_then_optimize=True,
)

Let me find the commit where this error happened, @thibmonsel.

Check out 1ad2c12ffb0592ea54c3873c8796ae0bebf4a0da (make sure you are running this version of the code), and you will observe the error, which is due to ys_interpolation = None at line 259.

Changing >= to > solved that error: this was the main purpose of opening this issue.

ReHoss commented 3 months ago

I encountered another RK4-specific bug. Here dt0 < dt:

controller = torchdde.step_size_controller.constant.ConstantStepSizeController()

dt = 0.05
dt0 = 0.05 * 0.8
ts = torch.arange(0, 0.4 + dt, dt)

idx_truncation = 3

batch_size = 7
tensor_time_points_history = ts[:idx_truncation]
dim_state = 4
tensor_trajectory_history = torch.zeros(
  batch_size, len(tensor_time_points_history), dim_state
)

history_interpolator = torchdde.TorchLinearInterpolator(
  tensor_time_points_history, tensor_trajectory_history
)

ts_forward = ts[idx_truncation - 1 :]
solver = torchdde.RK4()

def history_func(t: torch.Tensor) -> torch.Tensor:
  return history_interpolator(t)

def forward(t: torch.Tensor, x: torch.Tensor, args, history) -> torch.Tensor:
  return x

torchdde.integrate(
  forward,
  solver,
  ts_forward[0],
  ts_forward[-1],
  ts_forward,
  history_func,
  None,
  dt0=torch.tensor(dt0),
  delays=torch.nn.Parameter(torch.tensor([dt])),
  stepsize_controller=controller,
  discretize_then_optimize=True,
)

This returns:

{ValueError}ValueError('Interpolation point is outside data range. ie t=0.28999996185302734 > ts[-1]=0.25999999046325684 or t < ts[0]=0.10000000149011612')

Note again that max(delays) == dt

This is the next important question IMO.

thibmonsel commented 3 months ago

I'm talking about the dense_interp branch, and changing >= to > won't fix the issue completely. FYI, everything is fixed in the branch.

ReHoss commented 3 months ago

I'm talking about the dense_interp branch, and changing >= to > won't fix the issue completely.

At least the None issue, which arises when $t - \tau == t_0$ at the very first iteration, is solved.

ReHoss commented 3 months ago

I'm talking about the dense_interp branch, and changing >= to > won't fix the issue completely. FYI, everything is fixed in the branch.

Are you talking about the issue with the interpolation and RK4?

ReHoss commented 3 months ago

I encountered another RK4-specific bug. Here dt0 < dt:

controller = torchdde.step_size_controller.constant.ConstantStepSizeController()

dt = 0.05
dt0 = 0.05 * 0.8
ts = torch.arange(0, 0.4 + dt, dt)

idx_truncation = 3

batch_size = 7
tensor_time_points_history = ts[:idx_truncation]
dim_state = 4
tensor_trajectory_history = torch.zeros(
  batch_size, len(tensor_time_points_history), dim_state
)

history_interpolator = torchdde.TorchLinearInterpolator(
  tensor_time_points_history, tensor_trajectory_history
)

ts_forward = ts[idx_truncation - 1 :]
solver = torchdde.RK4()

def history_func(t: torch.Tensor) -> torch.Tensor:
  return history_interpolator(t)

def forward(t: torch.Tensor, x: torch.Tensor, args, history) -> torch.Tensor:
  return x

torchdde.integrate(
  forward,
  solver,
  ts_forward[0],
  ts_forward[-1],
  ts_forward,
  history_func,
  None,
  dt0=torch.tensor(dt0),
  delays=torch.nn.Parameter(torch.tensor([dt])),
  stepsize_controller=controller,
  discretize_then_optimize=True,
)

This returns:

{ValueError}ValueError('Interpolation point is outside data range. ie t=0.28999996185302734 > ts[-1]=0.25999999046325684 or t < ts[0]=0.10000000149011612')

Note again that max(delays) == dt

I am happy to note that this script no longer returns any error on the dense_interp branch.

thibmonsel commented 3 months ago

FYI, this issue will be closed when the branch is merged into master.