lululxvi / deepxde

A library for scientific machine learning and physics-informed learning
https://deepxde.readthedocs.io

Regularization: structure preserving neural network #174

Open bhattah opened 3 years ago

bhattah commented 3 years ago

Hi Lu Lu,

First of all, thanks for your wonderful contribution.

I want to modify Burgers.py to have it preserve the quantity: \int y(x,t)^2 dx = \int y(x,0)^2 dx

  1. This can either be achieved by updating the pde function to look something like this:

    def pde(x, y):
        dy_x = dde.grad.jacobian(y, x, i=0, j=0)
        dy_t = dde.grad.jacobian(y, x, i=0, j=1)
        dy_xx = dde.grad.hessian(y, x, i=0, j=0)
        x, t = x[:, :-1], x[:, -1:]
        y0 = dde.backend.tf.keras.backend.exp(-x**2)
        return [
            dy_t + y * dy_x - 0.01 / np.pi * dy_xx,
            dde.backend.tf.keras.backend.sum(y**2) - dde.backend.tf.keras.backend.sum(y0**2),
        ]

    But this produces

    ValueError: Index out of range using dim 0; ...

    inside model.compile. I believe this is because the tensor dde.backend.tf.keras.backend.sum(y**2) is a scalar (shape ()). I am stuck at this point.

  2. One could also achieve the preservation by updating the existing loss function to also include the norm of the residual of the conservation constraint. To this end, I update the default MSE function like so:

    def mean_squared_error(y_true, y_pred):
        return tf.reduce_mean(tf.math.square(y_true - y_pred)) + tf.abs(tf.keras.backend.sum(y_true**2 - y_pred**2))

    This runs alright, but doesn't give the quantity preservation in y_pred as measured by sum(y_pred**2). Does this approach sound reasonable to you?

Ideally

  1. I would like to impose the mass conservation at the "continuous level", replacing integration by "automatic integration".
  2. Have it working both ways mentioned above.

Thanks for your valuable inputs.

lululxvi commented 3 years ago

What you need is for any t, \int y(x,t)^2 dx = C. Here, C = \int y(x,0)^2 dx, and can be computed beforehand. The main issue in your approach is that the training points are distributed in the whole x-t domain, so what you compute is \iint y(x,t)^2 dxdt.

There are a few ways to implement what you need, including modifying your approaches above, but the cleanest way might be using OperatorBC, as follows (a minimal sketch is given after these steps). Let us say we need the mass conservation at the time t0, i.e., \int y(x,t0)^2 dx = C.

  1. Define OperatorBC to compute y^2 for t0

    • The on_boundary function should return True if x[1] == t0, i.e., we only care about the training points at t0.
    • Define func(inputs, outputs, X) to return outputs ** 2
  2. Define a loss for this OperatorBC as

    def mass_conservation(y_true, y_pred):
        return L * 1 / N * sum(y_pred) - C

    Here, y_pred is the outputs ** 2 computed in OperatorBC. Note: \int y(x,t0)^2 dx ≈ (L/N) (y_1^2 + y_2^2 + ... + y_N^2).

  3. Use different losses for the PDE and the OperatorBC we defined. Assume we only have the PDE and the OperatorBC, as in dde.data.PDE(geom, pde, operatorbc, ...). Then we use

    model.compile(..., loss=['MSE', mass_conservation])

    So we still use the MSE loss for the PDE residual, and we use mass_conservation for the operatorbc. If you also have a third BC, like dde.data.PDE(geom, pde, [otherbc, operatorbc], ...), then use loss=['MSE', 'MSE', mass_conservation].

  4. Use grid points for training as dde.data.PDE(..., train_distribution='uniform'), so that we have equispaced points instead of random train points. Check the generated train points and pick a valid t0 which is in the train points.

If you want to define the mass conservation for multiple t0, just repeat this.
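
A minimal end-to-end sketch of these four steps for a Burgers-type problem. The geometry, PDE, and the values of t0, L, and C below are illustrative assumptions, not taken from the thread; only the OperatorBC/loss wiring follows the steps above.

```python
import numpy as np
import deepxde as dde
from deepxde.backend import tf

# Illustrative values (not from the thread): conserve \int y(x, t0)^2 dx = C
t0 = 0.0      # time slice on which mass is constrained
L = 2.0       # length of the spatial interval [-1, 1]
C = 1.0       # target value, computed beforehand from the initial condition

def pde(x, y):
    dy_x = dde.grad.jacobian(y, x, i=0, j=0)
    dy_t = dde.grad.jacobian(y, x, i=0, j=1)
    dy_xx = dde.grad.hessian(y, x, i=0, j=0)
    return dy_t + y * dy_x - 0.01 / np.pi * dy_xx

def boundary_t0(x, on_boundary):
    # keep only the training points on the slice t = t0
    return np.isclose(x[1], t0)

geom = dde.geometry.Interval(-1, 1)
timedomain = dde.geometry.TimeDomain(0, 1)
geomtime = dde.geometry.GeometryXTime(geom, timedomain)

# Step 1: OperatorBC evaluates y^2 on the selected points
oc = dde.OperatorBC(geomtime, lambda x, y, _: y ** 2, boundary_t0)

# Step 2: custom loss; y_pred holds y^2 at the N points on t = t0,
# so L / N * sum(y_pred) approximates \int y(x, t0)^2 dx
def mass_conservation(y_true, y_pred):
    n = tf.cast(tf.size(y_pred), y_pred.dtype)
    return tf.abs(L / n * tf.reduce_sum(y_pred) - C)

# Steps 3-4: MSE for the PDE residual, mass_conservation for the OperatorBC,
# and a uniform grid so that points with t = t0 exist among the training points
data = dde.data.TimePDE(
    geomtime, pde, [oc],
    num_domain=2000, num_boundary=80, num_initial=160,
    train_distribution="uniform",
)
net = dde.maps.FNN([2] + [20] * 3 + [1], "tanh", "Glorot normal")
model = dde.Model(data, net)
model.compile("adam", lr=1e-3, loss=["MSE", mass_conservation])
```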

bhattah commented 3 years ago

Thanks, Lu Lu! Here is how I replicate your suggestion:

  1. Define the new condition using OperatorBC:
    
    def func(inputs, outputs):
        return outputs ** 2

    def boundary_C(x, on_boundary):
        return on_boundary and x[1] == 0

    oc = dde.OperatorBC(geomtime, lambda x, y, _: func(x, y), boundary_C)

  2. Define the conservation law:

    def C_conservation(y_true, y_pred):
        return (sum(y_pred) - C[0]) * dx

    where C[0] and dx are known.

  3. Create the PDE data:
    data = dde.data.TimePDE(
    geomtime, pde, [bc, ic, oc], train_distribution='uniform', num_domain=2540*2, num_boundary=80*2, num_initial=160*2
    )
  4. Finally

    model.compile("adam", lr=1e-3)
    model.train(epochs=2000)
    model.compile("L-BFGS-B", loss=['MSE','MSE','MSE',C_conservation])

    This last line gives me the error that deepxde/losses.py is not meant to work on a list. To get around this, I try to iterate over the loss list like so:

    loss = [losses_module.get(l) for l in loss]
    temp = []
    for i in range(len(loss)):
       temp.append(self.data.losses(self.net.targets, self.net.outputs, loss[i], self)[i])
    self.losses = tf.stack(temp)

    This however produces the error that you are not supposed to iterate over a tensor in graph execution. The suggested remedy to decorate the offending function inside deepxde/data/pde.py also proves futile.

    @tf.function
    def losses(self, targets, outputs, loss, model):

    giving an error inside deepxde/gradients.py:

    deepxde/gradients.py:38 __call__  *
        return self.J[i] if j is None or self.dim_x == 1 else self.J[i][:, j : j + 1]
    
    TypeError: 'NoneType' object is not subscriptable

    Trying to further circumvent the issue by instead modifying deepxde/data/pde.py to subscript the loss

        def losses_train():
            f_train = f
            if self.pde is not None and get_num_args(self.pde) == 3:
                f_train = self.pde(model.net.inputs, outputs, self.train_x)
                if not isinstance(f_train, (list, tuple)):
                    f_train = [f_train]
    
            bcs_start = np.cumsum([0] + self.num_bcs)
            error_f = [fi[bcs_start[-1] :] for fi in f_train]
            losses = [
                loss[0](tf.zeros(tf.shape(error), dtype=config.real(tf)), error)
                for error in error_f
            ]
            for i, bc in enumerate(self.bcs):
                beg, end = bcs_start[i], bcs_start[i + 1]
                error = bc.error(self.train_x, model.net.inputs, outputs, beg, end)
                losses.append(
                    loss[i+1](tf.zeros(tf.shape(error), dtype=config.real(tf)), error)
                )
            return losses
    
        def losses_test():
            f_test = f
            if self.pde is not None and get_num_args(self.pde) == 3:
                f_test = self.pde(model.net.inputs, outputs, self.test_x)
                if not isinstance(f_test, (list, tuple)):
                    f_test = [f_test]
            return [
                loss[0](tf.zeros(tf.shape(fi), dtype=config.real(tf)), fi) for fi in f_test
            ] + [tf.constant(0, dtype=config.real(tf)) for _ in self.bcs]

    also results in OperatorNotAllowedInGraphError: iterating over ``tf.Tensor`` is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function. I am nonplussed at this point.

lululxvi commented 3 years ago
bhattah commented 3 years ago

Alright, I did what you suggested, and the program works. However, I still don't see any improvement in the OperatorBC loss over multiple epochs. Here is the main file:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np

import sys
sys.path.append('/home/bhattah/Documents/deepxde/')

import deepxde as dde
from deepxde.backend import tf

import scipy.io

def gen_testdata():
    data = scipy.io.loadmat("dataset/MBE.mat")
    t, x, exact = data["t"], data["x"], data["u1"].T
    xx, tt = np.meshgrid(x, t)
    X = np.vstack((np.ravel(xx), np.ravel(tt))).T
    y = exact.flatten()[:, None]
    return X, y

#def main():

def pde(x, y):
    dy_x = dde.grad.jacobian(y, x, i=0, j=0)
    dy_t = dde.grad.jacobian(y, x, i=0, j=1)
    dy_xx = dde.grad.hessian(y, x, i=0, j=0)
    return dy_t + 3/4*y * dy_x +0.1*dy_x +1e-5 * dy_xx +0*2*0.01*y

X, y_true = gen_testdata()
C = np.sum(np.reshape(y_true,(556,100))**2, axis=1)
dx = 0.0808
error = C -C[0]
print("max error in y_true l2 norm:", max(abs(error)))

def C_conservation(y_true, y_pred):
    return (tf.math.reduce_sum(y_pred) - C[0])*dx

def func(inputs, outputs):
    return outputs ** 2

def boundary_C(x, on_boundary):
    return on_boundary and x[1] == 0

geom = dde.geometry.Interval(-4, 4)
timedomain = dde.geometry.TimeDomain(0, 4.995)
geomtime = dde.geometry.GeometryXTime(geom, timedomain)

bc = dde.PeriodicBC(geomtime, 0, lambda _, on_boundary: on_boundary)
oc = dde.OperatorBC(geomtime, lambda x, y, _: func(x,y), boundary_C)
ic = dde.IC(
    geomtime, lambda x: np.exp(-(x[:,0:1]/0.5)**2/2)/(0.5*np.sqrt(2*np.pi)), lambda _, on_initial: on_initial
)

data = dde.data.TimePDE(
    geomtime, pde, [bc, ic, oc], train_distribution='uniform', num_domain=2540*2, num_boundary=80*2, num_initial=160*2
)
net = dde.maps.FNN([2] + [20] * 3 + [1], "tanh", "Glorot normal")

#%%
model = dde.Model(data, net)
model.compile("adam", lr=1e-3, loss=['MSE','MSE','MSE',C_conservation])
model.train(epochs=15000)

model.compile("L-BFGS-B", loss=['MSE','MSE','MSE',C_conservation])
losshistory, train_state = model.train()
dde.saveplot(losshistory, train_state, issave=True, isplot=True)

y_pred = model.predict(X)
f = model.predict(X, operator=pde)
print("Mean residual:", np.mean(np.absolute(f)))
print("L2 relative error:", dde.metrics.l2_relative_error(y_true, y_pred))
np.savetxt("test.dat", np.hstack((X, y_true, y_pred)))

error = np.sum(np.reshape(y_pred,(556,100))**2, axis=1)
error = error -error[0]
print("max error in y_pred l2 norm:", max(abs(error)))

# if __name__ == "__main__":
#     main()

And here is part of the output:

max error in y_true l2 norm: 3.552713678800501e-14
Warning: 5080 points required, but 5096 points sampled.

Step      Train loss                   Test loss                                                         Test metric
13000     [5.46e-04, 9.39e-08, 4.44e-05, -5.64e-01]   [5.46e-04, 0.00e+00, 0.00e+00, 0.00e+00]    []  
14000     [5.36e-04, 1.17e-06, 4.57e-05, -5.64e-01]   [5.36e-04, 0.00e+00, 0.00e+00, 0.00e+00]    []  
15000     [5.30e-04, 1.06e-07, 4.37e-05, -5.64e-01]   [5.30e-04, 0.00e+00, 0.00e+00, 0.00e+00]    []  

Mean residual: 0.009654951
L2 relative error: 0.3822951114566908
max error in y_pred l2 norm: 1.9193263

As you can see, all the errors have gone down except C_conservation. I have a hunch that this is because

deepxde/data/pde.py:

        def losses_test():
            f_test = f
            if self.pde is not None and get_num_args(self.pde) == 3:
                f_test = self.pde(model.net.inputs, outputs, self.test_x)
                if not isinstance(f_test, (list, tuple)):
                    f_test = [f_test]
            return [
                loss[i](tf.zeros(tf.shape(fi), dtype=config.real(tf)), fi)
                for i, fi in enumerate(f_test)
            ] + [tf.constant(0, dtype=config.real(tf)) for _ in self.bcs]

doesn't account for the error in self.bcs. Am I correct? Any general directions on how to get started on fixing this will be much appreciated.
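
For reference, a hedged sketch of the change described above: it mirrors the losses_train snippet quoted earlier, and it assumes self.test_x is laid out like self.train_x (BC points counted by self.num_bcs), which may not hold in every DeepXDE version.

```python
def losses_test():
    f_test = f
    if self.pde is not None and get_num_args(self.pde) == 3:
        f_test = self.pde(model.net.inputs, outputs, self.test_x)
        if not isinstance(f_test, (list, tuple)):
            f_test = [f_test]
    # PDE residual losses, skipping the leading BC points (as in losses_train)
    bcs_start = np.cumsum([0] + self.num_bcs)
    error_f = [fi[bcs_start[-1]:] for fi in f_test]
    losses = [
        loss[0](tf.zeros(tf.shape(error), dtype=config.real(tf)), error)
        for error in error_f
    ]
    # evaluate each BC/OperatorBC error with its own loss instead of a constant 0
    for i, bc in enumerate(self.bcs):
        beg, end = bcs_start[i], bcs_start[i + 1]
        error = bc.error(self.test_x, model.net.inputs, outputs, beg, end)
        losses.append(
            loss[i + 1](tf.zeros(tf.shape(error), dtype=config.real(tf)), error)
        )
    return losses
```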

lululxvi commented 3 years ago

Something strange in your code:

bhattah commented 3 years ago

def C_conservation(y_true, y_pred):
    return tf.math.abs(tf.math.reduce_sum(y_pred)*dx - C[0]*dx)

def func(inputs, outputs):
    return outputs ** 2

def boundary_C(x, on_boundary):
    return (x[1] in t)  # return True

- If one doubles the points, `dx` will halve. `dx` is an estimation of `L/N`, where `L=8`.

    data = dde.data.TimePDE(
        geomtime, pde, [bc, ic, oc], train_distribution='uniform', num_domain=555*98, num_boundary=555*2, num_initial=100
    )

I choose num_* to match the imported x-t discretization.
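
A hedged aside (the variable names below are assumptions, not from the thread): instead of hardcoding dx = 0.0808, one could estimate it from the training points that actually lie on a constrained slice, so that the conservation loss stays consistent with L/N when num_* changes.

```python
# Sketch: count the unique x locations among the training points on the
# t = 0 slice and estimate dx = L / N from them (L = 8 on the interval [-4, 4]).
import numpy as np

x_on_slice = data.train_x[np.isclose(data.train_x[:, 1], 0.0), 0]
dx = 8.0 / len(np.unique(x_on_slice))
```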

Step      Train loss                                  Test loss                                   Test metric
0         [2.30e-02, 7.74e-01, 1.98e-01, 5.64e-01]    [2.30e-02, 0.00e+00, 0.00e+00, 0.00e+00]    []
1000      [1.79e-03, 5.67e-04, 1.66e-02, 3.39e-03]    [1.79e-03, 0.00e+00, 0.00e+00, 0.00e+00]    []
2000      [1.62e-03, 7.02e-05, 1.17e-03, 1.40e-02]    [1.62e-03, 0.00e+00, 0.00e+00, 0.00e+00]    []
3000      [1.44e-03, 5.12e-05, 3.49e-04, 3.99e-03]    [1.44e-03, 0.00e+00, 0.00e+00, 0.00e+00]    []

Mean residual: 0.023490947
L2 relative error: 0.44529396041245933
max error in y_pred l2 norm: 2.4722562

lululxvi commented 3 years ago
bhattah commented 3 years ago

I have finally figured out the correct implementation:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np

import sys
sys.path.append('/home/bhattah/Documents/deepxde/')

import deepxde as dde
from deepxde.backend import tf
from statistics import mode

import scipy.io

tsize = 100;
def gen_testdata():
    data = scipy.io.loadmat("dataset/MBE.mat")
    t, x, exact = data["t"], data["x"], data["u1"].T
    t = t[:tsize]
    exact = exact[:tsize,:]
    xx, tt = np.meshgrid(x, t)
    X = np.vstack((np.ravel(xx), np.ravel(tt))).T
    y = exact.flatten()[:, None]
    return X, y, t

def pde(x, y):
    dy_x = dde.grad.jacobian(y, x, i=0, j=0)
    dy_t = dde.grad.jacobian(y, x, i=0, j=1)
    dy_xx = dde.grad.hessian(y, x, i=0, j=0)
    return dy_t + 3/4*y * dy_x +0.1*dy_x +1e-5 * dy_xx +0*2*0.01*y

X, y_true, t = gen_testdata()
xsize = int(len(y_true)/tsize)

#%%
dx = 0.0808
C = np.sum(np.reshape(y_true,(tsize,xsize))**2, axis=1)*dx
print("max error in y_true l2 norm:", max(abs(C -C[0])))

myX = np.loadtxt("train.dat",skiprows=1) # get X from previous run of the program because data.train_x depends on oc in the current implementation
myT = np.unique(myX[:,1])
myTpts = np.arange(len(myT))
dxi = [mode(np.diff(np.sort(np.unique(
    myX[np.isclose(myX[:,1],myT[i]),0])))) for i in myTpts]
myTpts = myTpts[(dxi!=np.ones((1,len(dxi)))*8)[0]] # these points are needed during training as myTpts don't exactly match up with t
#myTpts = [] # uncomment this to run the code without enforcing C_conservation

def C_conservation(i):
    return lambda y_true, y_pred: tf.math.abs(
        tf.math.reduce_sum(y_pred)*dxi[i] - C[0])

def func(inputs, outputs):
    return outputs ** 2

def boundary_C(i):
    return lambda x, on_boundary: np.isclose(x[1], myT[i])

geom = dde.geometry.Interval(-4, 4)
timedomain = dde.geometry.TimeDomain(0, np.ndarray.tolist(t[-1])[0]) # is it possible to pass a user-defined time domain as a list of time points?
geomtime = dde.geometry.GeometryXTime(geom, timedomain)

bc = dde.PeriodicBC(geomtime, 0, lambda _, on_boundary: on_boundary)
oc = [dde.OperatorBC(
    geomtime, lambda x, y, _: func(x,y), boundary_C(i)) for i in myTpts]
#%%
ic = dde.IC(
    geomtime, lambda x: np.exp(-(x[:,0:1]/0.5)**2/2)/(0.5*np.sqrt(2*np.pi)), lambda _, on_initial: on_initial
)
bcs = [bc, ic]
bcs.extend(oc)

data = dde.data.TimePDE(
    geomtime, pde, bcs, train_distribution='uniform', num_domain=(tsize)*25, num_boundary=(tsize)*2, num_initial=100
)
net = dde.maps.FNN([2] + [20] * 3 + [1], "tanh", "Glorot normal")

model = dde.Model(data, net)
loss_list = ['MSE']*3 +[C_conservation(i) for i in myTpts]
model.compile("adam", lr=1e-5, loss=loss_list)
model.train(epochs=10000)

model.compile("L-BFGS-B", loss=loss_list)
losshistory, train_state = model.train()
dde.saveplot(losshistory, train_state, issave=True, isplot=True)

y_pred = model.predict(X)
f = model.predict(X, operator=pde)
print("Mean residual:", np.mean(np.absolute(f)))
print("L2 relative error:", dde.metrics.l2_relative_error(y_true, y_pred))
np.savetxt("test.dat", np.hstack((X, y_true, y_pred)))

error = abs(np.sum(np.reshape(y_pred,(tsize,xsize))**2,axis=1)*dx -C[0])
print("max error in y_pred l2 norm:", max(error))
lululxvi commented 3 years ago

After training, did you check how well the structure is preserved at each time?

bhattah commented 3 years ago

This is the loss data with structure-preservation: step, loss_train, loss_test, metrics_test 0.000000000000000000e+00 5.825208872556686401e-02 5.306075215339660645e-01 1.445726454257965088e-01 3.994160890579223633e-02 2.288919687271118164e-02 2.654266357421875000e-02 3.258389234542846680e-02 4.094284772872924805e-02 5.152374505996704102e-02 6.420844793319702148e-02 7.885867357254028320e-02 9.532010555267333984e-02 1.134258508682250977e-01 1.330001354217529297e-01 1.538616418838500977e-01 1.758272647857666016e-01 1.987152099609375000e-01 2.223475575447082520e-01 2.465529441833496094e-01 2.711680531501770020e-01 3.123499155044555664e-01 5.825208872556686401e-02 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00

1.000000000000000000e+03 2.414443157613277435e-02 4.969793558120727539e-01 1.524859517812728882e-01 3.058594465255737305e-02 9.048104286193847656e-03 5.494117736816406250e-03 2.486526966094970703e-03 2.634525299072265625e-05 1.891613006591796875e-03 3.278613090515136719e-03 4.151403903961181641e-03 4.532098770141601562e-03 4.447400569915771484e-03 3.928124904632568359e-03 3.008365631103515625e-03 1.725256443023681641e-03 1.172423362731933594e-04 1.775205135345458984e-03 3.911256790161132812e-03 6.250023841857910156e-03 2.176547050476074219e-02 2.414443157613277435e-02 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00

2.000000000000000000e+03 1.569516025483608246e-02 4.463511407375335693e-01 1.426495760679244995e-01 1.622855663299560547e-02 4.336833953857421875e-04 2.538561820983886719e-04 1.161694526672363281e-04 1.794099807739257812e-05 4.440546035766601562e-05 7.480382919311523438e-05 7.796287536621093750e-05 5.894899368286132812e-05 2.354383468627929688e-05 2.151727676391601562e-05 6.943941116333007812e-05 1.124143600463867188e-04 1.422166824340820312e-04 1.499652862548828125e-04 1.264810562133789062e-04 6.222724914550781250e-05 1.191216707229614258e-02 1.569516025483608246e-02 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00

3.000000000000000000e+03 1.561111211776733398e-02 4.222380518913269043e-01 1.407501548528671265e-01 1.420944929122924805e-02 4.861950874328613281e-04 4.112124443054199219e-04 3.350973129272460938e-04 2.603530883789062500e-04 1.888275146484375000e-04 1.230835914611816406e-04 6.580352783203125000e-05 1.978874206542968750e-05 1.156330108642578125e-05 2.443790435791015625e-05 1.430511474609375000e-05 2.378225326538085938e-05 9.518861770629882812e-05 2.061724662780761719e-04 3.634691238403320312e-04 5.738139152526855469e-04 1.067823171615600586e-02 1.561111211776733398e-02 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00

4.000000000000000000e+03 1.600360870361328125e-02 3.963109254837036133e-01 1.394379436969757080e-01 1.357644796371459961e-02 2.179145812988281250e-04 1.132488250732421875e-04 1.770257949829101562e-05 6.818771362304687500e-05 1.435279846191406250e-04 2.070665359497070312e-04 2.577900886535644531e-04 2.939105033874511719e-04 3.129839897155761719e-04 3.122687339782714844e-04 2.883076667785644531e-04 2.368092536926269531e-04 1.529455184936523438e-04 3.117322921752929688e-05 1.346468925476074219e-04 3.511309623718261719e-04 1.044476032257080078e-02 1.600360870361328125e-02 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00

5.000000000000000000e+03 1.608829572796821594e-02 3.678171634674072266e-01 1.376901119947433472e-01 1.203668117523193359e-02 7.088780403137207031e-04 5.221962928771972656e-04 3.606677055358886719e-04 2.235770225524902344e-04 1.104474067687988281e-04 2.056360244750976562e-05 4.601478576660156250e-05 8.887052536010742188e-05 1.072287559509277344e-04 9.953975677490234375e-05 6.341934204101562500e-05 4.291534423828125000e-06 1.072883605957031250e-04 2.503395080566406250e-04 4.386901855468750000e-04 6.783604621887207031e-04 9.605467319488525391e-03 1.608829572796821594e-02 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00

6.000000000000000000e+03 1.592624932527542114e-02 3.354252278804779053e-01 1.356321722269058228e-01 1.029741764068603516e-02 1.216948032379150391e-03 9.072422981262207031e-04 6.422996520996093750e-04 4.199147224426269531e-04 2.381801605224609375e-04 9.518861770629882812e-05 1.066923141479492188e-05 8.046627044677734375e-05 1.147389411926269531e-04 1.128911972045898438e-04 7.444620132446289062e-05 2.861022949218750000e-06 1.217126846313476562e-04 2.857446670532226562e-04 4.990696907043457031e-04 7.669329643249511719e-04 8.949995040893554688e-03 1.592624932527542114e-02 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00

7.000000000000000000e+03 1.540074497461318970e-02 2.947638034820556641e-01 1.329382658004760742e-01 8.029818534851074219e-03 1.915514469146728516e-03 1.437604427337646484e-03 1.034021377563476562e-03 7.015466690063476562e-04 4.363656044006347656e-04 2.348423004150390625e-04 9.340047836303710938e-05 9.298324584960937500e-06 2.020597457885742188e-05 3.337860107421875000e-06 7.897615432739257812e-05 2.067089080810546875e-04 3.874301910400390625e-04 6.229281425476074219e-04 9.158253669738769531e-04 1.269459724426269531e-03 7.702112197875976562e-03 1.540074497461318970e-02 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00

8.000000000000000000e+03 1.438355725258588791e-02 2.484254390001296997e-01 1.292416155338287354e-01 5.542397499084472656e-03 2.539098262786865234e-03 1.778006553649902344e-03 1.136720180511474609e-03 6.101727485656738281e-04 1.928210258483886719e-04 1.208186149597167969e-04 3.367066383361816406e-04 4.604458808898925781e-04 4.972815513610839844e-04 4.518032073974609375e-04 3.277063369750976562e-04 1.280307769775390625e-04 1.453161239624023438e-04 4.912018775939941406e-04 9.090900421142578125e-04 1.399993896484375000e-03 6.731808185577392578e-03 1.438355725258588791e-02 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00

9.000000000000000000e+03 1.292293239384889603e-02 1.962425708770751953e-01 1.241209059953689575e-01 2.185761928558349609e-03 3.796458244323730469e-03 2.657234668731689453e-03 1.695454120635986328e-03 9.046792984008789062e-04 2.778768539428710938e-04 1.932382583618164062e-04 5.173683166503906250e-04 7.032752037048339844e-04 7.593631744384765625e-04 6.940960884094238281e-04 5.146861076354980469e-04 2.282261848449707031e-04 1.595616340637207031e-04 6.437301635742187500e-04 1.220226287841796875e-03 1.885890960693359375e-03 5.366206169128417969e-03 1.292293239384889603e-02 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00

1.000000000000000000e+04 1.127638388425111771e-02 1.454553753137588501e-01 1.183862909674644470e-01 1.329183578491210938e-04 4.302144050598144531e-03 2.923488616943359375e-03 1.774430274963378906e-03 8.465051651000976562e-04 1.302361488342285156e-04 3.854632377624511719e-04 7.124543190002441406e-04 8.627176284790039062e-04 8.484721183776855469e-04 6.816983222961425781e-04 3.733634948730468750e-04 6.568431854248046875e-05 6.258487701416015625e-04 1.298725605010986328e-03 2.076506614685058594e-03 2.952694892883300781e-03 3.502547740936279297e-03 1.127638388425111771e-02 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00

1.000000000000000000e+04 1.127638388425111771e-02 1.454553753137588501e-01 1.183862909674644470e-01 1.329183578491210938e-04 4.302144050598144531e-03 2.923488616943359375e-03 1.774430274963378906e-03 8.465051651000976562e-04 1.302361488342285156e-04 3.854632377624511719e-04 7.124543190002441406e-04 8.627176284790039062e-04 8.484721183776855469e-04 6.816983222961425781e-04 3.733634948730468750e-04 6.568431854248046875e-05 6.258487701416015625e-04 1.298725605010986328e-03 2.076506614685058594e-03 2.952694892883300781e-03 3.502547740936279297e-03 1.127638388425111771e-02 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00

1.002600000000000000e+04 1.127671357244253159e-02 1.455159634351730347e-01 1.184015050530433655e-01 0.000000000000000000e+00 4.171907901763916016e-03 2.793669700622558594e-03 1.644968986511230469e-03 7.178187370300292969e-04 2.384185791015625000e-06 5.124211311340332031e-04 8.383393287658691406e-04 9.875297546386718750e-04 9.720325469970703125e-04 8.039474487304687500e-04 4.943609237670898438e-04 5.400180816650390625e-05 5.077123641967773438e-04 1.181900501251220703e-03 1.961231231689453125e-03 2.838909626007080078e-03 3.616631031036376953e-03 1.127671357244253159e-02 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00

So, it seems not all boundary conditions are being trained uniformly down to the same tolerance, even though the learning rate is 1e-5 for all of them.
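
A hedged aside, not something suggested in this thread: DeepXDE's Model.compile also accepts a loss_weights argument, which could be used to re-balance the PDE, BC/IC, and conservation terms if some of them stall. The weight values below are purely illustrative; loss_list and myTpts are reused from the script above.

```python
# Illustrative only: up-weight the conservation terms relative to the PDE,
# BC, and IC losses; the numbers are guesses, not tuned values.
loss_list = ['MSE'] * 3 + [C_conservation(i) for i in myTpts]
weights = [1.0, 1.0, 1.0] + [10.0] * len(myTpts)
model.compile("adam", lr=1e-5, loss=loss_list, loss_weights=weights)
```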

lululxvi commented 3 years ago

The loss doesn't decrease. I still suggest that you first try to preserve the quantity at only one time. The initial time might be a good starting point, because we know it must satisfy the mass conservation.

bhattah commented 3 years ago

I did try the one-at-a-time approach, and the loss decreases as expected. Here is a sample:

As you can see, losses are going down for almost all regularizers. The prediction, though, becomes more and more inaccurate (like the second figure above) as I train further and further away from the initial time.

lululxvi commented 3 years ago

Although the loss is small, do you check whether the network prediction indeed satisfies the preservation at that time?

bhattah commented 3 years ago

The network prediction error is low, but it is sometimes larger than the learning rate lr=1e-5. Here is the list of prediction errors in preserving C at myTpts=[0,10,-1]: array([5.71365301e-05, 1.64054834e-03, 8.10317659e-04]). The error increases considerably if I move away from myTpts:

error[[1,11]]
Out[24]: array([0.00022818, 0.00178309])

error[[2,12]]
Out[25]: array([0.00039641, 0.00192274])

error[[5,15]]
Out[26]: array([0.00088431, 0.00232403])

The non-conservation of C is likely to be the reason behind the unexpected output as noted earlier in the image.

lululxvi commented 3 years ago

I don't quite understand your output of the error. What are the "prediction errors" you mentioned?

bhattah commented 3 years ago

I am talking about this error:

error = abs(np.sum(np.reshape(y_pred,(tsize,xsize))**2,axis=1)*dx -C[0])
print("max error in y_pred l2 norm:", max(error))

My previous comment says that this (network prediction) error tends to grow further away from the initial time even if I enforce the C preservation at several time points throughout the time domain.

lululxvi commented 3 years ago

I am a little confused. Let us consider the simplest case. We only constrain C at one t, e.g., t=0.1, and you have one loss for this conservation. Then, after training, can this loss become very small? Also, if you re-compute C after training, does it match the target?
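
A small check along these lines (a sketch: C comes from the script above, while the 100-point x grid and the value of t0 here are illustrative): re-compute the conserved quantity from the trained network on one time slice and compare it with the target.

```python
t0 = 0.1
x_grid = np.linspace(-4, 4, 100)[:, None]
dx_grid = x_grid[1, 0] - x_grid[0, 0]
X_slice = np.hstack((x_grid, np.full_like(x_grid, t0)))
y_slice = model.predict(X_slice)
C_pred = np.sum(y_slice ** 2) * dx_grid   # approximates \int y(x, t0)^2 dx
print("C at t0:", C_pred, "target:", C[0], "|error|:", abs(C_pred - C[0]))
```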

bhattah commented 3 years ago

When I only constrain C at t=0.10482353 with the one loss for its conservation, I get that the training is successful for the given lr=1e-5, but the prediction is not as good:

13000     [1.82e-03, 6.19e-04, 1.93e-01, 3.17e-04]   [1.82e-03, 0.00e+00, 0.00e+00, 0.00e+00]    []
14000     [1.79e-03, 6.57e-04, 1.91e-01, 2.86e-05]   [1.79e-03, 0.00e+00, 0.00e+00, 0.00e+00]    []
15000     [1.82e-03, 7.03e-04, 1.88e-01, 3.63e-04]   [1.82e-03, 0.00e+00, 0.00e+00, 0.00e+00]    []
error in y_pred(t=0.10482353) l2 norm: [0.00523692]

As noted in the post above, the prediction becomes worse with increasing t. I hope this answers your question.

lululxvi commented 3 years ago

In step 15000, the training loss of the conservation is 3.63e-04. Could you compute the same loss manually to check whether the training loss is correctly implemented?

bhattah commented 3 years ago

I am not sure how to compute the loss manually. But, I tried the following.

lululxvi commented 3 years ago

So the loss function is correct? But why is there still an error of 1.88e-4, not 0? The initial BC should satisfy the constraint, right?

bhattah commented 3 years ago

Yes, the initial condition should satisfy the constraint and it does in the imported data: data = scipy.io.loadmat("dataset/MBE.mat"). But it doesn't do so after the training, and that is the whole point. Actually, there are many x-discretizations in model.train_state.X_train for the same t=0, and the constraint isn't preserved for any one of them after the training. Moreover, imposition of the mass preservation doesn't have anything to do with it, this non-preservation at the initial condition after training can even be observed without imposing mass preservation.

Can we perhaps chat over a video link?

lululxvi commented 3 years ago

You can send me an email.