Instability of weights initialization (of the pde) leading to convergence issues

clarahm98 commented 3 years ago

First of all thanks for this library and your replies. I am facing some troubles with the implementation of PINNs for a reaction-diffusion system (1 PDE + 1 coupled ODE) that I am trying.

The main issue that I am currently facing is that the order of magnitude of the pdes losses when they are started to be trained varies a lot from run to run. Sometimes, these losses are small (around 10^-1) and the model is able to converge to the correct solution while in others these losses are very big (10^4 or bigger) causing in some cases convergence issues (even the appearance of NaN values in the losses).

Imagen4 Imagen5 Imagen7 Imagen8

I have tried different ways to stabilize it (increasing the num_domain and num_boundary, decreasing lr, adaptative weights, etc) but the problem is still appearing. Could adding explicit boundary conditions for V help? I also don’t quite understand the location of the num_test points – if it is random, could that help explain the variability between the different runs? I do not know if adding some initial conditions for the V will help to increase its stabilization.

I do not know where it comes from (maybe from the definition of the pdes as they are stiff or it may come from the way the model is created). My goal (once I stabilize it) is to scale to 2D but in the few trials that I have done in 2D, the described problem persists and it is even more frequent. I would be very glad if you can provide me some suggestions about how to solve this problem.

The system that I am trying to solve is:

Imagen3

I have also added a Neuman boundary condition for the V. And this is how the data should look like (left: b and right: w). As mentioned, the problem is stiff.

Imagen1 Imagen2

You can find the data here (trial_1D_RK.zip) and this is the code that I am currently using:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import scipy.io

import numpy as np

import deepxde as dde
from deepxde.backend import tf
from plotting_outcome import plot_losshistory, plot_beststate,plot_observed_predict

def gen_traindata():
    data = scipy.io.loadmat('trial_2D.mat')
    t, x,y, Vsav, Wsav = data["t"], data["x"], data["y"],data["Vsav"], data["Wsav"]
    X,T, Y= np.meshgrid(x,t,y)
    X = np.reshape(X, (-1, 1))
    Y = np.reshape(Y, (-1, 1))
    T = np.reshape(T, (-1, 1))
    V = np.reshape(Vsav, (-1, 1))
    #V=V+ noise*np.std(V)*np.random.randn(V.shape[0], V.shape[1])
    W = np.reshape(Wsav, (-1, 1))
    return np.hstack((X, Y, T)), V, W

def main():
    #D=  tf.math.exp(tf.Variable(-1.6094))
    def pde(x, y):
        V, W = y[:, 0:1], y[:, 1:2]
        dca_t = dde.grad.jacobian(y, x, i=0, j=2)
        dca_xx = dde.grad.hessian(y, x, component=0, i=0, j=0)
        dca_yy = dde.grad.hessian(y, x, component=0,i=1, j=1)
        dcb_t = dde.grad.jacobian(y, x, i=1, j=2)
        #Compute D
        var=y[:, 2:3]
        D1=tf.math.sigmoid(var)*0.08+0.02; #D can be: 0.02 or 0.1

        eq_a = dca_t -  D1* (dca_xx +dca_yy) + 8*V*(V-0.01)*(V-1) +W*V 
        eq_b = dcb_t -  (0.002 +0.2*W/(0.3+V))*(-W -8*V*(V-0.15-1))
        return [eq_a, eq_b]
    #Define geometry and time domain
    observe_x, V, W = gen_traindata()
    min_x=np.min(observe_x[:,0:1])
    max_x=np.max(observe_x[:,0:1])           
    min_y=np.min(observe_x[:,1:2])
    max_y=np.max(observe_x[:,1:2])
    min_t=np.min(observe_x[:,2:3])
    max_t=np.max(observe_x[:,2:3])
    geom = dde.geometry.Rectangle([min_x,min_y], [max_x,max_y])
    timedomain = dde.geometry.TimeDomain(min_t, max_t)
    geomtime = dde.geometry.GeometryXTime(geom, timedomain)
    #Boundary condition
    bc_a = dde.NeumannBC(
        geomtime, lambda x:  np.zeros((len(x), 1)), lambda _, on_boundary: on_boundary, component=0
    )
    #Points of the .mat file
    observe_y1 = dde.PointSetBC(observe_x, V, component=0)
    observe_y2 = dde.PointSetBC(observe_x, W, component=1)
    data = dde.data.TimePDE(
        geomtime,
        pde,
        [bc_a, observe_y1,observe_y2],
        num_domain=4*40000,
        num_boundary=4*4000,
        #num_initial=150,
        train_distribution="uniform",
        anchors=observe_x,
        num_test=50000,
    )
    #Create the train and the model
    net = dde.maps.FNN([3] + [20] * 3 + [3], "tanh", "Glorot uniform")
    model = dde.Model(data, net)
    model.compile("adam", lr=0.001)
    #Train
    losshistory, train_state = model.train(epochs=90000)
    #Results
    dde.saveplot(losshistory, train_state, issave=True, isplot=True)
    observe_x, V, W = gen_traindata()
    y_pred = model.predict(observe_x)
    f = model.predict(observe_x, operator=pde)
    print("Mean residual:", np.mean(np.absolute(f)))
    y_true=np.concatenate((V, W), axis=1)
    print("L2 relative error:", dde.metrics.l2_relative_error( y_true, y_pred))
    np.savetxt("true_pred.dat", np.hstack((observe_x, y_true, y_pred)),header="observe_x,y_true, y_pred")
    plot_losshistory(losshistory)
    plot_beststate(train_state)
    plot_observed_predict(observe_x,y_true, y_pred)
    return train_state, y_true, y_pred,f
if __name__ == "__main__":
    train_state, y_true, y_pred,f=main()
    X_train, y_train, X_test, y_test, best_y, best_ystd = train_state.packed_data()

lululxvi commented 3 years ago

The Glorot initialization is the standard initialization method. As we know, initialization is important for network training. So, the simple solution could be: check the loss at step 0; if it is large, then just restart.
Test points are on a uniform grid, not random.
Not sure what you mean by "explicit boundary conditions"
It may be helpful to consider the two-stages training strategy in page 6 at https://doi.org/10.1371/journal.pcbi.1007575

Ryszard2 commented 3 years ago

@lululxvi

the paper you are referring to is great, I'm wondering how can be possible to replicate the Step 1 of that training strategy. Say that I have a PDE with:

3 outputs
3 ICs
2 BCs

then the usual output row at a given training step would be in the form:

[pde1, pde2, pde3, ic1, ic2, ic3, bc1, bc2]

The first supervised training phase would take into account only ICs and BCs, isn't it? If I simply set to zero the weights associated to the PDE's losses, I'd have this:

[0, 0, 0, ic1, ic2, ic3, bc1, bc2]

But I'm not sure, I feel like I'm paying a computational price to the unsupervised part of the problem anyway. What's the proper implementation of the Step 1 of that training strategy?

#

Thank you, Riccardo

lululxvi commented 3 years ago

It is correct. It is OK to just set those to be 0. It is true that it wastes some computation, but it is fine, because the first stage training only needs a few thousands of iterations, which is very fast. You can also check the github code of the paper https://github.com/alirezayazdani1/SBINNs

lululxvi / deepxde

Instability of weights initialization (of the pde) leading to convergence issues #305