lululxvi / deepxde

A library for scientific machine learning and physics-informed learning
https://deepxde.readthedocs.io
GNU Lesser General Public License v2.1
2.76k stars · 759 forks

Solving an inverse problem with non-constant coefficients #527

Open Wolpes11 opened 2 years ago

Wolpes11 commented 2 years ago

Dear Dr. lulu,

I have some questions regarding DeepXDE. I would like to solve an inverse problem for a PDE whose coefficients are functions of space and time, A(x,t) and B(x,t). Is this possible? All the examples I found use constant (scalar) coefficients.

Do you have an example for this situation, please? Thank you in advance for your time!

LanPeng-94 commented 2 years ago

Yes, it is possible; you can use a PFNN, as shown in the example "elliptic_inverse_field.py".

Wolpes11 commented 2 years ago

Thank you so much! It should work for my problem.

Instead of starting a new thread, I have just another quick question. How do I define boundary/initial conditions based purely on data? I have a grid of measurements at different positions x and times t; I want to use the values at x = x_start and x = x_end as boundary conditions, and the measurements at t = 0 as the initial condition.

lululxvi commented 2 years ago

Use PointSetBC https://deepxde.readthedocs.io/en/latest/modules/deepxde.icbc.html#deepxde.icbc.boundary_conditions.PointSetBC
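For example, given measurements y on an (x, t) grid, the boundary and initial subsets can be selected with boolean masks and each wrapped in a PointSetBC. A minimal sketch with made-up grid values (the PointSetBC calls are shown as comments since they require deepxde; names like bc_mask are only illustrative):

```python
import numpy as np

# Hypothetical measurement grid: rows of observe_x are (x, t) pairs.
x = np.linspace(1.0, 2.0, 11)
t = np.linspace(0.0, 1.0, 21)
observe_x = np.array([[xi, ti] for xi in x for ti in t])
y = np.sin(observe_x[:, :1]) * np.cos(observe_x[:, 1:])  # fake measurements

# Boolean masks select the boundary rows (x = x_start or x = x_end)
# and the initial rows (t = 0).
bc_mask = (observe_x[:, 0] == 1.0) | (observe_x[:, 0] == 2.0)
ic_mask = observe_x[:, 1] == 0.0

# Each subset then becomes a PointSetBC (requires deepxde), e.g.:
# bc = dde.icbc.PointSetBC(observe_x[bc_mask], y[bc_mask], component=0)
# ic = dde.icbc.PointSetBC(observe_x[ic_mask], y[ic_mask], component=0)

print(observe_x[bc_mask].shape, observe_x[ic_mask].shape)  # (42, 2) (11, 2)
```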

Wolpes11 commented 2 years ago

Thank you Dr. Lulu for your reply and thank you for your really great work!

I have just one last question to bother you with. I have 157,511,520 data points, which in single precision amount to ~630 MB, and I'm working with a GPU with 80 GB of memory. But the code crashes at epoch 0 with an "Out Of Memory" error. What could the problem be?

lululxvi commented 2 years ago

Is it the CPU or the GPU that runs out of memory?

Wolpes11 commented 2 years ago

I think it is the GPU. I'm working with 2 TB of CPU memory, and it loads the data points correctly. The model compiles, and the code prints the loss at epoch 0, but then it crashes because it tries to allocate many other tensors of size [2 × N_points, N_neurons_per_layer], if I see correctly.

Thanks again for your time.

lululxvi commented 2 years ago

Then the solutions I can think of are either using a smaller dataset or using mini-batches.

Wolpes11 commented 2 years ago

I tried using PDEResidualResampler, but it doesn't work; I'm not even sure I'm using it correctly. Here is the part of the code where I set up the model.

geom = dde.geometry.Interval(1, 2)
timedomain = dde.geometry.TimeDomain(0, 1)
geomtime = dde.geometry.GeometryXTime(geom, timedomain)

observe_x, y = gen_traindata()

bc_mask = (observe_x[:, 0] == 1.0) | (observe_x[:, 0] == 2.0)
ic_mask = observe_x[:, 1] == 0.0
BC = dde.PointSetBC(observe_x[bc_mask], y[bc_mask], component=0)
IC = dde.PointSetBC(observe_x[ic_mask], y[ic_mask], component=0)
TP = observe_x[~bc_mask]

data = dde.data.TimePDE(
    geomtime,
    pde,
    [BC, IC],
    num_domain=0,
    num_boundary=0,
    num_initial=0,
    anchors=TP
)
net = dde.maps.PFNN([2, [30, 30, 30], [20, 20, 20], [20, 10, 10], [20, 1, 1], [20, 1, 1], 3], "tanh", "Glorot uniform")
model = dde.Model(data, net)
model.compile("adam", lr=0.002)
resampler = dde.callbacks.PDEResidualResampler(period=100)
losshistory1, train_state1 = model.train(epochs=10000, callbacks=[resampler])

Thank you!

lululxvi commented 2 years ago

PDEResidualResampler only works for the points sampled by DeepXDE, not for the points provided by users via anchors. Your domain seems small, so maybe a small dataset is enough.

Wolpes11 commented 2 years ago

Thank you so much for your help. I also tried batch_size instead of PDEResidualResampler, but the problem remains the same. The domain is so small because I normalized both space and time; otherwise, my domain spans (500 m, 550 m) in space and more than 20 days in time with a resolution of 1 s.

lululxvi commented 2 years ago

> I also tried batch_size instead of PDEResidualResampler, but the problem remains the same.

Do you mean that you used a small training dataset and still got the OOM error? That seems strange.

Wolpes11 commented 2 years ago

I tried setting a batch size when training the model: losshistory1, train_state1 = model.train(epochs=10000, batch_size=1). But it gives me the same OOM error.

The problem is that the largest training set I can fit in memory does not allow capturing all the features of the data (such as long-period oscillations). Thank you!

haison19952013 commented 2 years ago

@Wolpes11 I met a similar problem: I have a lot of experimental data and I cannot fit it all into memory. For an ordinary ANN in tensorflow.keras, we can simply use the batch_size argument to mini-batch and avoid the OOM. However, for PINNs in DeepXDE, I think mini-batching for PointSetBC has not been implemented yet: if you dig into the source code of train(), nothing is implemented for the batch_size argument for PINNs.

Wolpes11 commented 2 years ago

Thank you @haison19952013! Yes, I had a look at the source code. However, what puzzles me is that the entire dataset actually fits in memory (it is more or less 600 MB), and yet the OOM error arises at the beginning of the training phase. The architecture of the neural net should be fixed, independently of the number of samples: 2 inputs (space and time) and 1 output, right?

haison19952013 commented 2 years ago

> Thank you @haison19952013! Yes, I had a look at the source code. However, what puzzles me is that the entire dataset actually fits in memory (it is more or less 600 MB), and yet the OOM error arises at the beginning of the training phase. The architecture of the neural net should be fixed, independently of the number of samples: 2 inputs (space and time) and 1 output, right?

Yes, the architecture of the neural net is fixed, and I think there is no problem with it. In my case, the OOM also occurs right after epoch 0. I think your data fits into your system; however, after epoch 0 the model has to compute and store a lot of information, especially the gradient information for the PDE over all the data at once. I think this may be the main reason for the OOM. What do you think?

P.S.: If you really want to use mini-batching right now, consider the paper "Hidden Fluid Mechanics", which comes with source code; the authors also use a lot of data and train with mini-batches.
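A back-of-the-envelope estimate supports this: with the PFNN widths from the snippet above, training on all points at once stores one float32 tensor of shape (N_points, width) per hidden layer column, plus roughly the same again for the gradients kept by backpropagation. A rough sketch (the ×2 factor and the neglect of optimizer state are simplifying assumptions):

```python
# Rough lower bound on activation memory when training on all points at once.
# Assumes float32 and the PFNN layer widths from the code posted above;
# optimizer state and autodiff temporaries are ignored.
n_points = 157_511_520
bytes_per_float = 4
layers = [[30, 30, 30], [20, 20, 20], [20, 10, 10], [20, 1, 1], [20, 1, 1]]

units = sum(sum(layer) for layer in layers)        # total hidden units: 234
forward_gb = n_points * units * bytes_per_float / 1e9
total_gb = 2 * forward_gb                          # ~x2 for gradients in backprop

print(f"~{total_gb:.0f} GB of activations")        # far beyond an 80 GB GPU
```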

lululxvi commented 2 years ago

batch_size in model.train doesn't work for PINNs, because a PINN has different types of training points (PDE points, BC points, IC points, etc.), so it is not clear what batch_size would really mean here.

The current DeepXDE version supports mini-batching of the PDE residual points via PDEResidualResampler, but the training points provided by anchors will not be mini-batched. To also mini-batch the anchors, you need to modify the source code at the following line: https://github.com/lululxvi/deepxde/blob/4714a1f4268489c7d2e50302ddefd54a8aa5defb/deepxde/data/pde.py#L237 Instead of using all the points in self.anchors, simply pick a random subset of self.anchors. Then PDEResidualResampler will also work for the anchor points.
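A hypothetical numpy-only sketch of that change, with placeholder arrays standing in for self.anchors and the freshly resampled PDE points X, and an arbitrary subset size:

```python
import numpy as np

rng = np.random.default_rng(0)
anchors = rng.random((157_000, 2))  # stands in for self.anchors
X = rng.random((500, 2))            # stands in for the resampled PDE points

# Pick a random subset of the anchors instead of stacking all of them.
batch = 1000
idx = rng.choice(len(anchors), size=batch, replace=False)
X = np.vstack((anchors[idx], X))

print(X.shape)  # (1500, 2)
```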

Wolpes11 commented 2 years ago

>> Thank you @haison19952013! Yes, I had a look at the source code. However, what puzzles me is that the entire dataset actually fits in memory (it is more or less 600 MB), and yet the OOM error arises at the beginning of the training phase. The architecture of the neural net should be fixed, independently of the number of samples: 2 inputs (space and time) and 1 output, right?
>
> Yes, the architecture of the neural net is fixed, and I think there is no problem with it. In my case, the OOM also occurs right after epoch 0. I think your data fits into your system; however, after epoch 0 the model has to compute and store a lot of information, especially the gradient information for the PDE over all the data at once. I think this may be the main reason for the OOM. What do you think?
>
> P.S.: If you really want to use mini-batching right now, consider the paper "Hidden Fluid Mechanics", which comes with source code; the authors also use a lot of data and train with mini-batches.

Yes, I agree. I think the main problem is the temporal gradient, which involves the entire dataset. Thank you for the suggestion; I will have a look at "Hidden Fluid Mechanics", although I find DeepXDE easier to adapt to different PDEs and applications.

Wolpes11 commented 2 years ago

Dear @lululxvi, I modified the source code as you suggested.

idx = np.random.randint(0, len(self.anchors), 1000)  # high bound is exclusive
X = np.vstack((self.anchors[idx, :], X))

But I get the same OOM error. Am I doing something wrong? Thank you!

lululxvi commented 2 years ago

It looks OK. You may check the size of X.

Wolpes11 commented 2 years ago

Yes, I have already checked it. It is correctly (1000, 2). It seems that the point where the code crashes with the OOM error is not this one.

Thank you!

Wolpes11 commented 2 years ago

Hi Dr @lululxvi, I tried different codes with the dataset I'm using and the batch "trick" you suggested works fine. Do you have any clue why in this case the randomization of the training points does not work?

Thanks in advance!

lululxvi commented 2 years ago

> Hi Dr @lululxvi, I tried different codes with the dataset I'm using and the batch "trick" you suggested works fine. Do you have any clue why in this case the randomization of the training points does not work?

What do you mean by "the batch trick" and "randomization of the training points"?

Wolpes11 commented 2 years ago

>> Hi Dr @lululxvi, I tried different codes with the dataset I'm using and the batch "trick" you suggested works fine. Do you have any clue why in this case the randomization of the training points does not work?
>
> What do you mean by "the batch trick" and "randomization of the training points"?

The following modification of your source code, as you previously suggested:

idx = np.random.randint(0, len(self.anchors)-1, 1000)
X = np.vstack((self.anchors[idx,:], X))

Thank you!

lululxvi commented 2 years ago

You may directly check what is passed as the network input during training, and then figure out step by step where your code goes wrong:

https://github.com/lululxvi/deepxde/blob/26e5e49987331420879e3cf7e70e3eb379593704/deepxde/model.py#L553