isds-neu / PhyCRNet

Physics-informed convolutional-recurrent neural networks for solving spatiotemporal PDEs

initialization #1

Closed dialuser closed 2 years ago

dialuser commented 2 years ago

Hi, thanks for sharing the code. What's the rationale for using this uniform range? What do the numbers 3, 3, 320 represent?

```python
module.weight.data.uniform_(-c * np.sqrt(1 / (3 * 3 * 320)), c * np.sqrt(1 / (3 * 3 * 320)))
```

dialuser commented 2 years ago

I also have a question about the step parameter of the PhyCRNet class. In your demo, you used 1000. What if the model is so big that not all the data can fit into a single batch?

Thanks.

paulpuren commented 2 years ago

> Hi, thanks for sharing the code. What's the rationale for using this uniform range? What do the numbers 3, 3, 320 represent?
>
> ```python
> module.weight.data.uniform_(-c * np.sqrt(1 / (3 * 3 * 320)), c * np.sqrt(1 / (3 * 3 * 320)))
> ```

Hi,

Thank you for your question. The reason we use such a uniform range is that we want to start with small weights (close to zero) and then gradually learn the dynamical propagation. Scientific computing problems are different from general AI tasks: the traditional Xavier and He approaches would pose a tougher initialization for our problem, especially without any labeled data.

Our initialization method is similar to Xavier initialization. The only difference is that we add a small weighting coefficient to keep the initialized weights close to zero. In the code above, 3 is the filter size and 320 is a multiple of the channel number (e.g., 32). We have since modified it to a more general version (see the `initialize_weights` function).
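For reference, here is a minimal sketch of such a scaled Xavier-style uniform initialization in PyTorch. The coefficient `c` and the fan-in formula (`kernel_h * kernel_w * in_channels`) are illustrative assumptions, not necessarily the repository's exact implementation:

```python
import numpy as np
import torch.nn as nn

def initialize_weights(module):
    # Xavier-like uniform init, scaled by a small coefficient c so the
    # initial weights stay close to zero (illustrative sketch).
    if isinstance(module, nn.Conv2d):
        c = 0.02  # small weighting coefficient (assumed value; tune as needed)
        fan_in = module.in_channels * module.kernel_size[0] * module.kernel_size[1]
        bound = c * np.sqrt(1.0 / fan_in)
        module.weight.data.uniform_(-bound, bound)

# usage: model.apply(initialize_weights)
```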

paulpuren commented 2 years ago

> I also have a question about the step parameter of the PhyCRNet class. In your demo, you used 1000. What if the model is so big that not all the data can fit into a single batch?
>
> Thanks.

That is a very good question. For high-dimensional PDEs, computational memory is always the key issue; that's why we propose this discrete learning approach. If the number of time steps is very large, we can use temporal batches to split the steps.

For example, suppose we have 2001 time steps. We can set the temporal batch size to 1001, which is exactly `time_batch_size` in the code. Then we will have 2 temporal batches (`num_time_batch`), so in the training function there is another loop over temporal batches in addition to the loop over training epochs. You may change the temporal batch size to fit the memory limits of different machines; the arithmetic is sketched below.
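A small illustration of that arithmetic, assuming consecutive batches overlap by one step because each batch's last kept solution seeds the next (variable names follow the description above, not necessarily the repository code):

```python
total_steps = 2001       # total number of solution time steps
time_batch_size = 1001   # steps rolled out per temporal batch

# Batches share one step at the seam, so the effective stride is
# time_batch_size - 1, giving (2001 - 1) / (1001 - 1) = 2 batches.
num_time_batch = (total_steps - 1) // (time_batch_size - 1)
assert num_time_batch == 2
```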

In addition, there is a trick you might also be interested in. Although we set the temporal batch size to 1001, we only save the first 1000 output variables, because the solution at the last step is not as accurate as the others due to the numerical error of the finite difference. Then, to train the second batch, we start from the 1000th solution variable of the first batch and set it as the initial condition for the second batch. That is why you will see the `second_last_state` of the ConvLSTM and the "second last output" in the code.
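Here is a minimal sketch of how that hand-off could look in a training loop. The model signature and helper names are hypothetical, assuming the network returns the rolled-out outputs together with the second-to-last ConvLSTM state:

```python
import torch

def rollout_epoch(model, u0, init_state, num_time_batch):
    # u0: initial condition; init_state: initial ConvLSTM hidden/cell states
    state, ic, outputs = init_state, u0, []
    for _ in range(num_time_batch):
        # hypothetical forward pass: returns all step outputs plus the
        # second-to-last hidden state of the ConvLSTM
        output, second_last_state = model(ic, state)
        outputs.append(output[:-1])   # drop the least accurate final step
        ic = output[-2].detach()      # second-last output -> next batch's IC
        state = second_last_state     # warm-start the next temporal batch
    return torch.cat(outputs, dim=0)
```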

dialuser commented 2 years ago

Hi Paul

Thanks for the in-depth explanation. That's helpful.