Jobs fails when loading previous model

pengguo-seismo commented 2 years ago

Hi Paul,

I hope you are doing well.

I have a question about running the Python script. It needs to load a previously trained model, './model/checkpoint500.pt'. Can you please tell me how to obtain this model, or how to define the weights/biases for initializing the network?

Many thanks in advance.

paulpuren commented 2 years ago

Hello,

Thank you for your interest in our research. Take the 2D Burgers' equation as an example. Our goal is to solve the PDE for 1000 time steps. First, initialize all the network parameters with the function initialize_weights, train the model for 100 time steps, and save the trained model as checkpoint100.pt. Second, load checkpoint100.pt as the initial network parameters, train for 200 time steps, and save the result as checkpoint200.pt. Repeating this procedure with progressively more time steps eventually reaches the milestone of 1000 time steps.

Hope that answers your question. Thank you!
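The staged procedure can be sketched as a small plan in which each stage loads the checkpoint saved by the previous one. This is only an illustration: `plan_stages`, the milestone list, and the `./model` directory are assumptions made for the example and are not taken from the repository code.

```python
def plan_stages(milestones, model_dir="./model"):
    """Build a staged-training plan (hypothetical helper, not part of the repo).

    Each stage saves checkpoint<steps>.pt and the next stage loads it.
    The first stage has nothing to load, so its load_from is None.
    """
    plan = []
    previous = None  # no checkpoint exists before the first stage
    for steps in milestones:
        save_to = f"{model_dir}/checkpoint{steps}.pt"
        plan.append({"time_steps": steps, "load_from": previous, "save_to": save_to})
        previous = save_to  # the next stage resumes from this file
    return plan


# Illustrative milestones; choose whatever schedule suits your problem.
for stage in plan_stages([100, 200, 500, 1000]):
    print(stage)
```

In this plan only the first entry has `load_from` set to `None`, which corresponds to the stage that relies on initialize_weights alone.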

norery commented 2 years ago

Thank you for your reply. I have the same problem. I noticed that the code has no provision for multiple training rounds. For example, when I train for 100 time steps, what should I change? How do I set the value of 'pre_model_save_path ='? Thank you in advance!

paulpuren commented 2 years ago

Thank you for your question. Yes, we only show the code for 1000 time steps. When training for the first 100 time steps, you apply the function initialize_weights directly and do not need pre_model_save_path.

LiShenshen123 commented 2 years ago

Hello, how do I get the parameter pre_model_save_path? I am very confused and hope to get your help. Thank you very much.

paulpuren commented 2 years ago

Thank you for your question. pre_model_save_path is the path to the pretrained model. Take 2D Burgers as an example, and suppose you pretrain the model in stages: first for 100 steps, then 200 steps, then 500 steps. For the 1st pretraining, you do not have a pre_model_save_path, so you train the model directly with the network parameters initialized by the function initialize_weights. For the 2nd pretraining, you initialize the network parameters with the model learned in the 1st pretraining (this is where pre_model_save_path comes in) and then train it further for 200 steps.
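One way to express the difference between the 1st pretraining and the later ones is a small branch on whether a checkpoint path was given. The sketch below is framework-agnostic and hypothetical: `restore_or_initialize`, `load_fn`, and `init_fn` are names invented for this example. In a PyTorch script, `init_fn` might wrap `model.apply(initialize_weights)` and `load_fn` might wrap `torch.load` followed by `model.load_state_dict`, but the repository's own functions may be organized differently.

```python
def restore_or_initialize(pre_model_save_path, load_fn, init_fn):
    """Choose how to set the starting parameters (hypothetical helper).

    pre_model_save_path: None for the first stage, else a checkpoint path.
    load_fn: callable taking the path and restoring parameters from it.
    init_fn: callable taking no arguments and initializing fresh parameters.
    """
    if pre_model_save_path is None:
        init_fn()  # first stage: there is no earlier checkpoint to load
        return "initialized"
    load_fn(pre_model_save_path)  # later stages: resume from the saved model
    return "loaded"


# Stand-in callables that only record what would happen.
log = []
first = restore_or_initialize(None, lambda p: log.append(("load", p)), lambda: log.append(("init",)))
second = restore_or_initialize("./model/checkpoint100.pt", lambda p: log.append(("load", p)), lambda: log.append(("init",)))
print(first, second, log)
```

With a guard like this, the first stage never tries to open a checkpoint file, so a missing file can only matter from the second stage onward.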

LiShenshen123 commented 2 years ago

For the first pre-training, how do I train without pre_model_save_path, directly using the network parameters initialized by the function initialize_weights? It always reports an error: FileNotFoundError: [Errno 2] No such file or directory: './model/checkpoint500.pt'

paulpuren commented 2 years ago

The checkpoint500.pt here is the model saved after training for 500 time steps. The posted code shows training for 1000 time steps starting from the model pretrained for 500 time steps, which is why pre_model_save_path points to checkpoint500.pt. When you first train for 100 time steps, you can name the saved model checkpoint100.pt.

LiShenshen123 commented 2 years ago

I'm still confused, because I still can't run it successfully. As far as I can tell, your code also needs pretrained network weights for the first training. As for the network weight initialization you mentioned, I don't know how to implement it. I see that a pretrained model is loaded in the train function defined in your code. I'm lost. Could you send me working code showing how to get the pretrained model in the first step? I really hope to get your help. My email is 2858724272@qq.com. Thank you very much!

paulpuren commented 2 years ago

Hi. First, we have tested the code and it works well; the code posted in the repo does not have bugs. You may modify it for your own purpose (e.g., for different pretraining schemes or different PDE systems).

Second, for the first training, you do not need pretrained network parameters (e.g., weights). They are initialized based on the function initialize_weights.

Third, the pretrained model is loaded only when an earlier pretraining stage has already been run. Namely, you will only need it after the 1st pretraining.

LiShenshen123 commented 2 years ago

Thank you for your reply. This is my first training run, and the error says that a pre-trained model is required. Is there any special setup required for the first pre-training? Thank you.

[Attached screenshot: QQ图片20220310131203]

richardliuss commented 1 year ago

Hi Dr. Ren, would you please give us a detailed tutorial that guides us through the first training, such as how to modify the code and what file structure is needed? Please forgive my unfamiliarity with your code; my background is in computational fluid dynamics. Thank you so much. Richard

isds-neu / PhyCRNet

Jobs fails when loading previous model #3