materialsvirtuallab / maml

Python for Materials Machine Learning, Materials Descriptors, Machine Learning Force Fields, Deep Learning, etc.
BSD 3-Clause "New" or "Revised" License

MTP training problem #175

Closed · yliu1240 closed this issue 3 years ago

yliu1240 commented 3 years ago

Hi, I'm trying to train MTP models with the previous example data and notebook from the mlearn package (the current maml repo no longer seems to include that notebook), but the training fails: the resulting configuration file (.mtp) contains multiple '-nan' values:

""" MTP version = 1.1.0 potential_name = MTP1m scaling = 1.438492177533894e-04 species_count = 1 potential_tag = radial_basis_type = RBChebyshev min_dist = 4.000000000000000e+00 max_dist = 4.800000000000000e+00 radial_basis_size = 8 radial_funcs_count = 2 radial_coeffs 0-0 {-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan} {-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan} alpha_moments_count = 18 alpha_index_basic_count = 11 alpha_index_basic = {{0, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {0, 0, 0, 1}, {0, 2, 0, 0}, {0, 1, 1, 0}, {0, 1, 0, 1}, {0, 0, 2, 0}, {0, 0, 1, 1}, {0, 0, 0, 2}, {1, 0, 0, 0}} alpha_index_times_count = 14 alpha_index_times = {{0, 0, 1, 11}, {1, 1, 1, 12}, {2, 2, 1, 12}, {3, 3, 1, 12}, {4, 4, 1, 13}, {5, 5, 2, 13}, {6, 6, 2, 13}, {7, 7, 1, 13}, {8, 8, 2, 13}, {9, 9, 1, 13}, {0, 10, 1, 14}, {0, 11, 1, 15}, {0, 12, 1, 16}, {0, 15, 1, 17}} alpha_scalar_moments = 9 alpha_moment_mapping = {0, 10, 11, 12, 13, 14, 15, 16, 17} species_coeffs = {-nan} moment_coeffs = {-nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan, -nan} """

The training output also looks wrong:

""" WARNING: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !!! WARNING WARNING WARNING !!! !!! Read a configuration with (negative) Stress. !!! !!! This feature will be removed soon! !!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

BFGS iterations count set to 500 BFGS convergence tolerance set to 1e-08 Energy weight: 1 Force weight: 0.01 Stress weight: 0 MTPR parallel training started BFGS iter 0: f=-nan BFGS iter 1: f=-nan BFGS iter 2: f=-nan BFGS iter 3: f=-nan BFGS iter 4: f=-nan BFGS iter 5: f=-nan ...... BFGS iter 499: f=-nan step limit reached MTPR training ended Rescaling... scaling = 0.000119874348127824, condition number = -nan scaling = 0.000130772016139445, condition number = -nan scaling = 0.000143849217753389, condition number = -nan scaling = 0.000158234139528728, condition number = -nan scaling = 0.000172619061304067, condition number = -nan Rescaling to 0.000143849217753389... done

    * * * TRAIN ERRORS * * *

Errors report Energy: Errors checked for 10 configurations Maximal absolute difference = nan Average absolute difference = nan RMS absolute difference = nan

Energy per atom: Errors checked for 10 configurations Maximal absolute difference = nan Average absolute difference = nan RMS absolute difference = nan

Forces: Errors checked for 540 atoms Maximal absolute difference = -nan Average absolute difference = -nan RMS absolute difference = -nan Max(ForceDiff) / Max(Force) = -nan RMS(ForceDiff) / RMS(Force) = -nan

Stresses (in eV): Errors checked for 10 configurations Maximal absolute difference = -nan Average absolute difference = -nan RMS absolute difference = -nan Max(StresDiff) / Max(Stres) = -nan RMS(StresDiff) / RMS(Stres) = -nan

Virial stresses (in GPa): Errors checked for 10 configurations Maximal absolute difference = -nan Average absolute difference = -nan RMS absolute difference = -nan Max(StresDiff) / Max(Stres) = -nan RMS(StresDiff) / RMS(Stres) = -nan


"""

It seems the problem is caused by the '-nan' values in the generated .mtp file.
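As a quick sanity check before using a fitted potential, the .mtp file can be scanned for nan coefficients, which indicate that the optimization diverged. This is a hypothetical helper, not part of maml or MLIP:

```python
import re

def mtp_training_failed(mtp_path: str) -> bool:
    """Return True if a fitted .mtp file contains nan/-nan coefficients,
    which indicates the BFGS optimization diverged during training."""
    with open(mtp_path) as f:
        text = f.read()
    # Diverged fits show up as 'nan' or '-nan' entries in radial_coeffs,
    # species_coeffs, and moment_coeffs
    return re.search(r"-?nan\b", text) is not None
```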

Thanks a lot and have a nice day!

JiQi535 commented 3 years ago

Hello Yunsheng,

Thanks very much for your question. I have mostly reproduced the errors you got. The current maml is not compatible with mlip-2, which is an updated version of mlip-dev. We are working on making maml compatible with mlip-2 and will update under this issue after fixing the problems.

yliu1240 commented 3 years ago

> Hello Yunsheng,
>
> Thanks very much for your question. I have mostly reproduced the errors you got. The current maml is not compatible with mlip-2, which is an updated version of mlip-dev. We are working on making maml compatible with mlip-2 and will update under this issue after fixing the problems.

Thanks a lot!

JiQi535 commented 3 years ago

@yliu1240 Hello Yunsheng,

We have adapted maml to be compatible with both the mlip-2 and mlip-dev versions of the mlip package, and the officially recommended mlip-2 is now the default. You may find an example of training, evaluating, and manipulating MTP with maml in our example Jupyter notebook.

Please kindly let us know if there are still problems with MTP. Thanks!

yliu1240 commented 3 years ago

> @yliu1240 Hello Yunsheng,
>
> We have adapted maml to be compatible with both the mlip-2 and mlip-dev versions of the mlip package, and the officially recommended mlip-2 is now the default. You may find an example of training, evaluating, and manipulating MTP with maml in our example Jupyter notebook.
>
> Please kindly let us know if there are still problems with MTP. Thanks!

Hi! Thank you very much! One last question, about the trained models used in Zuo et al.: the GAP, NNP, SNAP, and qSNAP models for all 6 materials systems are provided in the previous mlearn package, but the MTP models are missing. Would it be possible to provide the hyperparameters used for these MTP models?

Thanks again and have a nice day!

YunxingZuo commented 3 years ago

@yliu1240 Hi Yunsheng,

The parameter files of MTP were not provided in the mlearn package because the MTP package was not published at the time. Attached are the parameter files of the trained MTPs from the paper. Please let me know if you have any questions. JPCA_MTP.zip

yliu1240 commented 3 years ago

> @yliu1240 Hi Yunsheng,
>
> The parameter files of MTP were not provided in the mlearn package because the MTP package was not published at the time. Attached are the parameter files of the trained MTPs from the paper. Please let me know if you have any questions. JPCA_MTP.zip

Hi! Thanks a lot! Am I right that the fitted.mtp files in the zip are compatible with the current MAML package? I checked the parameters listed, and they seem identical to those in the .mtp file in the MAML notebook example.

Have a nice day!

JiQi535 commented 3 years ago

> Hi! Thanks a lot! Am I right that the fitted.mtp files in the zip are compatible with the current MAML package? I checked the parameters listed, and they seem identical to those in the .mtp file in the MAML notebook example.

@yliu1240 Hi Yunsheng,

You are right. The fitted.mtp files should have the same format no matter which version of MLIP was used to train MTP, and they are compatible with the current MAML package.

yliu1240 commented 3 years ago

> Hi! Thanks a lot! Am I right that the fitted.mtp files in the zip are compatible with the current MAML package? I checked the parameters listed, and they seem identical to those in the .mtp file in the MAML notebook example.
>
> @yliu1240 Hi Yunsheng,
>
> You are right. The fitted.mtp files should have the same format no matter which version of MLIP was used to train MTP, and they are compatible with the current MAML package.

Hi! Thank you! I notice that the .mtp files in JPCA_MTP.zip are not the 08.mtp from MLIP; they have more radial basis functions, e.g. 24.mtp. Does that mean selecting an appropriate .mtp file is a hyperparameter to consider in the training process as well?

Have a nice day!

JiQi535 commented 3 years ago

> I notice that the .mtp files in JPCA_MTP.zip are not the 08.mtp from MLIP; they have more radial basis functions, e.g. 24.mtp. Does that mean selecting an appropriate .mtp file is a hyperparameter to consider in the training process as well?

@yliu1240 Hi Yunsheng, let me try to answer this question. Yes, we do need to screen through the untrained .mtp files provided by MLIP, as they use different maximum levels of basis functions (levmax). In this process, we usually look for convergence of predicted properties (energies, forces, lattice parameters, elastic constants) with respect to MTP parameters like levmax and the cutoff radius. I hope my answer helps!
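For illustration, the convergence screening described above can be sketched as a simple loop over levmax values. This is a hypothetical helper; the property values would come from MTPs actually trained at each level:

```python
def smallest_converged_level(levels, values, rel_tol=1e-3):
    """Return the smallest levmax whose predicted property differs from the
    next level's prediction by less than rel_tol (relative), i.e. the point
    where adding basis complexity no longer changes the result.
    Returns None if no level has converged."""
    for i in range(len(levels) - 1):
        change = abs(values[i + 1] - values[i])
        if change <= rel_tol * max(abs(values[i + 1]), 1e-12):
            return levels[i]
    return None

# Illustrative numbers only: e.g. an elastic constant C11 (GPa) predicted
# by MTPs trained from 08.mtp up to 24.mtp
levels = [8, 12, 16, 20, 24]
c11 = [100.0, 95.0, 94.1, 94.05, 94.04]
```

The same check would be repeated for each property of interest (energies, forces, lattice parameters, elastic constants) and for the cutoff radius, taking the most demanding requirement.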

yliu1240 commented 3 years ago

> I notice that the .mtp files in JPCA_MTP.zip are not the 08.mtp from MLIP; they have more radial basis functions, e.g. 24.mtp. Does that mean selecting an appropriate .mtp file is a hyperparameter to consider in the training process as well?
>
> @yliu1240 Hi Yunsheng, let me try to answer this question. Yes, we do need to screen through the untrained .mtp files provided by MLIP, as they use different maximum levels of basis functions (levmax). In this process, we usually look for convergence of predicted properties (energies, forces, lattice parameters, elastic constants) with respect to MTP parameters like levmax and the cutoff radius. I hope my answer helps!

Thanks for your patience! It helps a lot! I have a question that may not strictly count as an issue here. It seems that training with a large number of radial functions is very time-consuming. I'm curious how many iterations are reasonable for the JPCA_MTP models? I found that .mtp files such as 24.mtp may take several minutes for just a few iterations. Is this normal?

Thank you again!

JiQi535 commented 3 years ago

@yliu1240 Hi Yunsheng, it is normal for an untrained MTP with more complex basis functions (e.g. 24.mtp) to take a longer training time. For the number of iterations during MTP training, we usually take the default max-iter of 1000 from the MLIP package, and we find that MTP training mostly stops before reaching 1000 iterations, as convergence has been achieved. If convergence is slow, you may consider increasing max-iter to 2000, or setting a less strict convergence threshold (e.g. bfgs-conv-tol). These mlp training parameters can be found in the MLIP manual or via mlp help train. I hope my answer helps!
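For reference, these options are passed on the mlp command line when training with mlip-2 directly. A sketch of such an invocation follows; the flag values are illustrative, and the exact flag names should be double-checked against `mlp help train` for your MLIP version:

```shell
# Train a level-24 MTP with a raised iteration limit and a looser
# BFGS convergence tolerance (values here are illustrative only)
mlp train 24.mtp train.cfg \
    --max-iter=2000 \
    --bfgs-conv-tol=1e-4 \
    --energy-weight=1.0 \
    --force-weight=0.01 \
    --stress-weight=0.0 \
    --trained-pot-name=fitted.mtp
```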

yliu1240 commented 3 years ago

> @yliu1240 Hi Yunsheng, it is normal for an untrained MTP with more complex basis functions (e.g. 24.mtp) to take a longer training time. For the number of iterations during MTP training, we usually take the default max-iter of 1000 from the MLIP package, and we find that MTP training mostly stops before reaching 1000 iterations, as convergence has been achieved. If convergence is slow, you may consider increasing max-iter to 2000, or setting a less strict convergence threshold (e.g. bfgs-conv-tol). These mlp training parameters can be found in the MLIP manual or via mlp help train. I hope my answer helps!

Thank you very much! It helps a lot!