openkim / kliff

KIM-based Learning-Integrated Fitting Framework for interatomic potentials.
https://kliff.readthedocs.io
GNU Lesser General Public License v2.1

Extracting and Converting from OUTCAR File to XYZ File #177

Open LiMahappy opened 3 months ago

LiMahappy commented 3 months ago

Dear Developer,

I am a beginner with KLIFF and have been able to train potentials using the example files. However, I do not have access to the initial training set files; I only have the OUTCAR file from VASP AIMD calculations. Could you please advise me on how to extract the data from the OUTCAR file and convert it into an XYZ file? I am aware that software such as ASE or pymatgen can parse VASP output, after which the KLIFF write_extxyz function can be used to create an extxyz file. However, I am not clear on the specific steps involved. I would greatly appreciate some guidance, and a sample case would be even more helpful. I look forward to your response.
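For anyone with the same question, here is a minimal pure-Python sketch of what one extended-XYZ frame for KLIFF might look like once the data has been pulled out of OUTCAR (for example with ASE or pymatgen). The helper name `format_extxyz_frame` is hypothetical, and the header layout is an assumption: `Lattice` (9 numbers), `PBC` as three 0/1 flags, `Energy`, and `Stress` as six Voigt components, followed by one `species x y z fx fy fz` line per atom. The `Properties` column names follow the general extxyz convention; check the exact keys KLIFF expects against its documentation before relying on this.

```python
from typing import Sequence


def format_extxyz_frame(
    species: Sequence[str],
    coords: Sequence[Sequence[float]],
    cell: Sequence[Sequence[float]],
    pbc: Sequence[bool],
    energy: float,
    forces: Sequence[Sequence[float]],
    stress_voigt: Sequence[float],
) -> str:
    """Return one extended-XYZ frame as a string (hypothetical helper).

    Assumed layout: atom count line, one header line with
    Lattice/PBC/Energy/Stress, then 'species x y z fx fy fz' per atom.
    """
    if not (len(species) == len(coords) == len(forces)):
        raise ValueError("species, coords and forces must have the same length")
    if len(stress_voigt) != 6:
        # KLIFF's reader is assumed to want 6 Voigt components, not 9
        raise ValueError("stress must have 6 Voigt components")

    lattice = " ".join(f"{x:.10f}" for row in cell for x in row)
    pbc_str = " ".join("1" if p else "0" for p in pbc)
    stress = " ".join(f"{s:.10f}" for s in stress_voigt)
    # Column names in Properties are an assumption (generic extxyz style)
    header = (
        f'Lattice="{lattice}" PBC="{pbc_str}" Energy={energy:.10f} '
        f'Stress="{stress}" Properties=species:S:1:pos:R:3:forces:R:3'
    )

    lines = [str(len(species)), header]
    for sym, r, f in zip(species, coords, forces):
        xyz = " ".join(f"{v:.10f}" for v in r)
        fxyz = " ".join(f"{v:.10f}" for v in f)
        lines.append(f"{sym} {xyz} {fxyz}")
    return "\n".join(lines) + "\n"
```

Building a training set then amounts to writing one such frame per AIMD snapshot; whether to put one frame per file or several frames in one file depends on what your KLIFF version's reader accepts.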

Best regards.

mjwen commented 3 months ago

Hi @LiMahappy !

Yes, we have a separate package, potdata, to deal with data. It is under active development and has lots of functionality. Unfortunately, we haven't documented it extensively.

For converting VASP output to extxyz files, take a look at this example; it does exactly what you want. Note that if you have multiple VASP runs, you can change line 127 to their common parent folder and it will extract all the results. Under the hood, it looks for all vasprun.xml files and converts them.
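The discovery step described above (searching a common parent folder for every vasprun.xml) can be sketched with the standard library alone. This is not potdata's code; `find_vasprun_files` is a hypothetical name, and each path it returns would then be handed to whatever parser performs the actual conversion.

```python
from pathlib import Path
from typing import List


def find_vasprun_files(parent: str) -> List[Path]:
    """Recursively collect every vasprun.xml below `parent`.

    Results are sorted so repeated runs process files in a stable order.
    """
    return sorted(p for p in Path(parent).rglob("vasprun.xml") if p.is_file())
```

For example, `for path in find_vasprun_files("my_vasp_runs"): ...` would visit each run directory's vasprun.xml in turn (the folder name here is only illustrative).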

LiMahappy commented 3 months ago

(Attachment: vasp-test.zip) Thank you for your help, and sorry to bother you again. After running the script, I did obtain the corresponding file. However, when I tried to use this file as the training set to train the potential, the following error occurred. Could you please tell me how to resolve this issue?

mjwen commented 3 months ago

Hi @LiMahappy,

What is the error? Can you please be more specific? I did not find any error file in the zip file you shared.

LiMahappy commented 3 months ago

I apologize for not describing the issue in detail. After extracting the results of the AIMD calculations with the xyz.py script, I did obtain the training set files. However, when I used this training set to train an SW potential, the following error occurred. The error indicates that KLIFF expects the stress data to have 6 values, but 9 were provided.

[Screenshot: error encountered after running xyz.py]

Additionally, I trained a neural network potential and installed the resulting model, "NeuralNetwork_KLIFF__MO_000000111111_000". When I used GULP to call this model, an error occurred as soon as I attempted a structural optimization. The error did not occur when no structural optimization was performed, for example when calculating mechanical properties. I have recompiled and tested with Intel MPI versions 17 and 20, oneAPI version 22, and OpenMPI, and all of them result in segmentation faults. However, when I train a model from OpenKIM, such as the Stillinger-Weber (SW) potential for silicon, this issue does not occur.

[Screenshot: error encountered during structural optimization with GULP] (Attachment: NN.zip)

Finally, I would like to ask how to determine an appropriate number of training samples and appropriate values for the energy and force weights, given that a neural network potential is strongly influenced by the training set and the weights. I apologize for being a beginner and asking so many questions. I look forward to your answers.

mjwen commented 3 months ago

Hi @LiMahappy, first, feel free to ask questions. Everybody was once a beginner!

You used potdata to write extxyz files, and all nine components of the 3x3 stress matrix were written. However, KLIFF uses Voigt notation for stress, so it only requires 6 components. To write the stress this way, see the updated example here. Note that you will need the latest version of potdata; simply reinstalling it should work.
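The 9-to-6 reduction can be sketched as follows. This is not potdata's implementation; `stress_to_voigt` is a hypothetical helper that takes the nine row-major values from a 9-value Stress field and returns six components in the standard Voigt order (xx, yy, zz, yz, xz, xy). That KLIFF uses exactly this ordering is an assumption to verify against its documentation.

```python
from typing import List, Sequence


def stress_to_voigt(values: Sequence[float], symmetrize: bool = True) -> List[float]:
    """Convert 9 row-major stress values (xx xy xz yx yy yz zx zy zz)
    into 6 Voigt components ordered (xx, yy, zz, yz, xz, xy).

    With symmetrize=True each off-diagonal term is the mean of the two
    mirrored entries, smoothing out small numerical asymmetry.
    Units and sign convention are left unchanged.
    """
    if len(values) != 9:
        raise ValueError("expected 9 stress values (a flattened 3x3 matrix)")
    s = [list(values[0:3]), list(values[3:6]), list(values[6:9])]

    def off(i: int, j: int) -> float:
        return 0.5 * (s[i][j] + s[j][i]) if symmetrize else s[i][j]

    return [s[0][0], s[1][1], s[2][2], off(1, 2), off(0, 2), off(0, 1)]
```

Note that this only changes the layout: VASP and KLIFF may differ in stress units and sign convention, and that still has to be handled separately.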

If I understand the segfault problem correctly, you can run the trained NeuralNetwork_KLIFF__MO_000000111111_000 model to compute elastic properties successfully, but a segfault occurs when geometry optimization is performed? Does this happen if you run in serial mode without MPI? Beyond that, I have no idea why this happens.

The number of training samples and the weights depend on the problem you want to study: how complex it is and how accurate you want your model to be. For a single-element system such as diamond silicon, for example, you will need at least thousands of samples to train a model. You will need to adjust the energy and force weights to find values that satisfy your accuracy requirements.
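To make the role of the two weights concrete, here is a generic pure-Python sketch of a weighted least-squares loss plus the energy and force RMSEs one might monitor while tuning it. It is not KLIFF's loss implementation or API (how weights are passed to KLIFF differs between versions); `weighted_loss`, `rmse`, and `force_rmse` are hypothetical helpers for illustration only.

```python
import math
from typing import Sequence


def weighted_loss(
    e_pred: Sequence[float],
    e_ref: Sequence[float],
    f_pred: Sequence[Sequence[float]],
    f_ref: Sequence[Sequence[float]],
    w_energy: float,
    w_forces: float,
) -> float:
    """Sum of squared energy and force residuals, each scaled by its weight.

    e_*: one energy per configuration.
    f_*: per configuration, a flat list of force components (3 * natoms).
    """
    if len(e_pred) != len(e_ref) or len(f_pred) != len(f_ref):
        raise ValueError("prediction and reference sizes differ")
    loss = sum(w_energy * (ep - er) ** 2 for ep, er in zip(e_pred, e_ref))
    for fp_conf, fr_conf in zip(f_pred, f_ref):
        loss += sum(w_forces * (fp - fr) ** 2 for fp, fr in zip(fp_conf, fr_conf))
    return loss


def rmse(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Root-mean-square error between two equally long, non-empty lists."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("pred and ref must be non-empty and equally long")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


def force_rmse(f_pred: Sequence[Sequence[float]], f_ref: Sequence[Sequence[float]]) -> float:
    """RMSE over all force components of all configurations."""
    flat_pred = [x for conf in f_pred for x in conf]
    flat_ref = [x for conf in f_ref for x in conf]
    return rmse(flat_pred, flat_ref)
```

Raising `w_forces` relative to `w_energy` typically pushes the fit toward lower force RMSE at some cost in energy RMSE, and vice versa, so comparing both numbers on a held-out set is one way to judge whether a weight change helped.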

LiMahappy commented 3 months ago

Hi @mjwen, thank you very much for your response. I reinstalled potdata and used the latest example_parsing.py script, and I successfully obtained a correctly formatted xyz file.

Secondly, regarding the segmentation fault: both the serial and parallel versions of GULP failed to perform structural optimization when calling the trained NeuralNetwork_KLIFF__MO_000000111111_000 model, while the SW_StillingerWeber_1985_Si__MO_405512056662_005_kliff_trained model could perform structural optimization in both cases. (Attachments: serial mode.zip, MPI.zip)

Lastly, how should I adjust the weights for energy and forces? For example, which output quantities should I look at to decide whether to increase or decrease a weight, and what should the energy and force weight adjustments focus on? Looking forward to your reply.