convert data from CP2K aimd output

WeJear commented 1 year ago

When I was using the dpdata package to convert the output of cp2k aimd, some errors occurred. Hopefully someone can help me figure out what the problem is.

import dpdata
data = dpdata.LabeledSystem('./water-dp', cp2k_output_name = 'md.log', fmt="cp2kdata/md")
print(data)

The following is the error:

(deepmd) xwj@ERNIC:~/materials/water/water-deepmd$ python water-dp.py
--- You are parsing data using package Cp2kData ---
Traceback (most recent call last):
  File "/home/xwj/materials/water/water-deepmd/water-dp.py", line 10, in <module>
    data = dpdata.LabeledSystem('./{}-dp'.format(file_label), cp2k_output_name = 'md.log', fmt="cp2kdata/md")
  File "/home/xwj/miniconda3/envs/deepmd/lib/python3.10/site-packages/dpdata/system.py", line 183, in __init__
    self.from_fmt(
  File "/home/xwj/miniconda3/envs/deepmd/lib/python3.10/site-packages/dpdata/system.py", line 220, in from_fmt
    return self.from_fmt_obj(load_format(fmt), file_name, **kwargs)
  File "/home/xwj/miniconda3/envs/deepmd/lib/python3.10/site-packages/dpdata/system.py", line 1113, in from_fmt_obj
    data = fmtobj.from_labeled_system(file_name, **kwargs)
  File "/home/xwj/miniconda3/envs/deepmd/lib/python3.10/site-packages/cp2kdata/dpdata_plugin.py", line 73, in from_labeled_system
    cp2kmd = Cp2kOutput(output_file=cp2k_output_name, run_type="MD", path_prefix=path_prefix)
  File "/home/xwj/miniconda3/envs/deepmd/lib/python3.10/site-packages/cp2kdata/output.py", line 68, in __init__
    self.dft_info = parse_dft_info(self.filename)
  File "/home/xwj/miniconda3/envs/deepmd/lib/python3.10/site-packages/cp2kdata/block_parser/header_info.py", line 91, in parse_dft_info
    return DFTInfo(ks_type=dft_info["ks_type"][0][0][0], multiplicity=dft_info["multiplicity"][0][0][0])
IndexError: list index out of range

I have provided my .inp file; the other files are too big for me to upload: water-dp.zip

WeJear commented 1 year ago

I tried another approach and a different error occurred. In addition, the water-dp directory contains the files md.log, *.ener, *frc-1.xyz, and *pos-1.xyz.

import dpdata
file_label = "water"
data = dpdata.LabeledSystem('{}-dp'.format(file_label), fmt="cp2kdata/md")
print(data)

The following is the error:

--- You are parsing data using package Cp2kData ---
Traceback (most recent call last):
  File "/home/xwj/materials/water/water-deepmd/water-dp.py", line 12, in <module>
    data = dpdata.LabeledSystem('{}-dp'.format(file_label), fmt="cp2kdata/md")
  File "/home/xwj/miniconda3/envs/deepmd/lib/python3.10/site-packages/dpdata/system.py", line 183, in __init__
    self.from_fmt(
  File "/home/xwj/miniconda3/envs/deepmd/lib/python3.10/site-packages/dpdata/system.py", line 220, in from_fmt
    return self.from_fmt_obj(load_format(fmt), file_name, **kwargs)
  File "/home/xwj/miniconda3/envs/deepmd/lib/python3.10/site-packages/dpdata/system.py", line 1113, in from_fmt_obj
    data = fmtobj.from_labeled_system(file_name, **kwargs)
  File "/home/xwj/miniconda3/envs/deepmd/lib/python3.10/site-packages/cp2kdata/dpdata_plugin.py", line 73, in from_labeled_system
    cp2kmd = Cp2kOutput(output_file=cp2k_output_name, run_type="MD", path_prefix=path_prefix)
  File "/home/xwj/miniconda3/envs/deepmd/lib/python3.10/site-packages/cp2kdata/output.py", line 84, in __init__
    parse_run_type()
  File "/home/xwj/miniconda3/envs/deepmd/lib/python3.10/site-packages/cp2kdata/output.py", line 311, in parse_md
    self.md_info = parse_md_info(self.filename)
  File "/home/xwj/miniconda3/envs/deepmd/lib/python3.10/site-packages/cp2kdata/block_parser/header_info.py", line 109, in parse_md_info
    md_info = regrep(
  File "/home/xwj/miniconda3/envs/deepmd/lib/python3.10/site-packages/monty/re.py", line 36, in regrep
    gen = reverse_readfile(filename) if reverse else zopen(filename, "rt")
  File "/home/xwj/miniconda3/envs/deepmd/lib/python3.10/site-packages/monty/io.py", line 37, in zopen
    name, ext = os.path.splitext(filename)
  File "/home/xwj/miniconda3/envs/deepmd/lib/python3.10/posixpath.py", line 118, in splitext
    p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not NoneType

I hope you can help me find out what's wrong. Thanks.
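
The second traceback ends with `None` being passed where a file name is expected, which suggests the log file name was never resolved. A quick pre-flight check of the directory can rule out missing files before calling dpdata. The following is a minimal sketch, not part of dpdata or cp2kdata: the helper name `missing_cp2k_md_files` is hypothetical, and the file patterns are assumptions based on the files mentioned in this comment.

```python
from pathlib import Path


def missing_cp2k_md_files(directory, log_name="md.log"):
    """Return the expected file patterns that match nothing in `directory`.

    Hypothetical helper: the patterns below are assumptions about what
    the "cp2kdata/md" format needs, taken from the files listed above.
    """
    root = Path(directory)
    patterns = [log_name, "*.ener", "*frc-1.xyz", "*pos-1.xyz"]
    # A pattern counts as present if at least one file matches it.
    return [p for p in patterns if not any(root.glob(p))]


if __name__ == "__main__":
    missing = missing_cp2k_md_files("water-dp", log_name="md.log")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files found; pass cp2k_output_name='md.log' explicitly.")
```

If the check reports nothing missing, passing `cp2k_output_name` explicitly, as in the first attempt, at least avoids handing the parser an unset file name.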

robinzyb commented 1 year ago

The input file is not used in parsing. The error may arise from your log file, so I need the log file to debug.

robinzyb commented 1 year ago

Which version of CP2K are you using?

robinzyb commented 1 year ago

No need. You can run your case with 2-3 MD steps and upload all the files to GitHub.

WeJear commented 1 year ago

I am using CP2K version 9.1. If possible, I can send the log file to your email.

WeJear commented 1 year ago

Okay, I'll try it.

WeJear commented 1 year ago

I'm very sorry, I am only responsible for training the potential functions and am not familiar with running molecular dynamics in CP2K. It's too late today, so I can only ask others tomorrow. I won't take up your time tonight. Thank you for replying to my question; I will get back to you in detail tomorrow.

robinzyb commented 1 year ago

I am in a different time zone, so I won't be able to reply quickly.

WeJear commented 1 year ago

It's okay, you can help me in your free time. Thank you again.

WeJear commented 1 year ago

I've got the files. I hope you can take a look at them. Thank you. water-dp.zip

robinzyb commented 1 year ago

Great. I found that the simulation is driven by xTB, but cp2kdata is currently designed to parse DFT data only. I will label this as an enhancement and update ASAP.
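
The `IndexError` in the first traceback is consistent with this: `parse_dft_info` indexes into the matches found for DFT header fields, and an empty match list cannot be indexed. The sketch below illustrates that failure mode and one defensive alternative using plain `re`. It is not cp2kdata's implementation; the function name, the `DFT|` line patterns, and the sample text are assumptions for illustration only.

```python
import re

# Assumed header patterns, for illustration only; real CP2K output may differ.
KS_TYPE = re.compile(r"DFT\|\s+Spin\s+\S+\s+Kohn-Sham\s+\((\w+)\)")
MULTIPLICITY = re.compile(r"DFT\|\s+Multiplicity\s+(\d+)")


def parse_header(text):
    """Return (ks_type, multiplicity), using None for fields that are absent."""
    ks = KS_TYPE.search(text)
    mult = MULTIPLICITY.search(text)
    return (
        ks.group(1) if ks else None,
        int(mult.group(1)) if mult else None,
    )


# Made-up sample headers: one with DFT fields, one without.
dft_like = (
    " DFT| Spin restricted Kohn-Sham (RKS) calculation   RKS\n"
    " DFT| Multiplicity                                    1\n"
)
no_dft_header = " xTB| some semi-empirical header line\n"

print(parse_header(dft_like))       # ('RKS', 1)
print(parse_header(no_dft_header))  # (None, None)
```

Returning `None` for an absent field lets the caller decide whether the missing DFT header is an error or simply a different method, instead of failing on an index lookup.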

WeJear commented 1 year ago

Okay, thank you for your hard work

robinzyb commented 1 year ago

Try installing cp2kdata from the devel branch. It can now parse NPT_I runs driven by xTB. Commit: 1fae663e2fded526f07e313cde444e6dea16eb54

import dpdata
cp2kmd_dir = "./test"
cp2kmd_output_name = "58-water.log"
dp = dpdata.LabeledSystem(cp2kmd_dir, cp2k_output_name=cp2kmd_output_name, fmt="cp2kdata/md")
print(dp)

--- You are parsing data using package Cp2kData ---
Parsing Energies from ./test/water-300K-1.ener
Parsing Structures from ./test/water-300K-pos-1.xyz
Parsing Froces from ./test/water-300K-nvt.force-frc-1.xyz
Parsing Stress from the CP2K output/log file: ./test/58-water.log
Parsing Cells Information from ./test/58-water.log
Atom names are fake chemical symbols as you set in cp2k input.
--- You are parsing data using package Cp2kData ---
Data Summary
Labeled System
-------------------
Frame Numbers      : 1026
Atom Numbers       : 174
Including Virials  : No
Element List       :
-------------------
H  O
116  58

The CP2K files are still too large for me. I would like to add your case to the cp2kdata test suite to keep the code robust during future development. Could you run, or ask a colleague to run, the MD with only 2-3 steps? Just change STEPS from 5000 to 2 or 3 in the MD section:

&MOTION
   &MD
     ENSEMBLE NPT_I
     STEPS 3   ! changed from 5000; 2 or 3 is enough

Then upload the whole folder. Thank you in advance.
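
For anyone preparing such a reduced test case, the change can also be scripted. This is a minimal sketch, assuming the input contains a `STEPS` keyword at the start of its own line as in the fragment above; `set_md_steps` is a hypothetical helper and not part of cp2kdata or CP2K.

```python
import re


def set_md_steps(input_text, steps):
    """Replace the value of the first STEPS keyword that starts a line."""
    pattern = re.compile(r"^(\s*STEPS\s+)\d+", flags=re.MULTILINE | re.IGNORECASE)
    # A function replacement avoids ambiguity between the group
    # reference and the digits of the new step count.
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{steps}", input_text, count=1
    )
    if count == 0:
        raise ValueError("no STEPS keyword found in the input text")
    return new_text


# Illustrative input fragment; a real input has more keywords and sections.
example = """&MOTION
   &MD
     ENSEMBLE NPT_I
     STEPS 5000
   &END MD
&END MOTION
"""
print(set_md_steps(example, 3))
```

Only the first match is changed, so an input with several `STEPS` keywords in different sections should be checked by hand.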

WeJear commented 1 year ago

Okay, I will do this as you said tomorrow, thank you for your hard work

WeJear commented 1 year ago

I tested the code. With a small amount of data it runs well. But with 20,000 steps, the code appears to be stuck for a long time, and I can't tell whether it is running correctly.

import dpdata
data = dpdata.LabeledSystem('water-dp', cp2k_output_name = 'md.log', fmt="cp2kdata/md")
print(data)

(deepmd) xwj@ERNIC:~/materials/water/water-deepmd$ python water-dp.py 
--- You are parsing data using package Cp2kData ---
Parsing Energies from water-dp/water-300K-1.ener
Parsing Structures from water-dp/water-300K-pos-1.xyz

robinzyb commented 1 year ago

This is expected. Since the cell information is stored in the huge output file, the code has to go through the whole file, which takes time.
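
To get a feel for that cost, the log can be scanned once outside cp2kdata. The sketch below streams a file line by line and counts the lines matching a pattern. It shows only the linear-scan idea and is not how cp2kdata parses cells. The function name and the marker string are assumptions; substitute whatever line your CP2K version prints for the cell.

```python
import re
import time
from pathlib import Path


def count_matching_lines(path, pattern):
    """Stream `path` once and count the lines that match `pattern`.

    Reading line by line keeps memory use flat, but the run time still
    grows with the file size because every line has to be visited.
    """
    regex = re.compile(pattern)
    hits = 0
    with open(path, "r", errors="replace") as handle:
        for line in handle:
            if regex.search(line):
                hits += 1
    return hits


if __name__ == "__main__":
    log = Path("water-dp/md.log")  # path taken from the example above
    if log.exists():
        start = time.perf_counter()
        # "CELL" is a placeholder marker, not a confirmed CP2K output string.
        n = count_matching_lines(log, r"CELL")
        print(f"{n} matching lines in {time.perf_counter() - start:.1f} s")
```

Timing one such pass gives a rough lower bound on how long a parser that reads the same file must take.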

WeJear commented 1 year ago

OK, I will let it run longer.

WeJear commented 1 year ago

I've got the files and I hope this helps. water-dp.zip

robinzyb commented 1 year ago

Great. It helps.

robinzyb / cp2kdata

convert data from CP2K aimd output #26