robinzyb / cp2kdata

cp2k postprocessing tools
https://robinzyb.github.io/cp2kdata/
GNU Lesser General Public License v3.0
52 stars 18 forks

override dpdata's builtin formats #24

Closed: njzjz closed this issue 9 months ago

njzjz commented 9 months ago

Currently, as shown in dpdata/plugins/__init__.py, dpdata first loads its built-in formats and then loads external formats. If we keep this behavior, a plugin can override a built-in format with a compatible one, so users can keep running their original scripts without changing any code. DP-GEN doesn't need any changes either.
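The override relies only on load order. A minimal sketch, assuming the registry is a plain dict keyed by format name (Registry, FORMATS and both classes are hypothetical names, not dpdata's actual implementation): whoever registers a key last wins, so an external format loaded after the built-ins replaces them.

```python
# Hypothetical name-keyed plugin registry; not dpdata's actual code.
class Registry:
    """Map a format key (e.g. "cp2k/output") to a class."""

    def __init__(self):
        self.plugins = {}

    def register(self, key):
        def decorator(cls):
            # A later registration under the same key overwrites the earlier one.
            self.plugins[key] = cls
            return cls
        return decorator


FORMATS = Registry()


# Step 1: built-in formats are registered first.
@FORMATS.register("cp2k/output")
class BuiltinCP2KOutput:
    pass


# Step 2: external (plugin) formats are registered afterwards.
@FORMATS.register("cp2k/output")
class ExternalCP2KOutput:
    pass


print(FORMATS.plugins["cp2k/output"].__name__)  # ExternalCP2KOutput
```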

robinzyb commented 9 months ago

@njzjz I just added two more lines to the pyproject.toml of cp2kdata:

[project.entry-points.'dpdata.plugins']                                       
'cp2kdata/e_f' = "cp2kdata.dpdata_plugin:CP2KEnergyForceFormat"              
'cp2kdata/md' = "cp2kdata.dpdata_plugin:CP2KMDFormat"
'cp2k/output' = "cp2kdata.dpdata_plugin:CP2KEnergyForceFormat"            # same name as the one in dpdata
'cp2k/aimd_output' = "cp2kdata.dpdata_plugin:CP2KMDFormat"                # same name as the one in dpdata

from importlib import metadata
metadata.entry_points().get("dpdata.plugins", [])

The entry points are loaded:

(EntryPoint(name='cp2k/aimd_output', value='cp2kdata.dpdata_plugin:CP2KMDFormat', group='dpdata.plugins'),
 EntryPoint(name='cp2k/output', value='cp2kdata.dpdata_plugin:CP2KEnergyForceFormat', group='dpdata.plugins'),
 EntryPoint(name='cp2kdata/e_f', value='cp2kdata.dpdata_plugin:CP2KEnergyForceFormat', group='dpdata.plugins'),
 EntryPoint(name='cp2kdata/md', value='cp2kdata.dpdata_plugin:CP2KMDFormat', group='dpdata.plugins'),
 EntryPoint(name='cp2kdata/e_f', value='cp2kdata.dpdata_plugin:CP2KEnergyForceFormat', group='dpdata.plugins'),
 EntryPoint(name='cp2kdata/md', value='cp2kdata.dpdata_plugin:CP2KMDFormat', group='dpdata.plugins'))

But when I use dpdata with the cp2k/* formats, it still calls the old (built-in) functions.

robinzyb commented 9 months ago

Any better ideas? Even after reading the setuptools documentation, I don't fully understand how dpdata loads plugins. From my understanding,

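import importlib
from pathlib import Path

# NOT_LOADABLE and PACKAGE_BASE are defined earlier in dpdata/plugins/__init__.py
for module_file in Path(__file__).parent.glob("*.py"):
    if module_file.name not in NOT_LOADABLE:
        module_name = f".{module_file.stem}"
        importlib.import_module(module_name, PACKAGE_BASE)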

these lines load dpdata's built-in plugins, and the following lines load the external ones:

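from importlib import metadata

try:
    eps = metadata.entry_points(group="dpdata.plugins")  # Python >= 3.10
except TypeError:
    eps = metadata.entry_points().get("dpdata.plugins", [])  # older Python
for ep in eps:
    plugin = ep.load()  # importing the module runs its register decorators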

So shouldn't it work if I override dpdata's plugins using the same name?

njzjz commented 9 months ago

You can debug https://github.com/deepmodeling/dpdata/blob/18a0ed5ebced8b1f6887038883d46f31ae9990a4/dpdata/plugin.py#L29 and print the values.

Do you import cp2kdata before dpdata?

robinzyb commented 9 months ago

I printed the key and object at #L29:

cp2k/aimd_output <class 'dpdata.plugins.cp2k.CP2KAIMDOutputFormat'>
cp2k/output <class 'dpdata.plugins.cp2k.CP2KOutputFormat'>

So dpdata still loads the old, built-in classes.

Do you import cp2kdata before dpdata?

No, I didn't. I installed cp2kdata, opened a Jupyter notebook, imported dpdata, and then parsed the files.

njzjz commented 9 months ago

I see what you did. The entry-point names have no effect; the format name is the key passed to the register decorator, defined in

https://github.com/robinzyb/cp2kdata/blob/5f44c64d9afe7403d1648d30f471284fb2d64f81/cp2kdata/dpdata_plugin.py#L16
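Because the loader only calls ep.load() and never uses the entry-point name, the format key is whatever the register call inside the imported plugin module says. A stand-alone sketch of that mechanism with a plain dict; register, registry and load_plugin_module are hypothetical stand-ins, not dpdata or cp2kdata code.

```python
# Hypothetical stand-ins for dpdata's registry and cp2kdata's plugin module.
registry = {}


def register(key):
    def decorator(cls):
        registry[key] = cls
        return cls
    return decorator


def load_plugin_module():
    # Mimics importing cp2kdata.dpdata_plugin: the decorator runs on import.
    @register("cp2kdata/e_f")
    class CP2KEnergyForceFormat:
        pass

    return CP2KEnergyForceFormat


# (entry-point name, loader) pairs, mimicking the pyproject.toml entries.
entry_points = [("cp2k/output", load_plugin_module)]

for ep_name, load in entry_points:
    load()  # ep_name is never used, just as with `plugin = ep.load()`

print(sorted(registry))  # ['cp2kdata/e_f']
```

The entry point named cp2k/output still ends up registered only as cp2kdata/e_f.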

robinzyb commented 9 months ago

I'd rather not change the registered name.

njzjz commented 9 months ago

The decorator can be used multiple times.
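Stacking the decorator lets one class answer to both its own name and the built-in one. A self-contained sketch with a hypothetical dict-based register; in dpdata's plugin API the real decorator is Format.register, which this only imitates.

```python
# Hypothetical imitation of a register decorator; not dpdata's implementation.
registry = {}


def register(key):
    def decorator(cls):
        registry[key] = cls
        return cls  # return the class unchanged so decorators can be stacked
    return decorator


@register("cp2kdata/e_f")   # keep the plugin's own name
@register("cp2k/output")    # also take over the built-in name
class CP2KEnergyForceFormat:
    pass


print(sorted(registry))  # ['cp2k/output', 'cp2kdata/e_f']
```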

robinzyb commented 9 months ago

It works after I added the extra decorator. However, my cp2kdata/md plugin behaves slightly differently from cp2k/aimd_output: cp2k/aimd_output hard-codes the output file pattern as *log, whereas cp2kdata/md requires the log file name to be given explicitly through the cp2k_output_name argument:

import dpdata
dpdata.LabeledSystem("./55-20rfj2/", cp2k_output_name="1.log", fmt="cp2k/aimd_output")

This might force existing users of cp2k/aimd_output to modify their code. Of course, nothing changes for single-point calculations or for DP-GEN.
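To make the difference concrete, here is a hypothetical helper (find_cp2k_log is not part of cp2kdata or dpdata) covering both lookup styles. Assuming an optional cp2k_output_name is acceptable, falling back to a sorted "*log" glob when it is omitted would approximate the old cp2k/aimd_output behavior, so existing scripts might keep working.

```python
from pathlib import Path
from typing import Optional
import tempfile


def find_cp2k_log(directory: str, cp2k_output_name: Optional[str] = None) -> Path:
    """Return the CP2K log file inside `directory` (hypothetical helper)."""
    folder = Path(directory)
    if cp2k_output_name is not None:
        # Explicit name, as cp2kdata/md requires.
        path = folder / cp2k_output_name
        if not path.is_file():
            raise FileNotFoundError(path)
        return path
    # Fallback: first file (by name) matching the "*log" pattern.
    matches = sorted(folder.glob("*log"))
    if not matches:
        raise FileNotFoundError(f"no *log file in {folder}")
    return matches[0]


# Usage in a throwaway directory containing two log files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("2.log", "1.log"):
        (Path(tmp) / name).write_text("")
    print(find_cp2k_log(tmp).name)                            # 1.log
    print(find_cp2k_log(tmp, cp2k_output_name="2.log").name)  # 2.log
```

The glob fallback picks a file implicitly, which is convenient for old scripts but ambiguous when a directory holds several logs; the explicit argument avoids that ambiguity.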