libAtoms / abcd

2 stars 4 forks source link

execution of arbitrary python code applied to each configuration matching the query #41

Closed gabor1 closed 4 years ago

gabor1 commented 5 years ago

The pressure should be calculated by an expression like: 'at.info['pressure'] = -1/3 * np.trace(at.info['virial']/at.get_volume())'

Similarly like: https://github.com/libAtoms/QUIP/blob/public/quippy/scripts/convert.py

p.add_option('-e', '--exec-code', action='store', help="""Python code to execute on each frame before writing it to output file. Atoms object is available as `at`, and do_print is set to True. If the user-supplied code sets do_print to False, the frame is not printed.""")

Originally posted by @fekad in https://github.com/libAtoms/abcd/issues/14#issuecomment-497004987

gabor1 commented 5 years ago

the above is just an example, the pressure is already a derived type.

the subcommand should be "exec" (for now..)

gabor1 commented 5 years ago

--help of exec should give guidance to the variables to be references, e.g. "atoms" to reference the atoms object

gabor1 commented 5 years ago

even better, if the atoms object is not created unless needed, simply convert the database item to a dict (take care to save the dict if it changes) abcd -q exec 'item["energy"]/=ase.units.Bohr'

if the user really wants an ase object, that can also be done with item.to_ase().write("tmp.xyz", mode="a") (or something like that)

fekad commented 5 years ago

Currently, the 'item' is the document object of MongoDB which is quite dangerous. you have even access to drop_collection method etc ... But here are some working examples:

abcd exec -q 'natoms=123' -y 'import ase; print(item.info["energy"]*ase.units.Bohr)'
abcd exec -q 'natoms=123' -y 'print(item.to_atoms())'
gabor1 commented 5 years ago

looks good! I thought you said you'll import tase anyway, so it doesn't have to be done in the loop.

if the item object is the raw database reference, how can it have an "info" member ?

do you currently not do any saving back to the database?

I thought we were discussing creating a funky class that can behave as a dict as well as a database object, to enable item["energy"] etc as well as saving back. If you have such a new class, would the be able to protect against dangerous stuff by restricting what methods can be called? essentially only [], save(), to_atoms ? wouldn't that make it safe ?

gabor1 commented 5 years ago

discussion resulted in:

gabor1 commented 5 years ago

for the moment, can we have the item[] function look up whether the property is in info or array ? otherwise the user needs to know..

gabor1 commented 5 years ago

or indeed if the property is derived.

gabor1 commented 5 years ago

I tried the following

exec quip abcd exec -q volker -q formula~C 'item.info["energy_per_atom"] = item.info["energy"]/item.derived["natoms"]' --yes

but the energy_per_atom info property did not appear in the database

gabor1 commented 5 years ago

item.save() solved it

gabor1 commented 5 years ago

so the remaining work on this issue is to create an object from the item that isn't so dangerous as to allow deletion of the entire database, only allows manipulation of that item.

gabor1 commented 5 years ago

what happens remotely under "--readonly" ? do we disallow exec altogether? or just item.save() is not allowed? the latter would be better.

gabor1 commented 5 years ago

I guess we should disallow the remote exec - too easy to drop to a shell from python...

gabor1 commented 5 years ago

decision: exec is disabled by --readonly

gabor1 commented 5 years ago

The remaining task for this issue is the creation of a new subclass that allows the executed python to access the database item as a simple dictionary

fekad commented 4 years ago

Acessing the dictionary which has save function:

abcd exec -q 'n_atoms=130' -y 'print(at["energy"])'

Converting dict into ase Atoms object:

abcd exec -q 'n_atoms=130' -y 'print(at.to_atoms())'