libAtoms / extxyz

Extended XYZ specification and parsers
MIT License
12 stars 4 forks source link

Extended XYZ specification and parsing tools

This repository contains a specification of the extended XYZ (extxyz) file format, and tools for reading and writing to it from programs written in C, Fortran, Python and Julia.

Installation

Python

The latest development version can be installed via

pip install git+https://github.com/libAtoms/extxyz

This requires Python 3.6+ and a working C compiler, plus the PCRE2 and libcleri libraries. libcleri is included here as a submodule and will be compiled automatically, but you may need to install PCRE2 with something similar to one of the following commands.

brew install pcre2 # Mac OS, with homebrew
sudo apt-get install libpcre2-dev # Ubuntu

Binary wheels for Linux and MacOS which do not require PCRE2 or libcleri are built in the GitHub CI for each tagged release.

Stable releases are made to PyPI, so you can install with

pip install extxyz

libextxyz C library

The underlying parser is written in C. The C code is compiled automatically when you build the Python package, but you can also compile it as shared library libextxyz.so as follows

make -C libextxyz
make -C libextxyz install

The Makefile respects the usual environment variables CC, CFLAGS, LDFLAGS, etc, plus prefix (default /usr/local).

Fortran bindings

To build the fextxyz exectuable demonstrating the Fortran bindings, you first need to download and compile QUIP -- see the CI for an example of how to do that automatically. Then, set QUIP_ROOT and QUIP_ARCH

export QUIP_ARCH=linux_x86_64_gfortran
export QUIP_ROOT=/path/to/QUIP
make -C libextxyz fextxyz

The Fortran bindings will later be moved to QUIP, since they are tied to QUIP's Dictionary and Atoms types.

Julia bindings

Julia bindings are distributed in a separate package, named ExtXYZ.jl. See its documentation for further details.

Usage

Usage of the Python package is similar to the ase.io.read() and ase.io.write() functions, e.g:

from extxyz import read, iread, write, ExtXYZTrajectoryWriter
from ase.build import bulk
from ase.optimize import BFGS
from ase.calculators.emt import EMT

atoms = bulk("Cu") * 3
frames = [atoms.copy() for frame in range(3)]
for frame in frames:
    frame.rattle()

write("filename.xyz", frames)

frames = read("filename.xyz") # all frames in file
atoms = read("filename.xyz", index=0) # first frame in file
write("newfile.xyz", frames)

traj = ExtXYZTrajectoryWriter("traj.xyz", atoms=atoms)
atoms.calc = EMT()
opt = BFGS(atoms, trajectory=traj)
opt.run(fmax=1e-3)

There is also an extxyz command line tool for testing purposes, see extxyz -h for help. This can alternatively be invoked via python -m extxyz.

Remaining issues

  1. make treatement of 9 elem old-1d consistent: now extxyz.py always reshapes (not just Lattice) to 3x3, but extxyz.c does not.
  2. Since we're using python regexp/PCRE, we could make per-atom strings be more complex, e.g. bare or quoted strings from key-value pairs. Should we?
  3. Decide what to do about unparseable comment lines. Just assume an old fashioned xyz with an arbitrary line, or fail? I don't think we really want every parsing breaking typo to result in plain xyz.
  4. Used to be able to quote with {}. Do we want to support this?

Extended XYZ specifcation

General formatting

General definitions

Primitive Data Types

String

Sequence of one or more allowed characters, optionally quoted, but must be quoted in some circumstances.

Simple string

Sequence of one or more allowed characters, unquoted (so even outermost quotes are part of string), and without whitespace

Logical/boolean

Integer number

string of one or more decimal digits, optionally preceded by sign

Order for identifying primitive data types, accept first one that matches

one dimensional array (vector)

sequence of one or more of the same primitive type

two dimensional array (matrix)

sequence of one or more new style one dimensional arrays of the same length and type

XYZ file

A concatenation of 1 or more FRAMES (below), with optional blank lines at the end (but not between frames)

FRAME

key=value pairs on second ("comment") line

Associates per-configuration value with key. Spaces are allowed around = sign, which do not become part of the key or value.

Key: bare or quoted string

Value: primitive type, 1-D array, or 2-D array. Type is determined from context according to order specified above.

Special key "Properties”: defines the columns in the subsequent lines in the frame.

Per-atom data lines

Each column contains a sequence of primitive types, except string, which is replaced with simple string, separated by one or more whitespace characters, ending with EOL (optional for last line). The total number of columns in each row must be equal to the M and to the sum of the counts "m" in the "Properties" value string.

READING ase.atoms.Atoms FROM THIS FORMAT

Specific keys indicate special values, with specific order for overriding

Key-value pairs:

Properties keys (all types are per-atom), types are simple

WRITING ase.atoms.Atoms TO THIS FORMAT

General considerations