Add `unique_shifts` method for `atomman.defects.FreeSurface`

lan496 commented 2 years ago

FreeSurface.shifts returns candidates for terminations of free surfaces. However, Some of the terminations are identical under symmetry operations of ucell. This PR implements unique_shifts method which returns the nonequivalent terminations under symmetry operations. This function will be useful to avoid duplicated surface calculations.

Procedure for filtering nonequivalent shifts are as follows:

analyze a space group of ucell by spglib
create Plane objects to represent shift and apply symmetry operations in the space group to them
If planes are identical with some symmetry operation and lattice translation, choose one of them

I have added a unit test for (100) surfaces of anatase, which is a working example from Section 2.4 of Sun and Ceder (2013). I have verified that unique_shifts function correctly returns the result with the article: there are two possible shifts and they are identical under a glide operation.

lmhale99 commented 2 years ago

This is a good thing to add! I remember the Sun and Cedar paper mentioning this, so it’s good that someone figured out how to implement it.

I read through the code and have a few comments so far. I’ll start running and testing later this week.

The Box.cart method is ambiguous and I think is being used differently than is defined. The description says conversions of coordinates from relative to Cartesian, but it is being used to convert crystal vectors to Cartesian. They would be the same except Box origins are not explicitly (0,0,0). The respective conversions are in atomman elsewhere (tools.miller.vector_crystal_to_cartesian and System.unscale), but could be added as methods of Box.

I’ll work on the Box methods. Adding the vector conversion to Box is trivial. I need to think about renaming scale/unscale to something more informative like relative_to_cartesian or coord_relative_to_cartesian. It’s also curious how I seem to be doing effectively the same operation three different ways - NDArray.dot, np.inner with transpose, and sum over np.outer.

For Plane, my only comment is that maybe there should be another method “isclose” that does the plane comparisons but also allows atol and rtol values to be specified as users may want to change them. Then, either use Plane.isclose for the comparisons or make __eq__ call Plane.isclose using default tolerance values.

With FreeSurface, should there be a variable for the trial image range? int with default value of 1?

I should probably also start looking into adding annotations elsewhere.

lan496 commented 2 years ago

Thanks for your comments. I have added some quick fixes. I greatly appreciate your reviews!

I remember the Sun and Cedar paper mentioning this, so it’s good that someone figured out how to implement it.

I cannot find a relevant implementation to filter duplicated shifts. For example, pymatgen’s surface module enumerates all possible shifts and filter them after constructing structures by StructureMatcher. So, there may be a better algorithm than that implemented in this PR.

The respective conversions are in atomman elsewhere (tools.miller.vector_crystal_to_cartesian and System.unscale) I need to think about renaming scale/unscale to something more informative like relative_to_cartesian or coord_relative_to_cartesian.

Sorry, I did not notice this function. I have rewritten the code to properly use vector_crystal_to_cartesian. For now, I tentatively leave Box.cart method.

For Plane, my only comment is that maybe there should be another method “isclose” that does the plane comparisons but also allows atol and rtol values to be specified as users may want to change them.

That’s good things to do! I have added Plane.isclose method and parameters to control the torelances from FreeSurface.unique_shifts.

With FreeSurface, should there be a variable for the trial image range? int with default value of 1?

I have added trial_image_range parameter. BTW, I cannot come up with another way to find translationally equivalent planes. There may be a cleverer way not to use heuristics like trial_images.

lan496 commented 2 years ago

It’s also curious how I seem to be doing effectively the same operation three different ways - NDArray.dot, np.inner with transpose, and sum over np.outer.

This is a fun exercise! In my environment, np.dot is the fastest although all functions are so fast with small arrays.

import numpy as np
vects = np.random.rand(3, 3)
pos = np.random.rand(3)

def inner(vects, pos):
    return np.inner(pos, np.transpose(vects))

def dot(vects, pos):
    return np.dot(pos, vects)

def einsum(vects, pos):
    return np.einsum('ia,i->a', vects, pos)

%timeit inner(vects, pos)
> 2.41 µs ± 6.58 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit dot(vects, pos)
> 1.21 µs ± 2.25 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%timeit einsum(vects, pos)
> 2.55 µs ± 11.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

lmhale99 commented 2 years ago

For the three dot product operations, I was also getting the array.dot() to be slightly faster than the others for small arrays. Interestingly, it did seem like np.outer() with ellipsis summing and reshaping was faster for high dimensional arrays, but I doubt many people would be doing that. With array.dot() also being the most concise, I started to clean up the related operations to use that.

With the Miller tools, I also saw that the plane_crystal_to_cartesian was only supporting a single set of plane indices, so I modified it to also support arrays of indices like the other operations.

Box now has the miller conversions and updated forms of scale and unscale from System. I went with calling the updated methods position_relative_to_cartesian and vice versa. The system methods now call the Box methods and are marked with future depreciation warnings.

As for the trial images, the only other option I can think of is some normalization scheme to move all image shifts inside the box, but even then replicas may need to be tested if the shifts are exactly the lattice dimensions.

Tests and Jupyter docs still need to be added/updated for my changes...

lmhale99 commented 2 years ago

I started with making inverse_vects a private variable to avoid it being confused with reciprocal cells. Peeking at ase and pymatgen, it appears that reciprocal_vects is just the transpose of the inverse_vects, so I made it a property. Not sure why ase uses pinv as inv seems to be equivalent but faster for all vects I tested. Hopefully I'm not missing anything important.

reciprocal_vects gets computed and stored the first time it is accessed, and is reset whenever vects is updated. This minimizes the number of times it gets computed. I saw roughly 7 microseconds of savings per call with the value saved, so it might be overkill.

lmhale99 / atomman

Add `unique_shifts` method for `atomman.defects.FreeSurface` #26