Generalization to building blocks rather than only atoms

From internal communication. By Berend Smit:

Coming from the perspective of MOFs I see one main point as a potential opportunity for the library:

If one could abstract the image encoding a bit further away from atoms as the fundamental building blocks, one could also apply it to MOFs (or to some coarse-grained representation).

That is, the most general implementation would have an interface such as:
encode(structure, site_encoding_func, embedding_encoding_func) -> image array
decode(image, site_decoding_func, embedding_decoding_func) -> structure
By default, the functions would encode the elements. However, if users provide other sites or use symbols to indicate certain building blocks, they might want to choose their own encoding/decoding functions. This should also make it easier to use Wyckoff sites instead of all sites.
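To make the pluggable site encoding concrete, here is a minimal sketch of how a default element encoder and a user-supplied building-block encoder could share one code path. Everything in it is an assumption for illustration: the helper names (`encode_sites`, `decode_sites`), the tiny element table, and the building-block labels are hypothetical and are not part of the current xtal2png API. Plain lists are used to keep the sketch dependency-free; a real implementation would presumably work on pymatgen Structure objects and arrays.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical, deliberately tiny element table (illustration only).
ELEMENT_TO_CODE: Dict[str, int] = {"H": 1, "C": 6, "O": 8, "Zn": 30}
CODE_TO_ELEMENT: Dict[int, str] = {z: s for s, z in ELEMENT_TO_CODE.items()}


def element_site_encoding(label: str) -> int:
    """Default encoder: element symbol -> atomic number."""
    return ELEMENT_TO_CODE[label]


def element_site_decoding(code: int) -> str:
    """Default decoder: atomic number -> element symbol."""
    return CODE_TO_ELEMENT[code]


def encode_sites(
    labels: Sequence[str],
    site_encoding_func: Callable[[str], int] = element_site_encoding,
) -> List[int]:
    """Encode one label per site with whichever encoder the caller supplies."""
    return [site_encoding_func(label) for label in labels]


def decode_sites(
    codes: Sequence[int],
    site_decoding_func: Callable[[int], str] = element_site_decoding,
) -> List[str]:
    """Invert encode_sites, given the matching decoder."""
    return [site_decoding_func(code) for code in codes]


# Hypothetical user-defined vocabulary for a coarse-grained MOF:
# one symbol per building block instead of one per atom.
BLOCK_TO_CODE: Dict[str, int] = {"Zn4O_node": 1, "BDC_linker": 2}
CODE_TO_BLOCK: Dict[int, str] = {c: b for b, c in BLOCK_TO_CODE.items()}

# Default behaviour: atoms as building blocks.
atom_codes = encode_sites(["Zn", "O", "C", "H"])

# Custom behaviour: the same functions, with a different vocabulary.
block_labels = ["Zn4O_node", "BDC_linker", "BDC_linker"]
block_codes = encode_sites(block_labels, BLOCK_TO_CODE.__getitem__)
roundtrip = decode_sites(block_codes, CODE_TO_BLOCK.__getitem__)
```

The only requirement this sketch places on a custom encoder is that it has a matching decoder, so that the round trip back to labels stays lossless.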

The embedding_encoding_func would be a function that by default creates the pairwise distance matrix, but it might instead create the adjacency matrix (which can be useful if one aims to generate new crystallographic nets).
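As a rough illustration of the two embedding choices mentioned above, the sketch below swaps a distance-matrix function for an adjacency-matrix function behind the same argument. The function names, the `embed` wrapper, and the 2.0 cutoff are assumptions made for this example. It also ignores periodic boundary conditions, which a real crystal implementation would have to handle.

```python
import math
from typing import Callable, List, Sequence

Coords = Sequence[Sequence[float]]
Matrix = List[List[float]]


def distance_matrix(coords: Coords) -> Matrix:
    """Pairwise Euclidean distances (non-periodic, for illustration)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def adjacency_matrix(coords: Coords, cutoff: float = 2.0) -> Matrix:
    """1 where two distinct sites lie within `cutoff` of each other, else 0."""
    dmat = distance_matrix(coords)
    n = len(coords)
    return [
        [1 if i != j and dmat[i][j] <= cutoff else 0 for j in range(n)]
        for i in range(n)
    ]


def embed(
    coords: Coords,
    embedding_encoding_func: Callable[[Coords], Matrix] = distance_matrix,
) -> Matrix:
    """Apply whichever embedding the caller chooses; distances by default."""
    return embedding_encoding_func(coords)


# Three sites on a line: the first two are close, the third is far away.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (4.0, 0.0, 0.0)]

dmat = embed(coords)                    # default: pairwise distances
amat = embed(coords, adjacency_matrix)  # alternative: connectivity only
```

Note that the adjacency matrix keeps only connectivity, so decoding it back to coordinates would need extra information beyond what this sketch stores.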

Another interesting question might be how materials cluster in "xtal2png" space compared to other representations, e.g. SOAP. However, this would require an implementation that is invariant to site permutation and supercell expansion.
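To show what the permutation requirement means in practice, here is a small sketch in which sites are put into a canonical order before the distance matrix is built, so that two listings of the same sites give the same result. The sort key and the function names are assumptions chosen for this example, not something xtal2png does today. The sketch does not address supercell expansion, and sorting on raw coordinates would be sensitive to numerical noise and to the choice of origin.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def canonical_order(
    labels: Sequence[str], coords: Sequence[Coord]
) -> Tuple[List[str], List[Coord]]:
    """Sort sites by (label, coordinates) so the input order no longer matters."""
    order = sorted(range(len(labels)), key=lambda i: (labels[i], tuple(coords[i])))
    return [labels[i] for i in order], [coords[i] for i in order]


def encode(
    labels: Sequence[str], coords: Sequence[Coord]
) -> Tuple[List[str], List[List[float]]]:
    """Canonicalize the site order, then build the pairwise distance matrix."""
    labels, coords = canonical_order(labels, coords)
    dmat = [[math.dist(a, b) for b in coords] for a in coords]
    return list(labels), dmat


labels = ["Zn", "O", "C"]
coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 3.0, 0.0)]

# The same three sites, listed in a different order.
perm = [2, 0, 1]
labels_perm = [labels[i] for i in perm]
coords_perm = [coords[i] for i in perm]

# Without canonicalization the raw matrices differ between the two listings.
raw = [[math.dist(a, b) for b in coords] for a in coords]
raw_perm = [[math.dist(a, b) for b in coords_perm] for a in coords_perm]

# With canonicalization both listings produce the same encoding.
same_encoding = encode(labels, coords) == encode(labels_perm, coords_perm)
```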

I really like this suggestion. This actually points to a common issue with many materials informatics repositories. For a while, I've wanted to make CrabNet agnostic to chemical formulas (https://github.com/sparks-baird/CrabNet/issues/6). @Pepe-Marquez is also interested in featurization for more general building blocks, based on some internal discussions I've had with him.

Implementing a truly general "building blocks" framework seems non-trivial to me, at least at first. I think the common threads here would be that site_encoding_func-s and embedding_encoding_func-s would operate on pymatgen Structure-s, and that site_decoding_func-s and embedding_decoding_func-s would operate on images, where each row/column represents a unique building block. In the latter case, the current xtal2png representation starts to break down, since it contains site coordinates. For arbitrary building blocks (e.g. structural motifs), additional (invertible) information about the composition and structure of the motifs would need to be present.
