Open sgbaird opened 2 years ago
Following up from a chat with @kjappelbaum, there could be a hierarchy of building block types manifested in the layers. For example, the first layer encodes information about the atoms, the second layer encodes information about structural motifs (larger building blocks), etc.
From internal communication. By Berend Smit:
I really like this suggestion. This actually points to a common issue with many materials informatics repositories. For a while, I've wanted to make CrabNet agnostic to chemical formulas (see https://github.com/sparks-baird/CrabNet/issues/6). @Pepe-Marquez is also interested in featurization for more general building blocks, based on some internal discussions I've had with him.
To implement a really general "building blocks" framework seems non-trivial to me, at least at first. I think the common threads here would be that `site_encoding_func`-s and `embedding_encoding_func`-s would operate on pymatgen `Structure`-s, and the `site_decoding_func`-s and `embedding_decoding_func`-s would operate on images, where each row/column represents a unique building block. In the latter case, the current `xtal2png` representation starts to break down since it contains site coordinates. For arbitrary building blocks (e.g. of structural motifs), additional (invertible) information related to the composition and structure of the motifs would need to be present.
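
To make the encode/decode pairing a bit more concrete, here is a rough sketch of what a building-block-agnostic round trip could look like. Everything in it is hypothetical: `BuildingBlock`, `encode_blocks`, and `decode_blocks` are illustrative names, not part of the `xtal2png` or pymatgen APIs, and a plain descriptor tuple stands in for whatever invertible composition/structure information a motif would actually need. It also assumes descriptors are already scaled to [0, 1], so the round trip is only exact up to 8-bit quantization.

```python
from dataclasses import dataclass
from typing import List, Sequence

Image = List[List[int]]  # rows of 0-255 pixel values


@dataclass(frozen=True)
class BuildingBlock:
    """Hypothetical stand-in for an atom, a structural motif, or a larger unit."""

    name: str
    descriptor: Sequence[float]  # assumed to be pre-scaled to [0, 1]


def encode_blocks(blocks: Sequence[BuildingBlock], width: int) -> Image:
    """Map each building block to one image row, zero-padded to `width`."""
    image: Image = []
    for block in blocks:
        if len(block.descriptor) > width:
            raise ValueError(f"descriptor of {block.name!r} exceeds width {width}")
        # Clip to [0, 1], then quantize to 8-bit pixel values.
        row = [round(255 * min(max(v, 0.0), 1.0)) for v in block.descriptor]
        image.append(row + [0] * (width - len(row)))
    return image


def decode_blocks(image: Image, n_features: int) -> List[List[float]]:
    """Approximately invert `encode_blocks` (limited by 8-bit quantization)."""
    return [[px / 255 for px in row[:n_features]] for row in image]


# Two made-up building blocks with two descriptor values each.
blocks = [
    BuildingBlock("metal-node", (0.2, 0.8)),
    BuildingBlock("linker", (0.5, 0.1)),
]
image = encode_blocks(blocks, width=4)
recovered = decode_blocks(image, n_features=2)
```

The point of the sketch is only that the encoder consumes building blocks and the decoder consumes image rows, one row per block. Where site coordinates would go for a motif, and how to keep that part invertible, is exactly the open question above.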