I think the library should be centered around a MOF class that has different descriptors as cached properties.
Many descriptors rely on structure graphs, so those should only be computed once per structure.
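A minimal sketch of the caching idea, using `functools.cached_property` from the standard library. The class layout, the toy distance-based bonding rule, and the descriptor names (`coordination_numbers`, `bond_count`) are assumptions for illustration only; a real implementation would probably build the graph from a proper structure object.

```python
from functools import cached_property


class MOF:
    """Hypothetical sketch: descriptors are computed lazily and cached."""

    def __init__(self, sites, cutoff=2.0):
        # sites: list of (element, (x, y, z)) tuples in Cartesian coordinates
        self.sites = list(sites)
        self.cutoff = cutoff
        self.graph_builds = 0  # counts how often the graph is actually built

    @cached_property
    def structure_graph(self):
        # Toy bonding rule (assumption): sites closer than `cutoff` are bonded.
        self.graph_builds += 1
        edges = set()
        for i, (_, a) in enumerate(self.sites):
            for j in range(i + 1, len(self.sites)):
                b = self.sites[j][1]
                dist = sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
                if dist < self.cutoff:
                    edges.add((i, j))
        return edges

    @cached_property
    def coordination_numbers(self):
        # Reuses the cached graph instead of rebuilding it.
        counts = [0] * len(self.sites)
        for i, j in self.structure_graph:
            counts[i] += 1
            counts[j] += 1
        return counts

    @cached_property
    def bond_count(self):
        # Also reuses the cached graph.
        return len(self.structure_graph)


mof = MOF([("Zn", (0.0, 0.0, 0.0)), ("O", (1.9, 0.0, 0.0)), ("C", (5.0, 0.0, 0.0))])
```

Accessing both `mof.coordination_numbers` and `mof.bond_count` builds the graph only once, which `mof.graph_builds` makes visible.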
There should also be a basic CLI to featurize a single CIF or a folder of CIFs.
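A rough sketch of what such a CLI could look like with `argparse`. The function names, the `-o/--output` option, and the JSON output format are assumptions, and `featurize_file` is only a placeholder that records the file size instead of computing real descriptors.

```python
import argparse
import json
from pathlib import Path


def collect_cifs(path):
    """Return a sorted list of CIF files for a single file or a folder."""
    path = Path(path)
    if path.is_dir():
        return sorted(path.glob("*.cif"))
    return [path]


def featurize_file(cif_path):
    # Placeholder (assumption): a real implementation would build a MOF object
    # from the CIF and return its descriptors. Here we only record the size.
    return {"name": cif_path.name, "n_bytes": cif_path.stat().st_size}


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Featurize one CIF or a folder of CIFs."
    )
    parser.add_argument("path", help="CIF file or folder containing CIF files")
    parser.add_argument(
        "-o", "--output", default=None, help="write JSON here instead of stdout"
    )
    args = parser.parse_args(argv)

    rows = [featurize_file(p) for p in collect_cifs(args.path)]
    text = json.dumps(rows, indent=2)
    if args.output:
        Path(args.output).write_text(text)
    else:
        print(text)
    return rows


# Wire `main` up as a console-script entry point, or call it under an
# `if __name__ == "__main__":` guard in a script.
```

Passing `argv` explicitly keeps `main` easy to call from tests without touching `sys.argv`.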
We should not re-implement anything that is already implemented in matminer, but we should also not couple to it so tightly that we need to update the package every time the matminer API changes.
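One way to keep the coupling loose is a thin adapter, so that only one small class has to change if the upstream API changes. The sketch below assumes nothing about the wrapped object beyond `featurize(...)` and `feature_labels()` methods, which matminer featurizers provide as far as I know. The names `ExternalFeaturizerAdapter` and `FakeDensityFeaturizer` are hypothetical, and the fake featurizer stands in for a real one so the example has no dependencies.

```python
class ExternalFeaturizerAdapter:
    """Wrap a third-party featurizer behind a small, stable internal interface."""

    def __init__(self, featurizer, prefix):
        self._featurizer = featurizer
        self._prefix = prefix

    def feature_labels(self):
        # Prefix labels so names from different sources cannot collide.
        return [f"{self._prefix}_{label}" for label in self._featurizer.feature_labels()]

    def featurize(self, structure):
        # Return a label -> value mapping instead of a bare list.
        values = list(self._featurizer.featurize(structure))
        return dict(zip(self.feature_labels(), values))


class FakeDensityFeaturizer:
    """Stand-in for a third-party featurizer, used only for this example."""

    def feature_labels(self):
        return ["density", "n_sites"]

    def featurize(self, structure):
        return [structure["mass"] / structure["volume"], len(structure["sites"])]


adapter = ExternalFeaturizerAdapter(FakeDensityFeaturizer(), prefix="ext")
features = adapter.featurize({"mass": 10.0, "volume": 4.0, "sites": ["Zn", "O"]})
```

The rest of the package would only ever talk to the adapter, never to the wrapped library directly.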
We at least want to have the following descriptors built in:
RACs, but we should not rely on molSimplify, as it is hard to install and GPL-licensed; we can probably just extract the main ideas into this package
Pore properties with Zeo++ (which means we also need to make it available via conda)
Persistent homology fingerprints: images and barcodes
Energy histograms from Bucior/Snurr
Property-labeled RDF
Local-structure order parameters
Basic summary of the chemistry (perhaps also split into linker/node/...)
SOAP
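To make one of the listed descriptors concrete, here is a sketch of a property-labeled RDF in one common form: a sum over atom pairs of the product of their properties, weighted by a Gaussian in the pair distance. It deliberately ignores periodic images and any normalization factor, and the function name and the `smoothing` parameter are assumptions for illustration.

```python
import math


def property_labeled_rdf(coords, props, radii, smoothing=10.0):
    """Gaussian-smoothed, property-weighted RDF for a finite set of atoms.

    Simplification (assumption): no periodic images, no normalization factor.
    """
    rdf = []
    for radius in radii:
        total = 0.0
        for i in range(len(coords)):
            for j in range(i + 1, len(coords)):
                r_ij = math.dist(coords[i], coords[j])
                weight = math.exp(-smoothing * (r_ij - radius) ** 2)
                total += props[i] * props[j] * weight
        rdf.append(total)
    return rdf


coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
electronegativity = [1.65, 3.44]  # Pauling values for Zn and O
rdf = property_labeled_rdf(coords, electronegativity, radii=[1.0, 2.0, 3.0])
```

With a single pair at a distance of 2.0, the curve peaks at the middle radius and decays symmetrically on either side.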
The architecture should also anticipate that we might want to add descriptors based on the building blocks at some point, so the design should make this easy.
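One possible way to leave room for building-block descriptors is a small registry in which each descriptor declares the level it operates on. This is only a sketch: the registry, the decorator, the `level` labels, and the dictionary-based toy MOF are all hypothetical.

```python
DESCRIPTORS = {}


def register_descriptor(name, level="structure"):
    """Decorator that adds a descriptor function to the registry under `name`."""

    def decorator(func):
        if name in DESCRIPTORS:
            raise ValueError(f"descriptor {name!r} is already registered")
        DESCRIPTORS[name] = (level, func)
        return func

    return decorator


@register_descriptor("n_atoms")
def n_atoms(mof):
    # Whole-structure descriptor.
    return len(mof["atoms"])


@register_descriptor("linker_sizes", level="building_block")
def linker_sizes(mof):
    # Building-block descriptor: one value per linker.
    return [len(linker) for linker in mof["linkers"]]


def featurize(mof, level=None):
    """Run all registered descriptors, optionally restricted to one level."""
    return {
        name: func(mof)
        for name, (lvl, func) in DESCRIPTORS.items()
        if level is None or lvl == level
    }


toy_mof = {"atoms": ["Zn", "O", "C", "C"], "linkers": [["C", "C"]]}
```

Adding a new building-block descriptor then only requires a decorated function; nothing in `featurize` has to change.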