kjappelbaum / mofdscribe

An ecosystem for digital reticular chemistry
https://mofdscribe.readthedocs.io/en/latest/
MIT License
43 stars 7 forks source link

Primitive unit cell creation in featurization methods #304

Closed Andrew-S-Rosen closed 2 years ago

Andrew-S-Rosen commented 2 years ago

Currently, it does not seem that the featurizers (at least RACS) create a primitive unit cell before running the featurization. For MOFs in particular, this can pose a problem if the user is not acquainted with how to create primitive cells (NU-1000's Niggli-reduced primitive cell takes 50s for the RAC features to be created on my laptop, but the conventional cell was running for 5+ min before I cancelled it).

My proposal would be to add a kwarg to the __init__ of the featurizer that's something like primitive: bool = True, which will internally transform the structure to its Niggli-reduced primitive cell via structure.get_primitive_cell() when featurize() is called. Or we could just do it automatically without a kwarg.

Thoughts? The messy thing is that this would probably need to be carried out for any featurization method that is invariant with respect to periodicity and unit cell shape/size, so we should have a consistent mechanism.

Edit: In hindsight, this should only really be done when the featurization method involves the operations on the structure, e.g. making a structure graph. We don't want to waste time making a primitive cell in things based on the composition like PriceLowerBoud().

kjappelbaum commented 2 years ago

Thoughts? The messy thing is that this would probably need to be carried out for any featurization method that is invariant with respect to periodicity and unit cell shape/size, so we should have a consistent mechanism.

Yes, good points:

Andrew-S-Rosen commented 2 years ago

I like that idea. I think a single BaseMOFFeaturizer that can be called is the right way to go.