Primitive unit cell creation in featurization methods

Andrew-S-Rosen commented 2 years ago

Currently, it does not seem that the featurizers (at least RACS) create a primitive unit cell before running the featurization. For MOFs in particular, this can pose a problem if the user is not acquainted with how to create primitive cells (NU-1000's Niggli-reduced primitive cell takes 50s for the RAC features to be created on my laptop, but the conventional cell was running for 5+ min before I cancelled it).

My proposal would be to add a kwarg to the __init__ of the featurizer that's something like primitive: bool = True, which will internally transform the structure to its Niggli-reduced primitive cell via structure.get_primitive_cell() when featurize() is called. Or we could just do it automatically without a kwarg.

Thoughts? The messy thing is that this would probably need to be carried out for any featurization method that is invariant with respect to periodicity and unit cell shape/size, so we should have a consistent mechanism.

Edit: In hindsight, this should only really be done when the featurization method involves the operations on the structure, e.g. making a structure graph. We don't want to waste time making a primitive cell in things based on the composition like PriceLowerBoud().

kjappelbaum commented 2 years ago

Thoughts? The messy thing is that this would probably need to be carried out for any featurization method that is invariant with respect to periodicity and unit cell shape/size, so we should have a consistent mechanism.

Yes, good points:

I anyhow thought of having a BaseMOFFeaturizer that could have the logic for getting the primitive. Depending on the featurization approach we can then set the defaults in the constructor
You point also to runtime issues: There are some obvious things we could do to make it faster, but at some point it would be good to do a more careful profiling and look at flame graphs

Andrew-S-Rosen commented 2 years ago

I like that idea. I think a single BaseMOFFeaturizer that can be called is the right way to go.

kjappelbaum / mofdscribe

Primitive unit cell creation in featurization methods #304