Open ml-evs opened 3 years ago
Hi! Was this implemented or is it still pending?
Hi! Thanks for pulling this up. This hasn't been implemented but can easily be added manually. Small example:
from modnet.featurizers.presets import DeBreuck2020Featurizer
from modnet.preprocessing import MODData
from pymatgen.core import Composition
materials = [Composition("SiO2"),Composition("Li20")]
class CompositionOxidFeaturizer(DeBreuck2020Featurizer):
def __init__(self):
super().__init__()
self.structure_featurizers = ()
self.site_featurizers = ()
data = MODData(materials=materials, featurizer=CompositionOxidFeaturizer())
data.featurize()
print(data.df_featurized)
This will add 9 extra features compared to the current default one. Let me know if this helps. I might try this with some of the matbench tasks and implement this (maybe default).
I will just add one extra bit to @ppdebreuck's suggestion, which is to add the fast_oxid
attribute to the featurizer, otherwise pymatgen (the underlying library matminer uses for oxidation states) will try to allow for every site in a structure to have a different oxidation state which scales horribly with the number of sites. Doing something like:
class CompositionOxidFeaturizer(DeBreuck2020Featurizer):
def __init__(self):
super().__init__()
self.structure_featurizers = ()
self.site_featurizers = ()
self.fast_oxid = True
will use more sensible defaults for oxidation state calculations that you can then test: https://github.com/ppdebreuck/modnet/blob/06ca45f104a697e65d02d5405d26aec78f712972/modnet/featurizers/featurizers.py#L161-L170)
(also see the relevant code in matminer/pymatgen)
Thank you for your inputs!
so if I understand correctly, with this class we just add features, and the "base" featurizer is launched with data.featurize()
later on.
By the way, since this would be a composition-only task, why are there self.structure_featurizers = ()
and self.site_featurizers = ()
? Is it just needed to define them as empty?
Thank you for your inputs! so if I understand correctly, with this class we just add features, and the "base" featurizer is launched with
data.featurize()
later on.
Yep, the base featurizer just knows to take each listed matminer featurizer in turn and apply it to the relevant sites, structure or composition.
By the way, since this would be a composition-only task, why are there
self.structure_featurizers = ()
andself.site_featurizers = ()
? Is it just needed to define them as empty?
Yeah, this is just convenience/laziness. You could instead inherit from the base featurizer and list out all of the matminer featurizer classes you wanted to use, but as we have them all already in the DeBreuck2020 preset it is easier base on that and disable the ones we don't want.
At some point the IonProperty
featurizer in matminer started using oxidation states (maybe it always did). We never ran into an issue with this as the poor scaling was probably being masked by the time taken for other featurizers, or we were only using it on simple compositions. As shown in #126, the featurizers basically hang if a sufficiently complex composition is attempted.
Possible workarounds:
IonProperty
into the oxid_composition_featurizer
list instead (so it will no longer be used unless there is a structure).fast_oxid
for the composition-only featurisers, as suggested above.
As above. We should probably add the oxidation state featurizers used in the
DeBreuck2020Featurizer
to theCompositionOnly
featurizer too.