yoshida-lab / XenonPy

XenonPy is a Python Software for Materials Informatics
http://xenonpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Add feature selection for "BaseDescriptor" class #47

Closed TsumiNa closed 5 years ago

TsumiNa commented 5 years ago

We designed BaseDescriptor as a container for BaseFeaturizer calculators. With it, users can bundle many featurizers into a preset for pipelining. This proposal adds a feature selection function to the BaseDescriptor class.

Proposal

Assume we have classes like these:

class BaseDescriptor:
    def __init__(self, *, featurizers='all'):
        ...

class NewDescriptor(BaseDescriptor):
    # Default to 'all' so that NewDescriptor() keeps the current behavior
    def __init__(self, *, featurizers='all'):
        super().__init__(featurizers=featurizers)
        ...

        # Each assignment associates another featurizer with the `input` column
        self.input = Featurizer1()
        self.input = Featurizer2()
        self.input = Featurizer3()

descriptor = NewDescriptor()

In this case, for input data that has a column named input, descriptor calculates all the features associated with self.input and then concatenates them. This is exactly what the current version does.

Under this proposal, users can initialize NewDescriptor with a parameter called featurizers. This parameter contains the names of the featurizers to enable. Only featurizers whose names appear in featurizers are activated.

In the following example, only the features from 'Featurizer1' and 'Featurizer3' will be calculated.

descriptor = NewDescriptor(featurizers=['Featurizer1', 'Featurizer3'])
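
To make the proposed behavior concrete, here is a minimal, self-contained sketch of how the name-based filter might work. It is not XenonPy's actual implementation: the toy featurizers, the `featurize` and `transform` methods, the `_groups` registry, and the `_is_active` helper are all assumptions made for illustration, and the names are matched against each featurizer's class name.

```python
from collections import defaultdict


# Toy featurizers (hypothetical stand-ins for BaseFeaturizer subclasses).
class Featurizer1:
    def featurize(self, x):
        return x + 1


class Featurizer2:
    def featurize(self, x):
        return x * 2


class Featurizer3:
    def featurize(self, x):
        return x ** 2


class BaseDescriptor:
    def __init__(self, *, featurizers='all'):
        # Bypass our own __setattr__ while setting up internal state.
        object.__setattr__(self, 'featurizers', featurizers)
        object.__setattr__(self, '_groups', defaultdict(list))

    def __setattr__(self, key, value):
        # Assumption: assigning a featurizer appends it to the group named
        # `key` instead of overwriting the previous assignment.
        if hasattr(value, 'featurize'):
            self._groups[key].append(value)
        else:
            object.__setattr__(self, key, value)

    def _is_active(self, featurizer):
        # 'all' keeps every featurizer; otherwise match on the class name.
        if self.featurizers == 'all':
            return True
        return type(featurizer).__name__ in self.featurizers

    def transform(self, data):
        # `data` maps a column name to a list of values.
        result = {}
        for column, group in self._groups.items():
            for f in group:
                if self._is_active(f):
                    name = type(f).__name__
                    result[name] = [f.featurize(v) for v in data[column]]
        return result


class NewDescriptor(BaseDescriptor):
    def __init__(self, *, featurizers='all'):
        super().__init__(featurizers=featurizers)
        self.input = Featurizer1()
        self.input = Featurizer2()
        self.input = Featurizer3()


data = {'input': [1, 2, 3]}
selected = NewDescriptor(featurizers=['Featurizer1', 'Featurizer3']).transform(data)
everything = NewDescriptor().transform(data)
```

In this sketch, `selected` holds only the columns produced by Featurizer1 and Featurizer3, while `everything` holds all three, which matches the current behavior when `featurizers` is left at `'all'`.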