rasbt / mlxtend

A library of extension and helper modules for Python's data analysis and machine learning libraries.
https://rasbt.github.io/mlxtend/

Add method to extract features in PCA-family algorithms #330

Open sashml opened 6 years ago

sashml commented 6 years ago

Is it possible to add a method to the PCA-family algorithms that extracts the most important features?

I like the approach below for classical PCA, but I am struggling to adapt it to the methods in the mlxtend library.

import math


def get_important_features(transformed_features, components_, columns, k_best=None):
    """
    This function will return the most "important"
    features so we can determine which have the most
    effect on multi-dimensional scaling.

    k_best is the number of top features to return
    (None returns all of them).
    """
    num_columns = len(columns)

    # Scale the principal components by the max value in
    # the transformed set belonging to that component
    xvector = components_[0] * max(transformed_features[:, 0])
    yvector = components_[1] * max(transformed_features[:, 1])

    # Sort each column by its length. These are your *original*
    # columns, not the principal components.
    important_features = {columns[i]: math.sqrt(xvector[i] ** 2 + yvector[i] ** 2)
                          for i in range(num_columns)}
    important_features = sorted(zip(important_features.values(), important_features.keys()),
                                reverse=True)[:k_best]
    print("Features by importance:\n", important_features)
    return [feat[1] for feat in important_features]

See https://stackoverflow.com/questions/48844159/how-to-extract-columns-names-from-rbfkernelpca?noredirect=1#comment84692998_48844159. Or is that approach correct?

rasbt commented 6 years ago

Hi there,

Something like this only works for regular PCA, not kernel PCA (or at least, I am not aware of a way to do it for kernel PCA). For example, have a look at the Factor Loadings section of the regular PCA user guide: http://rasbt.github.io/mlxtend/user_guide/feature_extraction/PrincipalComponentAnalysis/#example-4-factor-loadings
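
To make the factor-loadings idea concrete, here is a minimal NumPy-only sketch for regular (linear) PCA. It does not use the mlxtend API; the helper names `pca_loadings` and `rank_features`, the toy data, and the column names are all hypothetical and chosen only for illustration. It assumes the features are standardized, in which case each loading is the correlation between an original feature and a principal component. Kernel PCA has no direct equivalent, because its eigenvectors are expressed in terms of the training samples rather than the original feature axes.

```python
import numpy as np


def pca_loadings(X, n_components=2):
    """Return (loadings, eigenvalues) of a linear PCA on standardized X.

    loadings[i, j] is the loading of original feature i on principal
    component j (hypothetical helper, not part of mlxtend).
    """
    # Standardize so that the covariance matrix equals the correlation matrix
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    cov = np.cov(X_std, rowvar=False)

    # eigh is appropriate because the covariance matrix is symmetric
    eig_vals, eig_vecs = np.linalg.eigh(cov)

    # Keep the n_components directions with the largest eigenvalues
    order = np.argsort(eig_vals)[::-1][:n_components]
    eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]

    # Loadings: eigenvectors scaled by the square root of their eigenvalues
    loadings = eig_vecs * np.sqrt(eig_vals)
    return loadings, eig_vals


def rank_features(loadings, columns, k_best=None):
    """Rank original features by the length of their loading vector."""
    strength = np.linalg.norm(loadings, axis=1)
    order = np.argsort(strength)[::-1][:k_best]
    return [columns[i] for i in order]


# Toy data (an assumption for illustration): two strongly correlated
# features plus one independent feature
rng = np.random.default_rng(0)
a = rng.normal(size=200)
X = np.column_stack([a, a + 0.1 * rng.normal(size=200), rng.normal(size=200)])
columns = ["a", "a_noisy", "independent"]

loadings, eig_vals = pca_loadings(X, n_components=1)
ranked = rank_features(loadings, columns)
full_loadings, _ = pca_loadings(X, n_components=3)
print(ranked)
```

Ranking by the norm of each feature's loading vector mirrors the `sqrt(xvector ** 2 + yvector ** 2)` step in the snippet from the question, except that the scaling comes from the eigenvalues instead of the maximum projected value.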