khuyentran1401 / machine-learning-articles

List of interesting articles on different topics of machine learning and deep learning
https://towardsdatascience.com/how-to-organize-your-data-science-articles-with-github-b5b9427dad37?source=friends_link&sk=4dfb338164ad6e95809d943f0dc0578e

Data Transformation and Feature Engineering #119

Open UKVeteran opened 2 years ago

UKVeteran commented 2 years ago

TL;DR

Article Link

https://towardsdatascience.com/data-transformation-and-feature-engineering-e3c7dfbb4899

Author

Destin Gong

Key Takeaways

Why is data transformation needed?

Useful Code Snippets


## data scaling methods ##
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import RobustScaler

# df is assumed to be an existing pandas DataFrame containing the columns below
scale_var = ['Enrollment_Length', 'Recency', 'NumStorePurchases', 'clipped_Age', 'clipped_NumWebVisitsMonth']
scalers_list = [StandardScaler(), RobustScaler(), MinMaxScaler()]
for scaler in scalers_list:
    # one figure per scaler, titled with the scaler's class name
    fig = plt.figure(figsize = (26, 5))
    fig.suptitle(type(scaler).__name__, fontsize = 20)
    for j, var in enumerate(scale_var):
        scaled_var = "scaled_" + var
        # scalers expect a 2D array, hence the reshape; ravel() flattens the result back to a column
        # note: each scaler overwrites the scaled_ columns written by the previous one
        df[scaled_var] = scaler.fit_transform(df[var].values.reshape(-1, 1)).ravel()
        # one histogram per variable, side by side
        sub = fig.add_subplot(1, len(scale_var), j + 1)
        sub.set_xlabel(var)
        df[scaled_var].plot(kind = 'hist', ax = sub)
plt.show()
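
To see how the three scalers in the snippet above differ without needing the article's dataset, here is a minimal sketch on toy data. The array `x` and the `scaled` dictionary are made up for illustration and do not come from the article; only the scikit-learn scaler classes are taken from the snippet.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

# Hypothetical toy data: four ordinary values and one outlier
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0]).reshape(-1, 1)

scaled = {}
for scaler in (StandardScaler(), MinMaxScaler(), RobustScaler()):
    name = type(scaler).__name__
    # fit_transform returns an (n, 1) array; ravel() flattens it for printing
    scaled[name] = scaler.fit_transform(x).ravel()
    print(name, np.round(scaled[name], 3))
```

StandardScaler centers on the mean and divides by the standard deviation, MinMaxScaler maps the range onto [0, 1], and RobustScaler centers on the median and divides by the interquartile range. With an outlier present, MinMaxScaler squeezes the ordinary values close to 0, while RobustScaler keeps them spread out around 0.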

Useful Tools

Comments/ Questions

khuyentran1401 commented 2 years ago

Thank you for contributing! This looks very useful!