solegalli / feature-selection-for-machine-learning

Code repository for the online course Feature Selection for Machine Learning
https://www.courses.trainindata.com/p/feature-selection-for-machine-learning

How to create new features from all combinatorial combinations of features #1

Closed Sandy4321 closed 4 years ago

Sandy4321 commented 4 years ago

Is your feature request related to a problem? Please describe.

If we have categorical features, how can we create new features from all combinations of the existing features? In real life, categorical features are NOT independent; many of them depend on each other.

Even scikit-learn cannot do this, but will you?

Related to PacktPublishing/Python-Feature-Engineering-Cookbook#1

Describe the solution you'd like

For example, the maximum number of features to combine is given: 2, 4 or 5.

For a pandas DataFrame you can use concatenation: https://stackoverflow.com/questions/19377969/combine-two-columns-of-text-in-dataframe-in-pandas-python

columns = ['whatever', 'columns', 'you', 'choose']
df['period'] = df[columns].astype(str).sum(axis=1)
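
A minimal sketch of the concatenation idea on toy data. The column names and values are made up for illustration, and it uses `agg` with `str.join` as one possible way to concatenate row-wise. It also shows why joining with a separator may be preferable: without one, different rows can collapse to the same string.

```python
import pandas as pd

# Toy data; the column names and values are hypothetical.
df = pd.DataFrame({
    "city": ["ab", "a"],
    "code": ["c", "bc"],
    "flag": [1, 1],
})

columns = ["city", "code", "flag"]

# Cast everything to str first so numeric columns can be joined too.
as_text = df[columns].astype(str)

# Plain concatenation: "ab" + "c" + "1" and "a" + "bc" + "1" both give "abc1".
no_sep = as_text.agg("".join, axis=1)

# Joining with a separator keeps the two rows distinguishable.
with_sep = as_text.agg("-".join, axis=1)

print(no_sep.tolist())
print(with_sep.tolist())
```

The separator is a design choice: any string that cannot appear inside the category values would do.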

So for combinations of three features out of 11, the straightforward approach seems to be 3 nested loops, which are not good for this:

for i in range(11):
    for j in range(i + 1, 11):
        for k in range(j + 1, 11):
            ...  # combine features i, j and k

You need to get 165 new features, one for each combination (not permutation) of 3 out of 11 features, so you end up with many new features.
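
One way to avoid hand-written nested loops is `itertools.combinations`, which yields each unordered group of columns exactly once. The sketch below is only an illustration under stated assumptions: `add_combined_features` is a hypothetical helper (not part of scikit-learn or Feature-engine), and the toy frame with columns `f0` to `f10` is made up.

```python
from itertools import combinations
from math import comb

import pandas as pd


def add_combined_features(df, columns, order, sep="-"):
    """Hypothetical helper: return a copy of df with one new column
    for every `order`-sized combination of `columns`."""
    new_cols = {}
    for combo in combinations(columns, order):
        name = sep.join(combo)
        # Row-wise join of the selected columns, cast to str first.
        new_cols[name] = df[list(combo)].astype(str).agg(sep.join, axis=1)
    return pd.concat([df, pd.DataFrame(new_cols, index=df.index)], axis=1)


# Toy data: 11 made-up categorical features, f0 .. f10.
toy = pd.DataFrame({f"f{i}": ["a", "b", "c"] for i in range(11)})

out = add_combined_features(toy, list(toy.columns), order=3)
n_new = out.shape[1] - toy.shape[1]

# The number of new columns matches "11 choose 3".
print(n_new, comb(11, 3))
```

The `order` argument plays the role of the requested "maximum number of combined features"; calling the helper once per order (2, 3, ...) would cover several sizes. The column count grows quickly, and each combined column can have many distinct values, so some selection afterwards is likely needed.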

"Another alternative that I've seen from some Kaggle masters is to join the categories in 2 different variables, into a new categorical variable, so for example, if you have the variable gender, with the values female and male, for observations 1 and 2, and the variable colour with the value blue and green for observations 1 and 2 respectively, you could create a 3rd categorical variable called gender-colour, with the values female-blue for observation 1 and male-green for observation 2. Then you would have to apply the encoding methods from section 3 to this new variable."

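The gender-colour example from the quote can be sketched in pandas as follows. The data are the two observations described in the quote. The one-hot encoding with `pd.get_dummies` at the end is only a stand-in assumption, since the thread does not say which encoding methods "section 3" covers.

```python
import pandas as pd

# The two observations described in the quote.
df = pd.DataFrame({
    "gender": ["female", "male"],
    "colour": ["blue", "green"],
})

# Join the two categorical variables into a third one.
df["gender-colour"] = df["gender"] + "-" + df["colour"]

# Stand-in encoding step (an assumption, not necessarily the quoted method):
# one-hot encode the combined variable.
encoded = pd.get_dummies(df["gender-colour"], prefix="gender-colour")

print(df["gender-colour"].tolist())
print(list(encoded.columns))
```

Any other categorical encoder could replace the last step; the combined column is just another categorical variable.
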
solegalli commented 4 years ago

Hi @Sandy4321, thank you for participating in the project. This repo belongs to a course on feature selection, so I think this issue is best placed in my other repo, Feature-engine. I believe you have already opened an issue there. Thank you.