Closed Sandy4321 closed 4 years ago
Hi @Sandy4321, thank you for participating in the project. This repo belongs to a course on feature selection, so I think this issue is better placed in my other repo, Feature-engine. I believe you have already opened an issue there. Thank you.
Is your feature request related to a problem? Please describe.
If we have categorical features, how can we create new features from all combinations of those features? In real life, categorical features are NOT independent; many of them depend on each other.
Even scikit-learn cannot do this, but will you?
Related to PacktPublishing/Python-Feature-Engineering-Cookbook#1
Describe the solution you'd like
For example, the maximum number of features to combine is given: 2, 4 or 5.
For a pandas DataFrame, you can use string concatenation: https://stackoverflow.com/questions/19377969/combine-two-columns-of-text-in-dataframe-in-pandas-python
columns = ['whatever', 'columns', 'you', 'choose']
df['period'] = df[columns].astype(str).sum(axis=1)
So, to combine three features out of 11, it seems three nested loops are needed, which is not a good way to do this:
for i in range(11):
    for j in range(i + 1, 11):
        for k in range(j + 1, 11):
            # combine features i, j and k here
You need to get 165 new features from all combinations (not permutations) of 3 features out of 11, so you end up with many new features.
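The nested loops can be replaced by `itertools.combinations` from the Python standard library, which yields each unordered group of features exactly once. The following is only a minimal sketch: the feature names `cat_0` to `cat_10` and the `max_order` variable are hypothetical, chosen to mirror the 11 features and the "maximum number of combined features" mentioned in this issue.

```python
from itertools import combinations
from math import comb

# Hypothetical names for the 11 categorical features.
features = [f"cat_{i}" for i in range(11)]

# Every unordered triple of distinct features, each generated exactly once.
triples = list(combinations(features, 3))

# All combinations up to a maximum order (here 3): pairs plus triples.
max_order = 3
up_to_max = [
    combo
    for order in range(2, max_order + 1)
    for combo in combinations(features, order)
]

print(len(triples), comb(11, 3))  # 165 165
print(len(up_to_max))             # 220 (55 pairs + 165 triples)
```

Each tuple in `triples` names the columns to join into one new feature. The count grows quickly with the order, which is why capping the order is useful.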
"Another alternative that I've seen from some Kaggle masters is to join the categories of 2 different variables into a new categorical variable. For example, if you have the variable gender, with the values female and male for observations 1 and 2, and the variable colour, with the values blue and green for observations 1 and 2 respectively, you could create a 3rd categorical variable called gender-colour, with the values female-blue for observation 1 and male-green for observation 2. Then you would have to apply the encoding methods from section 3 to this new variable."
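The gender-colour example from the quote could look roughly like this in pandas. This is a sketch under assumptions, not the course's or Feature-engine's code: the two-row DataFrame is made up to match the quote, and `pd.get_dummies` is used only as a stand-in for whichever encoding method is applied afterwards.

```python
import pandas as pd

# Toy data matching the quoted example: observations 1 and 2.
df = pd.DataFrame({
    "gender": ["female", "male"],
    "colour": ["blue", "green"],
})

# Join the two categorical variables row by row into a third one.
df["gender-colour"] = df["gender"].astype(str) + "-" + df["colour"].astype(str)

print(df["gender-colour"].tolist())  # ['female-blue', 'male-green']

# Stand-in encoding step (assumption): one-hot encode the new variable.
encoded = pd.get_dummies(df["gender-colour"])
```

The combined variable is an ordinary categorical column, so any encoder that accepts string categories can be applied to it.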