matloff / polyreg


just clarifying : categorical data features interaction #26

Open Sandy4321 opened 2 years ago

Sandy4321 commented 2 years ago

Great code (as is all of your work).

"An important feature is that dummy variables are handled properly, so that for instance powers of a dummy variable do not exist as duplicates of the original."

Just to clarify: given categorical data such as

f1  f2  f3
a   c   u
b   g   x
a   k   y

after feature interaction, will row 1 become the following?

f1  f2  f3  =>  f1  f2  f3  f1f2  f1f3  f2f3  f1f2f3
a   c   u   =>  a   c   u   ac    au    cu    acu

I want to make sure there will be no reordered duplicates such as ua, ca, ..., uca, etc.
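
To make the expected term list concrete, here is a minimal Python sketch of the enumeration described above. It is not polyreg code and says nothing about how getPoly() works internally; the function name `interaction_terms` is made up for illustration. It only shows that taking one product per unordered subset of features yields ac but never ca.

```python
from itertools import combinations


def interaction_terms(levels):
    """Return one product term per non-empty, unordered subset of `levels`.

    Hypothetical helper for illustration; not part of polyreg.
    """
    terms = []
    for size in range(1, len(levels) + 1):
        # combinations() preserves input order and never emits a permutation
        # of a subset it has already produced, so "ca" cannot follow "ac".
        for combo in combinations(levels, size):
            terms.append("".join(combo))
    return terms


row = ["a", "c", "u"]  # levels of f1, f2, f3 in the first row
print(interaction_terms(row))
```

With three features this gives 2^3 - 1 = 7 terms, matching the seven columns listed in the question.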

matloff commented 2 years ago

Thanks for the note.

Not sure what you mean? Have you tried constructing a small example and then calling getPoly()?

Norm

(In reply to Sandy4321's message of Sun, Jul 24, 2022 at 11:53:18AM -0700.)

Sandy4321 commented 2 years ago

Could you confirm whether it works properly with categorical values? What behavior should I expect? I have long been looking for a theoretically sound design for interactions between categorical features.
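
As a point of reference for the question, the following Python sketch shows what products of 0/1 dummy columns look like. It assumes one-hot (dummy) coding of f1 and f2 from the table in the first comment; the helpers `dummy` and `product` are hypothetical and are not polyreg functions. Whether getPoly() produces exactly these columns is not confirmed in this thread.

```python
f1 = ["a", "b", "a"]  # column f1 from the example table
f2 = ["c", "g", "k"]  # column f2 from the example table


def dummy(column, level):
    """0/1 indicator of `level` in a categorical column (illustrative helper)."""
    return [1 if value == level else 0 for value in column]


def product(u, v):
    """Elementwise product of two equal-length columns."""
    return [x * y for x, y in zip(u, v)]


f1_a = dummy(f1, "a")
f1_b = dummy(f1, "b")
f2_c = dummy(f2, "c")

square = product(f1_a, f1_a)  # power of a dummy: identical to the dummy itself
within = product(f1_a, f1_b)  # two levels of the same feature: always zero
across = product(f1_a, f2_c)  # levels of different features: joint indicator

print(square == f1_a, within, across)
```

The three cases illustrate why only cross-feature products carry new information: a dummy squared duplicates itself, and two levels of one feature can never both be 1 in the same row.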

Sandy4321 commented 9 months ago

By the way, do you remove the new multicollinear (near-duplicate) features? Multiplying the original features together can produce many very similar columns.
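
For illustration only, here is a small Python sketch of the simplest form of such a cleanup: dropping constant columns and exact duplicates after the products have been formed. The function `drop_redundant_columns` and the toy data are assumptions made for this example; the thread does not say whether polyreg performs any such step, and detecting columns that are merely similar (rather than identical) would need something more, such as a correlation threshold or a rank check.

```python
def drop_redundant_columns(columns):
    """Keep the first of each group of identical columns; drop constant ones.

    `columns` maps a column name to its list of values.
    Hypothetical helper for illustration; not part of polyreg.
    """
    kept = {}
    seen = set()
    for name, values in columns.items():
        key = tuple(values)
        if len(set(key)) <= 1:
            continue  # constant column (e.g. all zeros) carries no information
        if key in seen:
            continue  # exact duplicate of a column that was already kept
        seen.add(key)
        kept[name] = values
    return kept


columns = {
    "a": [1, 0, 1],
    "c": [1, 0, 0],
    "a*c": [1, 0, 0],  # identical to "c" in this tiny sample
    "a*b": [0, 0, 0],  # two levels of the same feature: always zero
}
print(list(drop_redundant_columns(columns)))
```

On this toy input only "a" and "c" survive, since "a*c" repeats "c" and "a*b" is all zeros.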