aerdem4 opened 4 years ago
@aerdem4 so, are you only looking for a GPU-accelerated SymbolicTransformer?
@teju85 I think all of them are the same except for the metric. Multiple metric options would be nice, but Spearman is the most useful.
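Since gplearn scores candidate programs by how well their output correlates with the target, a minimal pure-Python sketch of Spearman correlation as such a fitness metric might look like this (illustration only, not cuML or gplearn code):

```python
def rank(values):
    """Assign average ranks to values (ties get their mean rank)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # group tied values and give each the average of their positions
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))
```

Because it only depends on ranks, Spearman rewards any monotone relationship between a program's output and the target, which is why it is so useful for feature engineering ahead of a downstream model.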
Alright, whose idea of a joke was it to tag this with Good First Issue? I'm looking at you @wxbn ! ;)
@aerdem4 we are going to have an intern provide us with an initial implementation of this in cuML! For starters, can we assume a max program AST depth of 10 or so? Or do you think that's too low to begin with? In practice, what's the deepest program you've come across?
@teju85 thanks for the good news! I think 10 is enough for AST depth. Generated features don't need to be very complex, but they should capture the interactions the model can't learn on its own. If the intern needs any help, I'd be happy to be involved, by the way.
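For a sense of scale, a GP program is just an expression tree, so its depth is easy to check. A hypothetical nested-tuple encoding (used here only for illustration) makes the point that typical engineered features sit well below a depth-10 cap:

```python
# A program is either a terminal (feature name or constant) or an
# (operator, child, child, ...) tuple -- a hypothetical encoding.
def depth(node):
    if not isinstance(node, tuple):
        return 0  # terminals contribute no depth
    return 1 + max(depth(child) for child in node[1:])

# (X0 - X1) / (X2 + 0.5): depth 2, far below a cap of 10
program = ("div", ("sub", "X0", "X1"), ("add", "X2", 0.5))
```

A depth cap like this also bounds per-program evaluation cost, which matters for a GPU implementation that evaluates whole populations in parallel.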
tagging @vimarsh6739 who'll be implementing this.
A simple Kaggle test case: https://www.kaggle.com/c/loan-default-prediction This dataset has 800 features. People claim that, without extracting the engineered feature f527 - f528, GBM performs poorly in this old competition. There may be more complex magic features too.
I can also create artificial datasets on which we can test whether GP can reverse-engineer the features that contribute to the target.
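One such artificial dataset could hide the target behind a difference of two columns, so that each raw feature correlates only weakly with y while the engineered feature correlates strongly — exactly the structure GP should recover as sub(A, B). A sketch using only the standard library (values below are approximate, not exact):

```python
import random

random.seed(0)
n = 2000
A = [random.gauss(0, 1) for _ in range(n)]
B = [random.gauss(0, 1) for _ in range(n)]
# Hidden relationship: the label depends only on A - B.
y = [1 if a - b > 0 else 0 for a, b in zip(A, B)]

def corr(x, t):
    """Plain Pearson correlation, enough for this demonstration."""
    n = len(x)
    mx, mt = sum(x) / n, sum(t) / n
    cov = sum((a - mx) * (b - mt) for a, b in zip(x, t))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    st = sum((b - mt) ** 2 for b in t) ** 0.5
    return cov / (sx * st)

engineered = [a - b for a, b in zip(A, B)]
# The engineered feature correlates with y far more strongly
# than either raw column does.
print(abs(corr(A, y)), abs(corr(B, y)), abs(corr(engineered, y)))
```

A GP run that rediscovers A - B on data like this would be a cheap, self-checking regression test for the cuML implementation.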
Is your feature request related to a problem? Please describe. Genetic programming is very useful for feature engineering, but its main challenge is time complexity. Luckily, GP programs are easily parallelizable, so I believe it is a good fit for cuML.
Example: Let's assume you have 2 columns A and B, and a binary target. The target is mostly 1 when A > B. This is very difficult for a tree-based model to learn, but GP can engineer the feature for you.
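To make that concrete, here is a small plain-Python sketch (the helper name is hypothetical) showing why a single axis-aligned split — what one level of a tree can do — struggles with A > B, while the engineered feature A - B makes it trivially separable:

```python
import random

random.seed(42)
n = 4000
A = [random.gauss(0, 1) for _ in range(n)]
B = [random.gauss(0, 1) for _ in range(n)]
y = [1 if a > b else 0 for a, b in zip(A, B)]

def best_stump_accuracy(feature, labels):
    """Best accuracy of a single threshold split on one feature --
    an upper bound on what one axis-aligned tree split can achieve."""
    pairs = sorted(zip(feature, labels))
    n = len(pairs)
    ones_total = sum(labels)
    best = max(ones_total, n - ones_total) / n  # trivial constant predictor
    ones_left = 0
    for i, (_, label) in enumerate(pairs):
        ones_left += label
        ones_right = ones_total - ones_left
        zeros_left = (i + 1) - ones_left
        # predict 0 below the threshold, 1 above (or the reverse)
        acc = (zeros_left + ones_right) / n
        best = max(best, acc, 1 - acc)
    return best

print(best_stump_accuracy(A, y))  # capped around 0.75 for this data
print(best_stump_accuracy([a - b for a, b in zip(A, B)], y))  # 1.0
```

On the raw column A, even the best threshold misclassifies roughly a quarter of the points; on A - B, a single split at 0 is perfect. Tree ensembles can approximate the diagonal boundary only with many splits, whereas GP finds the one feature that makes it easy.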
Describe the solution you'd like I would like the functionality of gplearn (https://gplearn.readthedocs.io/en/stable/) accelerated on GPU.