Feature importance with NODE

manujosephv / pytorch_tabular

A standard framework for modelling Deep Learning Models for tabular data

https://pytorch-tabular.readthedocs.io/

MIT License


Feature importance with NODE #96

Closed (SalvatoreRa closed this issue 1 year ago)

SalvatoreRa commented 2 years ago

Hi,

This is a very interesting project, and I have been trying out the different models it describes. I was trying to compare LightGBM (or XGBoost) with NODE. I would like to know how to extract the most important features from the model. Is there a way to do that? And how can I perform feature selection?

Thank you.

hsuyab commented 2 years ago

Forgive my ignorance, but @SalvatoreRa, what is NODE?

SalvatoreRa commented 1 year ago

NODE stands for Neural Oblivious Decision Ensembles, from the paper "Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data". It is one of the examples in this repository: https://github.com/manujosephv/pytorch_tabular/blob/main/examples/classification_with_NODE.ipynb

I am interested in extracting the most important features when using one of the tabular models for classification.

manujosephv commented 1 year ago

Sorry for the late reply. @SalvatoreRa Unfortunately, NODE doesn't provide feature importances. FTTransformer is a model that does, so you can try that out.

@hsuyab NODE is a kind of tabular model: Neural Oblivious Decision Ensembles.

hsuyab commented 1 year ago

@manujosephv Thanks, I was confusing it with Neural ODE :P

SalvatoreRa commented 1 year ago

Thank you for the reply.

I thought that, since NODE is composed of trees, it would have something similar to XGBoost or random forest, which take into account how many times a feature is used.
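For readers unfamiliar with that idea, here is a minimal sketch of split-count importance, i.e. ranking features by how often trees split on them. It is a toy illustration only: the list-of-feature-names tree representation and the function name `split_count_importance` are made up for this example and are not how XGBoost, random forest, or NODE store their trees.

```python
from collections import Counter

# Hypothetical toy ensemble: each tree is just the list of feature names
# it splits on (an illustrative stand-in, not a real library structure).
toy_trees = [
    ["age", "income", "age"],
    ["income", "city"],
    ["age", "income"],
]


def split_count_importance(trees):
    """Return each feature's share of all splits across the ensemble."""
    counts = Counter()
    for tree in trees:
        counts.update(tree)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Normalise so the importances sum to 1, most-used features first.
    return {feature: n / total for feature, n in counts.most_common()}


importance = split_count_importance(toy_trees)
for feature, share in importance.items():
    print(f"{feature}: {share:.3f}")
```

This only works when each split picks exactly one feature, which is the assumption behind counting in the first place.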

Let's say I have trained an FTTransformer. How can I extract the feature importance?

manujosephv commented 1 year ago

It's not exposed publicly because only a few models support it and it has not been extensively tested yet.

But if the trained model is called trained_model, you can access the feature importance by calling trained_model.model.feature_importance(). It returns a pandas DataFrame with the importances.
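Since only some models support this, one option is to wrap the call so the same code can be used with any model. The sketch below is a hedged illustration, not part of the library: the helper name `get_feature_importance` is hypothetical, and it assumes that an unsupported model either lacks the method or raises `NotImplementedError`. The stub objects stand in for a fitted model so the snippet runs without training anything; a real model would return a pandas DataFrame rather than a dict.

```python
from types import SimpleNamespace


def get_feature_importance(trained_model):
    """Return trained_model.model.feature_importance(), or None if unsupported.

    Assumption: models without support either lack the method or raise
    NotImplementedError; adjust the guard if your version behaves differently.
    """
    inner = getattr(trained_model, "model", None)
    method = getattr(inner, "feature_importance", None)
    if not callable(method):
        return None
    try:
        return method()
    except NotImplementedError:
        return None


# Stand-in objects (hypothetical) so the sketch runs without a trained model.
supported = SimpleNamespace(
    model=SimpleNamespace(feature_importance=lambda: {"age": 0.7, "income": 0.3})
)
unsupported = SimpleNamespace(model=SimpleNamespace())

print(get_feature_importance(supported))
print(get_feature_importance(unsupported))
```

With a real model, it is worth inspecting the returned DataFrame's columns before sorting or filtering, since the column names are not stated in this thread.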