wildtreetech / advanced-comp-2017

💻 Material for a course on applied machine-learning for scientists. Taught at EPFL in spring 2017

Question on course: "Useless" variables, decision tree vs neural networks #5

Open david-droz opened 7 years ago

david-droz commented 7 years ago

Hello,

I have a follow-up question to today's discussion, although it may be covered in the next lecture.

Today we saw that for a decision tree / random forest it is best not to have "useless" variables, i.e. variables that offer little or no discriminating power. Therefore, if we want to implement such an algorithm, we have to study the input variables beforehand and remove the useless ones, right?

What about deep neural networks? At the end of the lecture you mentioned that deep learning works best with raw data instead of high-level features. Can we conclude that it "ignores" useless variables? Or would a large number of useless variables skew the training and lead to e.g. overtraining?

betatim commented 7 years ago

For trees you can add uninformative features and they will not be used. Where it gets tricky is if you use, for example, a random forest in which each tree only sees a subset of the features. With a large number of uninformative variables you run the risk that an individual tree only sees uninformative ones. Say you have 8 uninformative and 2 informative variables: pick 5 of those ten at random and both informative variables end up in the selection only about 22% of the time (C(8,3)/C(10,5) = 56/252), so most trees are missing at least one of them.
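
A quick way to see the first point in practice (my own sketch, not something from the course): build a toy dataset where only the first two columns carry signal and look at the forest's `feature_importances_`.

```python
# Minimal sketch (my own illustration): add noise features to a toy dataset
# and check that a tree-based model barely uses them.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# 2 informative features plus 8 pure-noise features
X, y = make_classification(n_samples=2000, n_features=10, n_informative=2,
                           n_redundant=0, shuffle=False, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# With shuffle=False the informative features are columns 0 and 1;
# their importances should dominate the eight noise columns.
for i, imp in enumerate(forest.feature_importances_):
    print(f"feature {i}: importance {imp:.3f}")
```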

You can also turn this around and use a decision tree (or a forest of them) to find out which features are the informative ones: http://scikit-learn.org/stable/modules/feature_selection.html#tree-based-feature-selection is a nice starting point for that (also check the linked examples).
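
In code that looks roughly like the following (a sketch along the lines of the linked scikit-learn docs; the dataset and model choices are mine):

```python
# Tree-based feature selection: fit a forest, then keep only the features
# whose importance is above the default threshold (the mean importance).
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=2000, n_features=10, n_informative=2,
                           n_redundant=0, random_state=0)

clf = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)
selector = SelectFromModel(clf, prefit=True)
X_reduced = selector.transform(X)

print(X.shape, "->", X_reduced.shape)  # most of the noise columns get dropped
```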

For neural networks I'd have to do a bit of research. Take an image classification problem as an example: if you add a new part to the image that is uninformative with respect to the class, I'm not sure what "should" happen from a theoretical point of view. My guess would be that it has little effect on the kernels learnt in the convolutional layers and then gets ignored in the fully connected part.
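
For the simpler tabular version of your question one could just run the experiment. This is my own setup (dataset, network size and the number of noise columns are arbitrary choices, not from the lecture): train the same small network with and without extra noise features and compare the test scores.

```python
# Sketch of an experiment: does a small neural network suffer when
# uninformative columns are appended to the input?
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=5000, n_features=5, n_informative=5,
                           n_redundant=0, random_state=0)
rng = np.random.RandomState(0)
X_noisy = np.hstack([X, rng.normal(size=(X.shape[0], 50))])  # add 50 noise columns

for name, data in [("informative only", X), ("with 50 noise features", X_noisy)]:
    X_tr, X_te, y_tr, y_te = train_test_split(data, y, random_state=0)
    net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
    net.fit(X_tr, y_tr)
    print(name, "test accuracy:", net.score(X_te, y_te))
```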

betatim commented 7 years ago

Actually, I think current researchers would probably point you towards what is called the "attention" mechanism. Some (heavy) reading on that: https://arxiv.org/abs/1502.03044 (with some pictures). There are also some illustrations in http://distill.pub/2016/augmented-rnns/ (but I haven't read that article).

david-droz commented 7 years ago

Thank you!

betatim commented 7 years ago

Let's keep the issue open to make it more discoverable by others. We can go through all the issues and close them at the end of the course.