Closed jakobBodensteiner closed 5 years ago
Here is my review:
The general structure looks fine. For both comparison chapters, 1D and 2D, you could add subchapters that deal with categorical features (maybe you have already thought about that). In any case, don't forget examples with categorical features.
The "Estimation 2D ALE" chapter should include a general introduction to the 2D ALE (not only the estimation formula), as the topic was not treated in the ALE intro section. Depending on the size of this introduction, we should then discuss whether this chapter is the right place for it or whether we should move it to the ALE intro chapter.
Runtime comparison: Do you compare the PDP only to the ALE implementation in the iml package, or also to the implementation in the ALEplot package? Why not compare the runtime for 2D categorical features? I think I know the answer, but for the sake of completeness: there isn't a 2D categorical implementation in the ALEplot package either, is there?
Weakness of ALE: The theoretical example with a prediction function that only sees one feature is a nice way to understand why ALE plots can be very bad in poor data situations. However, in that case the big advantage of ALE, its ability to deal with correlated features, is irrelevant. For didactic purposes it might be better to find an example with poor data, correlated features, and a prediction function that sees both features, where the PDP is still superior to the ALE plot because the data is so bad.
General structure:
Good choice of subchapters. Maybe give the 'Weakness of ALE' subchapter a different name, e.g., 'Comparison for unevenly distributed data'.
"For both comparison chapters, 1D and 2D, you could add subchapters that deal with categorical features (maybe you have already thought about that). In any case, don't forget examples with categorical features."
Including examples with categorical features is definitely a plus. You do not have to go in depth here. An ALE plot and a PD plot on the same data with a categorical feature are sufficient.
"The "Estimation 2D ALE" chapter should include a general introduction to the 2D ALE (not only the estimation formula), as the topic was not treated in the ALE intro section. Depending on the size of this introduction, we should then discuss whether this chapter is the right place for it or whether we should move it to the ALE intro chapter."
Good point. A theoretical introduction to the 2D ALE is definitely a plus and worth moving to the ALE intro chapter.
Runtime comparison:
"Do you compare the PDP only to the ALE implementation in the iml package, or also to the implementation in the ALEplot package? Why not compare the runtime for 2D categorical features? I think I know the answer, but for the sake of completeness: there isn't a 2D categorical implementation in the ALEplot package either, is there?"
I would argue that comparing the runtime of the ALE and PDP implementations in the iml package is sufficient. To the best of my knowledge, there is no 2D categorical implementation in the ALEplot package.
Weakness of ALE:
"The theoretical example with a prediction function that only sees one feature is a nice way to understand why ALE plots can be very bad in poor data situations. However, in that case the big advantage of ALE, its ability to deal with correlated features, is irrelevant. For didactic purposes it might be better to find an example with poor data, correlated features, and a prediction function that sees both features, where the PDP is still superior to the ALE plot because the data is so bad."
This subchapter raises a very interesting point: the shortcomings of ALE plots with unevenly distributed data. I would name this chapter accordingly, e.g., 'Comparison for unevenly distributed data'. I agree with Nikolas with regard to creating more examples, i.e., simulating data with correlated features, training multiple models on all features, and checking whether the PDP still beats the ALE plot in approximating the real function (also try increasing the number of ALE intervals).
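To make the suggested experiment concrete, here is a minimal sketch of a 1D ALE estimate on simulated correlated data. It is written in plain Python for illustration only and is not the iml or ALEplot implementation; the function name `ale_1d`, the data-generating process, and the linear `predict` function are all hypothetical choices. For a linear model the estimate should recover the slope of the first feature regardless of the correlation, which gives a known reference to compare against when the number of intervals is varied.

```python
import bisect
import random


def ale_1d(predict, data, feature, n_intervals):
    """Return (edges, centered ALE values at those edges) for one numeric feature."""
    values = sorted(row[feature] for row in data)
    n = len(values)
    # Interval edges at empirical quantiles, so intervals hold roughly equal data.
    edges = sorted({values[round(k * (n - 1) / n_intervals)]
                    for k in range(n_intervals + 1)})

    sums = [0.0] * len(edges)
    counts = [0] * len(edges)
    for row in data:
        # Interval k covers (edges[k-1], edges[k]]; the minimum goes into interval 1.
        k = max(1, bisect.bisect_left(edges, row[feature]))
        upper, lower = list(row), list(row)
        upper[feature] = edges[k]
        lower[feature] = edges[k - 1]
        # Local effect: prediction difference across the interval, other features fixed.
        sums[k] += predict(upper) - predict(lower)
        counts[k] += 1

    # Accumulate the average local effects from left to right.
    ale = [0.0]
    for k in range(1, len(edges)):
        local = sums[k] / counts[k] if counts[k] else 0.0
        ale.append(ale[-1] + local)

    # Center so that the average effect over the observations is zero.
    center = sum(c * a for c, a in zip(counts, ale)) / n
    return edges, [a - center for a in ale]


# Hypothetical simulation: x2 is strongly correlated with x1.
random.seed(1)
data = []
for _ in range(200):
    x1 = random.gauss(0, 1)
    data.append((x1, x1 + random.gauss(0, 0.3)))


def predict(row):
    # Hypothetical fitted model that uses both features.
    return 2.0 * row[0] + row[1]


for n_intervals in (5, 20):
    edges, ale = ale_1d(predict, data, 0, n_intervals)
    print(n_intervals, len(edges), round(ale[0], 3), round(ale[-1], 3))
```

Replacing `predict` with a nonlinear model and thinning out the data in part of the feature range would be one way to probe the situation discussed here.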
It is also worth mentioning that the advantage of the PDP in this case stems from evaluating the model on an evenly spaced grid over the input space. Unfortunately, this method always carries a risk of extrapolation. Another commonly used variant computes the PDP only at observed feature values. With that variant the risk of extrapolation is somewhat smaller, although we may still extrapolate by combining observed feature values in new ways. However, when only observed feature values are used instead of an evenly spaced grid, the advantage of the PDP in this case should disappear. Make sure to mention this problem when discussing potential advantages of the PDP.
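The difference between the two PDP variants can be sketched as follows. This is an illustrative plain-Python example under assumed data, not the iml implementation; the helper names `partial_dependence`, `even_grid`, and `observed_grid`, the simulated data, and the `predict` function are hypothetical. The data is dense in one region and sparse in another, so the evenly spaced grid places evaluation points in a gap where no observations exist, while the observed-value grid does not.

```python
import random


def partial_dependence(predict, data, feature, grid):
    """Average prediction over all rows with `feature` set to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in data:
            modified = list(row)
            modified[feature] = value
            total += predict(modified)
        curve.append(total / len(data))
    return curve


def even_grid(data, feature, size):
    """Evenly spaced grid between the observed minimum and maximum."""
    lo = min(row[feature] for row in data)
    hi = max(row[feature] for row in data)
    step = (hi - lo) / (size - 1)
    return [lo + i * step for i in range(size)]


def observed_grid(data, feature):
    """Grid made up of the distinct observed feature values only."""
    return sorted({row[feature] for row in data})


# Hypothetical unevenly distributed data: 95 points with x1 in [0, 1],
# 5 points with x1 in [9, 10], and x2 correlated with x1.
random.seed(0)
data = []
for low, high, count in ((0, 1, 95), (9, 10, 5)):
    for _ in range(count):
        x1 = random.uniform(low, high)
        data.append((x1, x1 + random.gauss(0, 0.1)))


def predict(row):
    # Hypothetical model that uses both features.
    return row[0] ** 2 + row[1]


grid_even = even_grid(data, 0, 11)
grid_obs = observed_grid(data, 0)
pd_even = partial_dependence(predict, data, 0, grid_even)
pd_obs = partial_dependence(predict, data, 0, grid_obs)

# Number of evaluation points that fall into the region without any data.
gap_even = sum(1 for v in grid_even if 1 < v < 9)
gap_obs = sum(1 for v in grid_obs if 1 < v < 9)
print("grid points in the empty region:", gap_even, "vs", gap_obs)
```

Note that both variants still pair each grid value of x1 with every observed x2, so combinations far from the correlated data are evaluated either way; only the choice of x1 values differs.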
"Including examples with categorical features is definitely a plus. You do not have to go in depth here. An ALE plot and a PD plot on the same data with a categorical feature are sufficient."
I thought of that, but at this point in the book the ALE for categorical features (I mean the ordering of categories and so on) has not yet been introduced. That happens in Nikolas' chapter, so I can imagine it would lead to confusion. I can do it and add a note such as "for more details, see Nikolas' chapter". Alternatively, we could move the part of Nikolas' chapter on categorical features to the introduction as well.
"Good point. A theoretical introduction to the 2D ALE is definitely a plus and worth moving to the ALE intro chapter."
I can do that. So I would write it directly into chapter 02-00-ale.rmd now, right?
"This subchapter raises a very interesting point: the shortcomings of ALE plots with unevenly distributed data. I would name this chapter accordingly, e.g., 'Comparison for unevenly distributed data'. I agree with Nikolas with regard to creating more examples, i.e., simulating data with correlated features, training multiple models on all features, and checking whether the PDP still beats the ALE plot in approximating the real function (also try increasing the number of ALE intervals)."
I will try to find something.
"I thought of that, but at this point in the book the ALE for categorical features (I mean the ordering of categories and so on) has not yet been introduced. That happens in Nikolas' chapter, so I can imagine it would lead to confusion. I can do it and add a note such as 'for more details, see Nikolas' chapter'. Alternatively, we could move the part of Nikolas' chapter on categorical features to the introduction as well."
I agree. It is fine to leave out categorical features here, as Nikolas covers them in his subchapter.
"I can do that. So I would write it directly into chapter 02-00-ale.rmd now, right?"
Yes, you can do that.
Almost only notes