nikolasfritz closed this issue 5 years ago.
General structure:
Good choice of subchapters. Instead of putting the choice of intervals for first-order ALE plots in 8.0.X, I would create a subchapter "8.1 Choice of Intervals for Continuous Features".
Intro:
Nice introduction to the issue of choosing the right number and size of intervals. The trade-off between small intervals and a sufficient number of data points per interval was laid out well, including the point that a sufficiently high number of data points per interval is only necessary if other features affect the local prediction differences (i.e. if there are interactions with other features).
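To make the trade-off concrete, a minimal sketch could compare two ways of placing interval boundaries, at empirical quantiles and at equal distances, and count the data points per interval. This is only an illustration, not the chapter's code; the helper names `interval_grid` and `counts_per_interval` and the skewed exponential sample are hypothetical choices.

```python
import numpy as np

def interval_grid(x, k, method="quantile"):
    """Boundaries z_0 < ... < z_k for k intervals over the range of x (hypothetical helper)."""
    if method == "quantile":
        # Empirical quantiles: every interval holds roughly n / k points.
        grid = np.quantile(x, np.linspace(0.0, 1.0, k + 1))
    elif method == "equidistant":
        # Equal widths: sparse regions get intervals with few (or no) points.
        grid = np.linspace(x.min(), x.max(), k + 1)
    else:
        raise ValueError(f"unknown method: {method}")
    return np.unique(grid)  # ties in x can make quantile boundaries coincide

def counts_per_interval(x, grid):
    # Interval j is (z_{j-1}, z_j]; the minimum of x is assigned to the first interval.
    idx = np.clip(np.searchsorted(grid, x, side="left"), 1, len(grid) - 1)
    return np.bincount(idx - 1, minlength=len(grid) - 1)

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100)  # hypothetical skewed sample: few points in the right tail
for method in ("quantile", "equidistant"):
    grid = interval_grid(x, k=10, method=method)
    print(method, counts_per_interval(x, grid))
```

Quantile boundaries keep the number of points per interval constant at the cost of wide intervals in sparse regions; equidistant boundaries keep the width constant at the cost of nearly empty intervals there.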
A general introduction to the chapter is missing. You give good intuition for why intervals are an issue, but I would also write something about the theoretical ALE and about categorical features in an introduction to the whole chapter.
I agree with Jakob. Right now you are only introducing the reader to the problem of choosing the right number and size of intervals. Also introducing the problems of creating ALE plots for piece-wise constant models and for categorical features would give a more thorough introduction.
ALE Approximations:
Nice examples. Good theoretical foundation.
$\hat{f}_1(x_1, x_2) = (x_1-4)(x_1-5)(x_1-6) + x_2^3$ is rather long and uncomfortable to handle. Is it possible to show the same effect with a simpler polynomial of degree 3 (e.g. $x_1^3$, or $x_1^3/5$ to flatten it out)?
For the formula $\hat{f}_1(x_1, x_2) = (x_1-4)(x_1-5)(x_1-6)\,x_2^3$, the equations are very long and a bit uncomfortable to follow. Same as in Example 1: if it is possible to show the same effect with a shorter equation, that would be nice.
I agree that one could show the same effects with simpler equations. However, you have already put a lot of effort into these examples, so in my opinion you can keep them and concentrate on creating more simulations.
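For reference, the theoretical first-order ALE (before centering) of both models above can be written in closed form, which might help shorten the text. The shorthand $g(x_1) = (x_1-4)(x_1-5)(x_1-6)$ is introduced here only for illustration and is not the chapter's notation; $z_{0,1}$ denotes the lower end of the range of $x_1$.

```latex
\tilde{f}_{1,\mathrm{ALE}}(x_1) = \int_{z_{0,1}}^{x_1}
  \mathbb{E}\left[ \frac{\partial \hat{f}_1(X_1, X_2)}{\partial X_1} \,\middle|\, X_1 = z_1 \right] dz_1,
\qquad g'(x_1) = 3x_1^2 - 30x_1 + 74

% additive model: \hat{f}_1(x_1, x_2) = g(x_1) + x_2^3
\tilde{f}_{1,\mathrm{ALE}}(x_1) = \int_{z_{0,1}}^{x_1} g'(z_1)\, dz_1 = g(x_1) - g(z_{0,1})

% multiplicative model: \hat{f}_1(x_1, x_2) = g(x_1)\, x_2^3
\tilde{f}_{1,\mathrm{ALE}}(x_1) = \int_{z_{0,1}}^{x_1} g'(z_1)\, \mathbb{E}\left[ X_2^3 \mid X_1 = z_1 \right] dz_1
```

The ALE depends on the cubic only through $g'$, so replacing $g$ by a simpler cubic such as $x_1^3$ changes the curve but not the argument. In the multiplicative case the ALE additionally depends on the conditional distribution of $X_2$ given $X_1$, and it reduces to $\mathbb{E}[X_2^3]\,(g(x_1) - g(z_{0,1}))$ only when the two features are independent.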
Your explanation for the relatively bad fit of the estimated ALE (compared to the theoretical ALE) is that there are not enough data points in the crucial $x_1$ region (between $x_1 = 7$ and $x_1 = 10$). It would be interesting to see whether this problem disappears when using 1000 or 10000 data points instead of 100. In my view, this would confirm your explanation.
This would indeed be interesting to see.
Your planned plot "which shows 50 ALE estimations (on different data samples)" is also a good idea; definitely do that.
I agree that this would be an interesting simulation.
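Both suggestions could be combined in one simulation: draw 50 samples for each of n = 100, 1000 and 10000, estimate the ALE on each sample and compare it with the theoretical ALE. The sketch below assumes the additive model $g(x_1) + x_2^3$ from above, a hypothetical skewed distribution for $x_1$ (10 times a Beta(2, 5) variable, so few points fall between 7 and 10) and quantile-based intervals; `ale_1d`, `sample` and `max_error` are illustrative names, and empty intervals simply get a local effect of zero. For this additive model the estimate is exact at the grid points, so the error measured here comes from the straight lines between them; whether it shrinks with n may therefore depend on how the number of intervals is chosen, which is worth varying as well.

```python
import numpy as np

def g(x1):
    # Cubic part of the additive model: (x1-4)(x1-5)(x1-6)
    return (x1 - 4) * (x1 - 5) * (x1 - 6)

def model(x1, x2):
    # Additive model g(x1) + x2^3
    return g(x1) + x2 ** 3

def ale_1d(predict, x1, x2, n_intervals):
    """Uncentered first-order ALE of x1 at quantile-based grid points."""
    grid = np.unique(np.quantile(x1, np.linspace(0.0, 1.0, n_intervals + 1)))
    # Interval j is (grid[j-1], grid[j]]; the minimum of x1 joins the first interval.
    idx = np.clip(np.searchsorted(grid, x1, side="left"), 1, len(grid) - 1)
    diffs = predict(grid[idx], x2) - predict(grid[idx - 1], x2)
    sums = np.bincount(idx - 1, weights=diffs, minlength=len(grid) - 1)
    counts = np.bincount(idx - 1, minlength=len(grid) - 1)
    # Mean local effect per interval; empty intervals get 0 (a simplification).
    local = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    return grid, np.concatenate([[0.0], np.cumsum(local)])

def sample(n, rng):
    # Hypothetical data: x1 = 10 * Beta(2, 5) puts few points between 7 and 10.
    return 10 * rng.beta(2, 5, size=n), rng.normal(size=n)

def max_error(n, n_intervals, rng, n_eval=200):
    x1, x2 = sample(n, rng)
    grid, ale = ale_1d(model, x1, x2, n_intervals)
    xs = np.linspace(grid[0], grid[-1], n_eval)
    estimate = np.interp(xs, grid, ale)  # straight lines between grid points
    truth = g(xs) - g(grid[0])  # theoretical ALE, anchored at the same point
    return np.max(np.abs(estimate - truth))

rng = np.random.default_rng(0)
for n in (100, 1_000, 10_000):
    errors = [max_error(n, n_intervals=20, rng=rng) for _ in range(50)]
    print(f"n={n:>6}: mean of max. abs. error over 50 samples = {np.mean(errors):.3f}")
```

For the planned plot, the 50 `(grid, ale)` pairs per sample size could be drawn on top of the theoretical curve instead of being reduced to a single error number.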
Piece-wise constant models:
A simulation with multiple piece-wise constant prediction functions and ALE plots with different intervals would be very interesting.
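As a starting point, a minimal sketch could use a hypothetical tree-like model with a single split at $x_1 = 5$ that is only active when $x_2 > 0$. With finite differences, the whole jump is assigned to the one interval containing the split, and its height is the share of points in that interval with $x_2 > 0$, so both the location and the height of the estimated jump depend on the chosen intervals. `tree_like_model` and `ale_1d` are illustrative names, and the data are simulated.

```python
import numpy as np

def tree_like_model(x1, x2):
    # Hypothetical piece-wise constant model: one split at x1 = 5, active only if x2 > 0.
    return ((x1 > 5) & (x2 > 0)).astype(float)

def ale_1d(predict, x1, x2, n_intervals):
    """Uncentered first-order ALE of x1 at quantile-based grid points."""
    grid = np.unique(np.quantile(x1, np.linspace(0.0, 1.0, n_intervals + 1)))
    # Interval j is (grid[j-1], grid[j]]; the minimum of x1 joins the first interval.
    idx = np.clip(np.searchsorted(grid, x1, side="left"), 1, len(grid) - 1)
    diffs = predict(grid[idx], x2) - predict(grid[idx - 1], x2)
    sums = np.bincount(idx - 1, weights=diffs, minlength=len(grid) - 1)
    counts = np.bincount(idx - 1, minlength=len(grid) - 1)
    local = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    return grid, np.concatenate([[0.0], np.cumsum(local)])

rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, size=200)
x2 = rng.normal(size=200)
for k in (5, 10, 40):
    grid, ale = ale_1d(tree_like_model, x1, x2, k)
    j = np.searchsorted(grid, 5.0) - 1  # interval (grid[j], grid[j+1]] contains the split
    print(f"k={k:>2}: jump height {ale[-1]:.2f}, placed in ({grid[j]:.2f}, {grid[j + 1]:.2f}]")
```

Plotting `ale` against `grid` with straight lines between grid points spreads this jump linearly over that interval, so wider intervals give a flatter, more smeared-out step.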
Categorical features:
Again, a simulation with two different orderings of the categories (for a fixed model!) would be very interesting. "1. Kolmogorov-Smirnov distance or frequency tables for categorical features, 2. multidimensional scaling"
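One possible implementation of these two steps is sketched below: distances between categories are summed over the other features (Kolmogorov-Smirnov statistic for numerical features, sum of absolute differences between relative frequency tables for categorical ones), and classical multidimensional scaling reduces the distance matrix to one dimension, whose coordinates give the order. The frequency-table distance and the use of classical MDS are assumptions, `order_categories` is a hypothetical helper, and the toy data are constructed so that category C lies between A and B.

```python
import numpy as np

def ks_distance(a, b):
    # Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs.
    a, b = np.sort(a), np.sort(b)
    pts = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, pts, side="right") / len(a)
    cdf_b = np.searchsorted(b, pts, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

def freq_distance(a, b):
    # Sum of absolute differences between relative frequency tables (assumed choice).
    levels = np.union1d(a, b)
    fa = np.array([np.mean(a == lv) for lv in levels])
    fb = np.array([np.mean(b == lv) for lv in levels])
    return np.sum(np.abs(fa - fb))

def order_categories(cat, numeric, categorical):
    levels = np.unique(cat)
    m = len(levels)
    d = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            a, b = cat == levels[i], cat == levels[j]
            dist = sum(ks_distance(x[a], x[b]) for x in numeric)
            dist += sum(freq_distance(z[a], z[b]) for z in categorical)
            d[i, j] = d[j, i] = dist
    # Classical MDS to one dimension: double-centre the squared distances.
    centering = np.eye(m) - np.ones((m, m)) / m
    b_mat = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b_mat)  # eigenvalues in ascending order
    coords = eigvecs[:, -1] * np.sqrt(max(eigvals[-1], 0.0))
    return levels[np.argsort(coords)]

# Hypothetical toy data: the numerical feature shifts from A over C to B.
cat = np.repeat(np.array(["A", "B", "C"]), 200)
x = np.concatenate([np.linspace(0, 1, 200), np.linspace(1, 2, 200), np.linspace(0.5, 1.5, 200)])
z = np.tile(np.array(["u", "v"]), 300)  # same frequencies in every category
print(order_categories(cat, numeric=[x], categorical=[z]))
```

Because the sign of an MDS coordinate is arbitrary, the result may come out reversed (B, C, A), which describes the same sequence of neighbours.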
Outlook:
Ideas for possible solutions are definitely a plus. You do not have to go in depth, since the simulations, if implemented, will already be quite extensive. A short discussion is sufficient.
Here is my review:
In general, the structure of the chapter is fine. You also show quite well how the approximation of the theoretical ALE depends on the data and on the choice of intervals. In my eyes, the biggest issue is the long equations; they are a bit inconvenient to follow. I also think it would be nice to give the examples meaningful names, especially since they are actual subheadings. Under the titles and subtitles of the chapter, I made more detailed comments on each paragraph.
How to choose the number and/or length of the intervals
State of the art
ALE Approximations
Example 1
Example 2
Example 3
Problems with piece-wise constant models
Example 4
Outlook
Categorical Features
Ordering the features
Example of ALE with categorical feature
Interpretation
Changes of the ALE due to different orders
Example
Here, I guess, artificial data can be useful in the first place. To sum up this subchapter, it would be nice to try another order for the example used above.
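To try another order for a fixed model, a small sketch along these lines could be used. It computes the ALE of a categorical feature for a given category order, where the jump between two neighbouring categories is the mean prediction difference over all instances of both categories; this is a simplified variant, and published implementations may weight the two categories differently. The model (one slope per category, interacting with a numerical feature `x`) and the six data points are hypothetical.

```python
import numpy as np

def categorical_ale(predict, cat, x, order):
    """Uncentered and centred ALE of a categorical feature for a given category order."""
    effects = [0.0]
    for prev, nxt in zip(order[:-1], order[1:]):
        # All instances of the two neighbouring categories contribute to the jump.
        mask = (cat == prev) | (cat == nxt)
        diff = predict(nxt, x[mask]) - predict(prev, x[mask])
        effects.append(effects[-1] + diff.mean())
    ale = np.array(effects)
    # Centre so that the frequency-weighted mean effect is zero.
    weights = np.array([np.mean(cat == c) for c in order])
    return ale, ale - np.sum(weights * ale)

# Hypothetical fixed model with an interaction between the category and x.
slope = {"A": 0.0, "B": 1.0, "C": 2.0}

def model(c, x):
    return slope[c] * x

cat = np.array(["A", "A", "B", "B", "C", "C"])
x = np.array([1.0, 1.0, 5.0, 5.0, 3.0, 3.0])
for order in (["A", "B", "C"], ["A", "C", "B"]):
    raw, centred = categorical_ale(model, cat, x, order)
    print(order, dict(zip(order, np.round(centred, 3))))
```

With these six points, the uncentered effects are 0, 3, 7 for the order A, B, C and 0, 4, 0 for the order A, C, B, so the values for B and C change with the order even though the model is fixed.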