I've made a number of revisions to the figs and addressed your questions:
all figs: subtitle font is smaller. do you think it's still too big?
fig 1
removed "RerFdn"
plotted true class posteriors "f_Y|X" w/ different colormap and colorbar scale
fig 2
panels C and F: RerF was running significantly slower than the others because of the way it was counting the total number of split variables for each tree. I removed this code completely since it isn't necessary. I was only doing it because in the past we had talked about using it as a measure of complexity.
panels B and E: added Bayes error. the errors for Rotation RF might be wrong - they appear to be lower than the Bayes error in panel E. i'm still trying to resolve this.
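one quick sanity check for the "lower than Bayes" issue: a rough z-score of the observed test error against the Bayes error, to see whether the gap is within sampling noise or points to a bug. this is only a sketch. it assumes the test points are independent, treats the number of mistakes as Binomial(n_test, bayes_error), and the function name and the numbers are made up for illustration.

```python
import math

def error_below_bayes_z(observed_error, bayes_error, n_test):
    """z-score of an observed test error relative to the Bayes error.

    Assumes independent test points, so the Bayes classifier's mistake
    count is Binomial(n_test, bayes_error). n_test should be the total
    number of test points the observed error was averaged over.
    """
    if not 0.0 < bayes_error < 1.0:
        raise ValueError("bayes_error must be strictly between 0 and 1")
    se = math.sqrt(bayes_error * (1.0 - bayes_error) / n_test)
    return (observed_error - bayes_error) / se

# hypothetical numbers, not read off the figure
z = error_below_bayes_z(observed_error=0.045, bayes_error=0.05, n_test=1000)
print(round(z, 2))
```

a z-score only slightly below 0 is consistent with noise; something far below about -2 would suggest the error (or the Bayes error) is being computed incorrectly.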
fig 3
added Trunk in panels F-J
now each panel corresponds to a particular transformation. RerF does better than RF in each case but not better than RotRF in each case, particularly when the data is rotated. However, while RerF is still affected by rotation, panel G suggests that it isn't affected to the same extent as RF when d=10,50.
x axes in panels A-E are now log scale and only go up to d=50 (before they went up to 100) to make the lines easier to distinguish.
fig 4
this is for the real data
the y-axes start higher up. in the caption or in the paper we should probably mention the y-intercepts, as they indicate the proportion of times each classifier was the best.
i think we can remove the numbers on the xaxis and yaxis, right?
i'd label "(D) Posterior" maybe?
for the top colorbar, i'd only have about 4 numbers, like the bottom
i'd replace 'x1' with 'x_1' and 'x2' with 'x_2'
maybe make panel (D) the first panel?
fig 2
for B & E, can we plot L_X - L_RF, for X \in {RerF, RotRF}, ie, the error relative to random forest? i think that might be more clear. let's still clear up why RotRF appears to be beating Bayes
i think this fig might be better transposed, so there are 2 columns (one per setting), then you could not repeat the titles 3x, and the ylabels 2x.
maybe trunk only needs to go up to 50?
maybe the yaxis on B & E should be log scale? not sure, it just doesn't look very impressive.
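a minimal sketch of the L_X - L_RF computation suggested above for panels B & E. the function name, the dict layout, and the error values are all assumptions for illustration, not our actual results.

```python
def error_relative_to_rf(errors, baseline="RF"):
    """Return {algorithm: [L_X - L_RF at each x value]} for every non-baseline algorithm."""
    base = errors[baseline]
    return {
        name: [lx - lrf for lx, lrf in zip(vals, base)]
        for name, vals in errors.items()
        if name != baseline
    }

# made-up error rates at three values of d, for illustration only
errors = {
    "RF":    [0.20, 0.25, 0.30],
    "RerF":  [0.15, 0.22, 0.30],
    "RotRF": [0.18, 0.27, 0.29],
}
diffs = error_relative_to_rf(errors)
for name, vals in diffs.items():
    print(name, [round(v, 3) for v in vals])
```

note that these differences can be negative or zero, so they can't go directly on a log-scale y axis; if we want log scale, a ratio like L_X / L_RF might be one alternative.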
fig 3
i think nearly perfect! i'd remove RerFd here, because we'll have a fig 5 with various bells & whistles
i can't tell whether log on xaxis is good here or not?
i don't see panels F-J
fig 4
why not make it 4 across, a la fig 3?
i'd remove RerFd here too
is there any way to make affine have more 'signal'?
can we think of more semantic x & y labels?
outlier might be that we resample the data, but permute the indices, not really sure if that is an outlier?
fig 5
something that shows RerFd, rank RerFd, and maybe some other things? not sure what exactly it should be.
all relevant figs: standardize algorithm color scheme
subpanel labels (eg, (A)) can go in top left of each figure, because 'title' will now just be the title, and will be consistent across multiple panels.
put 'scenario' name on left side of each row.
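one way to keep the algorithm colors consistent is a single shared lookup that every figure script imports. this is just a sketch: the hex values are placeholders (not the colors currently in the figs), and the names ALGORITHM_COLORS and color_for are hypothetical.

```python
# hypothetical palette: placeholder hex values, not the colors used in the figures
ALGORITHM_COLORS = {
    "RF": "#1f77b4",
    "RerF": "#d62728",
    "RerFd": "#2ca02c",
    "RotRF": "#9467bd",
}

def color_for(algorithm):
    """Look up the shared color; fail loudly if an algorithm has no assigned color."""
    try:
        return ALGORITHM_COLORS[algorithm]
    except KeyError:
        raise KeyError(f"no color assigned for {algorithm!r}") from None

print(color_for("RerF"))
```

failing on an unknown name (instead of falling back to a default color) should make it harder for one figure to drift from the others.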
fig 2
more iterations
RerF and RF up to 1000 for trunk
make panels B and C log scale
add RF in panels B & E
put scenario name on left side, rather than top
fig 3
remove legend
fig 4
change y-axis label to "proportion"
add second row of performance profiles for RF and RerF on (just remaining)
make legend indicating AUC for each algorithm
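a rough sketch of how the performance profiles and the AUC for the legend could be computed. it assumes the usual ratio-to-best definition (each algorithm's error on a dataset divided by the best error on that dataset), which may not be exactly what our current code does; the function names, tau_max, and the error values are made up for illustration.

```python
import math

def performance_ratios(errors):
    """errors: {algorithm: [error per dataset]} -> {algorithm: [ratio to best error per dataset]}."""
    names = list(errors)
    n = len(errors[names[0]])
    ratios = {name: [] for name in names}
    for i in range(n):
        best = min(errors[name][i] for name in names)
        for name in names:
            e = errors[name][i]
            if best > 0:
                ratios[name].append(e / best)
            else:
                # best error is exactly 0: a tie gets ratio 1, anything else is infinitely worse
                ratios[name].append(1.0 if e == 0 else math.inf)
    return ratios

def profile(ratios, tau):
    """Proportion of datasets on which the algorithm is within a factor tau of the best."""
    return sum(r <= tau for r in ratios) / len(ratios)

def profile_auc(ratios, tau_max):
    """Area under the step-function profile on [1, tau_max], scaled to lie in [0, 1]."""
    area = sum(max(0.0, tau_max - r) for r in ratios) / len(ratios)
    return area / (tau_max - 1.0)

# made-up errors on four datasets, for illustration only
errors = {
    "RF":   [0.10, 0.20, 0.30, 0.08],
    "RerF": [0.08, 0.20, 0.15, 0.10],
}
ratios = performance_ratios(errors)
for name in errors:
    print(name, profile(ratios[name], tau=1.0), profile_auc(ratios[name], tau_max=2.0))
```

under this definition the value at tau = 1 is the proportion of datasets on which the algorithm was best (ties counted for both), which is the y-intercept of each curve.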
fig 5
Scatter plots of L_RF vs L_variant for all variants (RerF, RerFd, robust RerFd, RotRF) on untransformed benchmarks. (robust RerFd = rankRerFd?).