Closed: zmjones closed this issue 9 years ago
@larskotthoff You cannot make the bounds tighter. They either are, or they are not.
My main point is: Do we understand what these bounds say? Otherwise we should not implement them. Because we need to explain them to the user.
I am on the fence here, whether I wanna see this or not...
If you change the quantiles, the distance to the median will change. What the plot Zach posted seems to show is that the particular feature isn't very informative wrt the prediction. With the bounds, you can see that the variation within the bounds (i.e. what other features account for) is much larger than the variation caused by the feature itself.
If you change the quantiles, the distance to the median will change.
Yes, I know that. But then you simply plot a different thing. The bounds change in the plot, but the distributional spread is the same...
Like I said, I still might wanna see this. I guess if you can see how large the variation across all Ys is, and if this is reduced by conditioning on X = 3, this might be informative?
Not sure what you're saying -- that sounds like a different kind of analysis? The advantage of having the bounds in the plot I can see is that it gives you an idea of the "in-feature" variation vs. "out-feature" variation.
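The bounds idea discussed above can be sketched in base R. This is a toy illustration, not mlr's implementation: `x` and `preds` are made-up stand-ins for a feature grid and the corresponding predictions, and the quantile choices are arbitrary.

```r
# Toy sketch: quantile bounds around a median "partial prediction".
# For each feature value, summarize the spread of predictions with a
# median plus lower/upper quantiles. Changing probs moves the bounds
# relative to the median, but the underlying spread is unchanged.
set.seed(1)
x = rep(1:5, each = 100)                 # hypothetical feature grid
preds = 2 * x + rnorm(500, sd = 3)       # hypothetical predictions
bounds = t(sapply(split(preds, x), quantile, probs = c(0.25, 0.5, 0.75)))
colnames(bounds) = c("lower", "median", "upper")
```

Comparing the width `upper - lower` (variation other features account for) against how much `median` moves across `x` (variation caused by the feature itself) is exactly the "in-feature" vs. "out-feature" comparison mentioned above.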
I think you both agree. I don't think it is hard to implement this, though I haven't done survival outcomes yet. Maybe the default should just be that we only use a location measure?
That sounds reasonable.
Ok this is done and seems to work well. I have a few questions. Are the only two possibilities for predict.type "response" and "prob"? For survival tasks, is it the case that all of them generate a response column as a result of predict, and that this is numeric? Is it always the hazard? If not I'd need a general way of figuring out what the output is.
I can't remember what we decided to do about classification w/o probabilities and clustering. As the aggregation function for classification w/o probabilities I have passed table(x) / length(x), which creates plots that are just aggregated versions of the probability plots. I guess this way has no advantages over the probability plots (in the case with many classes, which gives cluttered plots).
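The table(x) / length(x) aggregation mentioned above is plain base R; here is a small self-contained illustration (class labels are made up) of what it produces for a vector of predicted classes.

```r
# The aggregation used for classification without probabilities:
# turn a vector of predicted class labels into empirical class
# proportions, i.e. an aggregated stand-in for probabilities.
aggregate_classes = function(x) table(x) / length(x)

preds = factor(c("setosa", "setosa", "versicolor", "virginica"),
               levels = c("setosa", "versicolor", "virginica"))
props = aggregate_classes(preds)
# props: setosa = 0.5, versicolor = 0.25, virginica = 0.25
```

Because the result is a proportion per class, the resulting plot looks like the probability plot, just built from hard labels.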
Thanks. I'm having a look now.
Are the only two possibilities for predict.type "response" and "prob"?
There is also "se" for regression learners (which just gives standard deviations in addition to the response).
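What "se" corresponds to can be illustrated with base R's lm rather than mlr itself (this is the underlying mechanism for a learner like regr.lm; the dataset is just a placeholder): predict() with se.fit = TRUE returns standard errors alongside the point predictions.

```r
# Base-R analogue of predict.type = "se": standard errors next to
# the usual response predictions.
fit = lm(dist ~ speed, data = cars)
p = predict(fit, newdata = data.frame(speed = c(10, 20)), se.fit = TRUE)
p$fit     # point predictions (the "response")
p$se.fit  # standard errors (the extra column "se" provides)
```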
About survival outputs: I'm pretty sure it's always numeric and some kind of risk. @mllg: Could you please clarify?
I can't remember what we decided to do about classification w/o probabilities and clustering.
For the partial dep plots? We decided not to do it now / focus on it. If the learner does not give you probs, just throw an exception.
Are the plots intended to work for factor features (like chas in bh.task)?
Are the plots intended to work for factor features (like chas in bh.task)?
Partial dependency? Would be nice.
@schiffner yes. my implementation probably doesn't work for that now but it will.
@berndbischl is what I have in the gist (the second example) not worth it at all?
is what I have in the gist (the second example) not worth it at all?
Just saying that's what we decided. But we also said nobody is holding you back from trying it out. I looked at your gist. IMHO it looks good and could be informative, so I don't see the need to remove it now :)
It now works with factor features, survival outcomes, and multiple features. I added a facetting argument to the plot which works very well when the second feature is a factor and is passable when it is not. I am curious what people think about the facetting solution for plotting a 2nd feature dimension.
My original plan was to not have a separate plotPartialPrediction function but instead just plug the output of generatePartialPredictionData into plotLearnerPrediction, which could be added to. Should I instead continue to develop plotPartialPrediction?
Thanks, that looks great! For the facetting, I would prefer to keep the axis ranges the same throughout to make the differences clearer (see in particular the last example).
Regarding your question, what do you think makes more sense? I would try to keep it simple and not have several functions take the output of the data generation function unless there's a specific reason to do that.
Good point, fixed.
So I think the plotting tasks are very similar if not the same. In both cases you have 1-2, maybe 3 feature dimensions; only you've gotten there by different means. Separating them only makes sense if they have really different purposes. Maybe they do. plotLearnerPrediction seems to me only useful from a pedagogical perspective now.
I did think I could just give the output from generatePartialPrediction the prediction class plus some extra structure, but that doesn't make sense because there is no "truth" that we have in this case.
plotLearnerPrediction seems to me only useful from a pedagogical perspective now.
You mean to quickly try out a method and learn what its decision boundary looks like? Actually it was invented only for that purpose, so I could show it to students in class quickly :)
Yea that is what I meant.
Maybe we could just rename the current plotLearnerPrediction to plotExampleLearnerPrediction or similar, and take some of the plotting code from that and make a new plotLearnerPrediction. But I don't know if this is a good idea, just a proposal. If we have separate plotting functions for everything we will definitely end up with some duplication, but maybe we'd avoid a monster.
Well, do you envisage what you're implementing to eventually provide the same functionality? If so, I'd just leave the implementation alone for now and eventually drop it.
Yes, but we'd want the name to be different to use it for anything other than output from generatePartialPredictionData. I'll just keep working on it as is then.
the past few days i've been working on some suggestions bernd made prior to his presentation to some psychologists. i think i've finished them except for dropping the negative class (or should it be the less frequent class?) from a partial plot with binary classification, which i will do today probably.
i've also been reviewing bernd's student's pr (bernd gave me commit rights just for this, i'll still be issuing PRs in the normal fashion). i thought he was done and started cleaning it, but apparently he is still working on some things.
the rest of this week i'm going to try to finish the following
when I get done with that i could start on a couple of things (also your suggestions)
of course i'd like to do all of these by the end of the summer
except for dropping the negative class (or should it be the less frequent class?)
You should display wrt the class marked "positive" in the task IMHO. As mlr in general refers to that class for probabilities, if you only want one value, and so on.
when I get done with that i could start on a couple of things
I love what you list. and it also reflects my order of importance.
Sounds good. I remember that we talked about a few other things as well, e.g. refactoring plotLearnerPredictions() and having 3D plots to show more partial dependencies. I think it would also be great to integrate dimensionality reduction techniques (e.g. PCA) into the partial dependencies stuff.
What you've done so far is great, and I think it would be good to continue along those lines for a bit to get something even more comprehensive. The other things you've mentioned would be very useful as well, but are quite different from what you've worked on so far.
Oh and I'd add writing something for R bloggers or similar to the list, probably after finishing up with the viz stuff but before moving on to something else.
@berndbischl ok that sounds good on the positive class
@larskotthoff yes i definitely like the dimensionality reduction stuff. we need to talk more about it for me to get 100% what the plan is on that. i like that idea better than 3d plots, which i find sort of unwieldy.
so it sounds like the se estimation stuff and the website rewrite should go to the bottom.
for the mlr/caret comparison then i can put it on my website (and list the authors) and put it on R-Bloggers that way (I already have RSS) unless there is a better alternative.
if we start to disagree on the list we should talk on hangout some time.
Sounds like that would be a good idea anyway. I'm pretty much free today and the rest of the week.
I have some time problems currently, to be honest. I could talk Thursday maybe. But I have to say I really like Zach's list in the post above.
I will add a few more comments to what you want above Lars:
3d partial dep plots: I guess this is useful. We have something related already with the "conditioning" option to condition on certain values of a 2nd feature. For me it depends on how hard such a 3d plot would be to code in ggvis. If it is possible with reasonable time investment I would like to see it.
integrate dimensionality reduction techniques (e.g. PCA) into the partial dependencies stuff
Not sure I like this so much. Why do I do a part. dep. plot? Because I want to understand the effect a certain feature has on the output. An original feature. Together with a feature selection technique I can focus my attention on the most "relevant" features. If I do a PCA, I have no idea what I end up seeing in the plot. I would like to see an explained use case (or a lit ref where they discuss this) before we do this. I currently do not understand the benefit here, I guess. (Maybe I do see some benefit, but it is not really a part. dep. plot anymore. It would maybe be easier to discuss this orally. If so, I really do not see this as a natural extension of what is there, but rather something else.)
regarding the se estimation:
For me this is really important. I would love it even more if we had something which can be generalized to other models as well.
You can't do the 3D plot with ggvis, unless I majorly missed something. We'd have to use another package to do that, some of them use OpenGL and other sorts of things.
@berndbischl yes that was my thought initially, but as @larskotthoff pointed out, with a biplot you can look at how the feature loads onto a particular PC. i've seen this before with biplots. i don't know how well it would work for this, or if it would work for methods other than PCA though.
@berndbischl on the se estimation yes that would be nice. i have no ideas about a general method though.
how about this. find the nicest way to do a 3d plot for regression. so with only 2 features. then use this in plotLearnerPrediction. Then extend it to a 3d part dep plot. this seems very basic. and necessary.
(matlab has a nice 3d plot here. for inspiration maybe, i know this hint might not help much)
all agree?
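The "3d plot for regression with only 2 features" step proposed above could be prototyped in base R with persp() before reaching for plot3Drgl. This is only a sketch; the model and dataset (mtcars) are placeholders, not anything from mlr.

```r
# Sketch: a 3D prediction surface for a regression fit on 2 features.
fit = lm(mpg ~ hp + wt, data = mtcars)

# Evaluate the fit on a grid over both features.
hp = seq(min(mtcars$hp), max(mtcars$hp), length.out = 25)
wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 25)
z = outer(hp, wt, function(h, w)
  predict(fit, newdata = data.frame(hp = h, wt = w)))

# Draw the surface (null device so this runs headless).
pdf(file = NULL)
persp(hp, wt, z, theta = 30, phi = 20,
      xlab = "hp", ylab = "wt", zlab = "predicted mpg")
invisible(dev.off())
```

The same grid-then-surface shape would carry over to a 3D partial dependence plot, with the grid predictions replaced by partial predictions.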
Ok that is fair enough.
Sounds good.
I spent most of yesterday trying to make a spine plot in ggplot2 with no success. It seems that the only way to map a feature to bar width is with stat = "identity" and position = "identity", and I can't seem to figure out how to calculate the positions for the bars correctly (this is usually handled by one of the position_* functions). If anyone knows of an obvious solution I am all ears. I have poked around a bit in the ggplot2 codebase, but it seems like I will have to do a good bit of reading to understand how/if this is feasible.
library(mlr)
library(ggplot2)  # needed for ggplot()/geom_bar() below
library(reshape2)

iris = getTaskData(iris.task)
fit = train(makeLearner("classif.rpart", predict.type = "prob"), iris.task)
pd = generatePartialPredictionData(fit, getTaskData(iris.task), c("Petal.Width", "Petal.Length"))

obj = pd
data = obj$data
breaks = list()
for (x in obj$features) {
  breaks[[x]] = hist(data[[x]], plot = FALSE)$breaks
  data[[x]] = cut(data[[x]], breaks = breaks[[x]])
  # bar width = number of observations falling in each bin of the original data
  data$Width[!is.na(data[[x]])] = rep(as.integer(table(cut(iris[[x]], breaks = breaks[[x]]))),
                                      times = as.integer(table(data[[x]])))
}

out = reshape2::melt(data, id.vars = c(obj$task.desc$class.levels, "Width"),
                     value.name = "Value", variable.name = "Feature")
out = reshape2::melt(out, id.vars = c("Feature", "Value", "Width"),
                     value.name = "Probability", variable.name = "Class")
out = unique(out[!is.na(out$Value), ])

plt = ggplot(out, aes(Value, fill = Class, width = Width, group = Class, weight = Probability)) +
  facet_wrap(~ Feature, scales = "free_x")
plt + geom_bar(position = "fill", colour = "white")

## to check
plotPartialPrediction(pd)
Hmm, there seems to be a way to do it, but that requires some hackery and produces a warning.
I don't have a problem with this particular functionality using a different package though.
Ah cool thanks. Somehow I didn't find that one.
Yea there is the implementation in vcd. I am not convinced yet that this really makes it easier to see how features values affect class probabilities. In any case we have the generation function, so I am not sure how important it would be to import a new package just to write a wrapper for one function in it.
If it's easy to integrate in the framework you have in mind, I'd add it.
There is also an implementation, spineplot, in graphics, if we want to avoid importing vcd.
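For reference, the graphics::spineplot route mentioned above might look like this. The dataset (iris) is just a stand-in; the point is that a numeric explanatory variable is binned automatically, with bar widths proportional to bin counts and bar heights to class shares, i.e. the same visual idea as the ggplot2 attempt earlier in the thread.

```r
# Spine plot from base graphics, no vcd dependency.
# Null device so the example runs headless.
pdf(file = NULL)
tab = spineplot(Species ~ Petal.Width, data = iris)  # returns the table, invisibly
invisible(dev.off())
```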
I mentioned what we are doing in general in the news section of README.md. Can somebody pls check if we all agree? (It should not be too long, it already kinda is)
I think you got all the important points and explained them well.
thx
Looks good -- I've reformatted to make it easier to read and changed some formulations.
I've had a nightmare of a time fixing the error I reported to Lars on the mlr-tutorial repo tracker. Still not fixed. I've done everything except for reformatting my computer now.
The error is
> r = resample(learner = lrn, task = sonar.task, resampling = rdesc, show.info = FALSE)
Error in resample(learner = lrn, task = sonar.task, resampling = rdesc, :
Assertion on 'xs' failed: Must be of type 'list', not 'NULL'
Which is generated of course many times during the tutorial and lots of other places. I am using the GitHub versions of all the packages specified in the travis config, have reinstalled things numerous times, have tried a binary version of R as well as the homebrew compiled version I was using before, etc. Any help on this would be appreciated. Very frustrated!
I don't use a mac, so I can't really help you there unfortunately. For the tutorial it's probably fine to rely on Travis, as it pushes the result anyway. Make sure to add your email address to the configuration so you get notified.
I think unfortunately this has crippled my ability to do much of anything. resample fails everywhere with this.
In the near future (before May 5) I plan on refactoring plotROCRCurves to use plotROC, which uses ggplot2 instead of base graphics. This package offers some extra functionality (compared to what is available now), which I'll document. I also hope to get at least one other smallish feature done by then.

One option would be extending plotLearnerPrediction to cases with 3 features. I think the two obvious things to do here are to use one of the 3D plotting packages (I think plot3Drgl is nice). Another thing I'd definitely like to do is to use facetting for the third feature. With a discrete feature this is easy, but it might be nice to add the ability to discretize one of the features as well. We could also plot 4 features by using the background color. In general it would be possible to layer on additional features in this way, but it seems to have diminishing returns in terms of interpretability after 2 or 3 features.

Another thing I could possibly do is to add an option to performance that lets you apply a measure to a subset of the feature space. I find this very useful for exploring the fit of learners, especially with data that is structured in some way. I haven't looked at the code for performance yet, so I don't have an idea of how much work that would entail. One problem I can see is that if some of the cells of the grouping are small, the variance might be quite large. I am not sure whether that is out of the scope of the project. Is this something others would like to have?

When I get back (around May 16-17) I would like to finish up any residual work from the above first. I'd like to talk to Julia/Lars/Bernd about what I do next. I've had my nose in the EDA-related functionality lately, so my inclination is to start working on that first. Alternatively I could start work on producing interactive versions of the existing plotting functionality.
I have found some papers recently that I think are worth prioritizing above the minor things in my proposal (dependent data resampling methods and standard error estimates for random forests and other ensemble methods). In particular Hooker 2012 and Drummond and Holte 2006.