mlr-org / mlr

Machine Learning in R
https://mlr.mlr-org.com

gsoc-visualization #289

Closed: zmjones closed this issue 9 years ago

zmjones commented 9 years ago

In the near future (before May 5) I plan on refactoring plotROCRCurves to use plotROC, which uses ggplot2 instead of base graphics. This package offers some extra functionality (compared to what is available now), which I'll document. I also hope to get at least one other smallish feature done by then.

One option would be extending plotLearnerPrediction to cases with 3 features. One obvious thing to do here is to use one of the 3D plotting packages (I think plot3Drgl is nice). Another thing I'd definitely like to do is to use faceting for the third feature: with a discrete feature this is easy, but it might be nice to add the ability to discretize a continuous feature as well. We could also plot 4 features by using the background color. In general it would be possible to layer on additional features this way, but it seems to have diminishing returns in terms of interpretability after 2 or 3 features.
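Roughly what I have in mind for the faceting version, as a toy sketch (plain lm on mtcars, nothing mlr-specific; the third feature is discretized to its quartiles):

library(ggplot2)
# prediction surface over two features, faceted on a discretized third feature
fit = lm(mpg ~ wt + hp + disp, data = mtcars)
grid = expand.grid(
  wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50),
  hp = seq(min(mtcars$hp), max(mtcars$hp), length.out = 50),
  disp = quantile(mtcars$disp, c(0.25, 0.5, 0.75))
)
grid$pred = predict(fit, grid)
ggplot(grid, aes(wt, hp, fill = pred)) +
  geom_tile() +
  facet_wrap(~ disp)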

Another thing I could possibly do is to add an option to performance that lets you apply a measure to a subset of the feature space. I find this very useful for exploring the fit of learners, especially with data that is structured in some way. I haven't looked at the code for performance yet, so I don't have an idea how much work that would entail. One problem I can see is that if some of the cells of the grouping are small, the variance might be quite large. I am not sure whether that is out of the scope of the project. Is this something others would like to have?

When I get back (around May 16-17) I would like to finish up any residual work from the above first. I'd like to talk to Julia/Lars/Bernd about what I do next. I've had my nose in the EDA-related functionality lately, so my inclination is to start working on that first. Alternatively I could start work on producing interactive versions of the existing plotting functionality.

I have found some papers recently that I think are worth prioritizing above the minor things in my proposal (dependent-data resampling methods and standard-error estimates for random forests and other ensemble methods), in particular Hooker (2012) and Drummond and Holte (2006).

berndbischl commented 9 years ago

@larskotthoff You cannot make the bounds tighter. They either are, or they are not.

My main point is: Do we understand what these bounds say? Otherwise we should not implement them. Because we need to explain them to the user.

berndbischl commented 9 years ago

I am on the fence here on whether I wanna see this or not...

larskotthoff commented 9 years ago

If you change the quantiles, the distance to the median will change. What the plot Zach posted seems to show is that the particular feature isn't very informative wrt the prediction. With the bounds, you can see that the variation within the bounds (i.e. what other features account for) is much larger than the variation caused by the feature itself.
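To make that concrete, here is a minimal sketch of the quantile-bound idea (plain lm on mtcars, purely for illustration, not the actual implementation):

# for each grid value of the feature, predict over all observations with that
# value imputed, then take the median plus chosen quantiles; the band shows
# how much variation the other features account for
fit = lm(mpg ~ wt + hp, data = mtcars)
grid = seq(min(mtcars$wt), max(mtcars$wt), length.out = 20)
agg = sapply(grid, function(v) {
  d = mtcars
  d$wt = v
  quantile(predict(fit, d), c(0.25, 0.5, 0.75))
})
matplot(grid, t(agg), type = "l", lty = c(2, 1, 2), col = 1,
        xlab = "wt", ylab = "predicted mpg")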

berndbischl commented 9 years ago

If you change the quantiles, the distance to the median will change.

Yes, I know that. But then you simply plot a different thing. The bounds change in the plot, but the distributional spread is the same...

Like I said, I still might wanna see this. I guess if you can see how large the variation across all Ys is, and if this is reduced by conditioning on X = 3, this might be informative?

larskotthoff commented 9 years ago

Not sure what you're saying -- that sounds like a different kind of analysis? The advantage of having the bounds in the plot I can see is that it gives you an idea of the "in-feature" variation vs. "out-feature" variation.

zmjones commented 9 years ago

I think you both agree. I don't think it is hard to implement this, though I haven't done survival outcomes yet. Maybe the default should just be that we only use a location measure?

larskotthoff commented 9 years ago

That sounds reasonable.

zmjones commented 9 years ago

Ok this is done and seems to work well. I have a few questions. Are the only two possibilities for predict.type "response" and "prob"?

For survival tasks: is it the case that all of them generate a response column as a result of predict, and that this is numeric? Is it always the hazard? If not, I'd need a general way of figuring out what the output is.

I can't remember what we decided to do about classification w/o probabilities and clustering. For classification w/o probabilities I have passed table(x) / length(x) as the aggregation function, which creates plots that are just aggregated versions of the probability plots. I guess this way has no advantage over the probability plots (in the case with many classes, which gives cluttered plots).

zmjones commented 9 years ago

The gist is updated with my current code.

schiffner commented 9 years ago

Thanks. I'm having a look now.

Are the only two possibilities for predict.type "response" and "prob"?

There is also "se" for regression learners (which just gives standard deviations additional to the response).

About survival outputs: I'm pretty sure it's always numeric and some kind of risk. @mllg: Could you please clarify?

berndbischl commented 9 years ago

I can't remember what we decided to do about classification w/o probabilities and clustering.

For the partial dep plots? We decided not to do it now / focus on it. If the learner does not give you probs, just throw an exception.
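E.g. a guard along these lines (hypothetical code, not what is in mlr; a Learner stores its predict type in $predict.type):

# hypothetical check at the top of the data generation function
checkHasProbs = function(learner) {
  if (learner$predict.type != "prob")
    stop("partial dependence for classification requires predict.type = 'prob'")
}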

schiffner commented 9 years ago

Are the plots intended to work for factor features (like chas in bh.task)?

berndbischl commented 9 years ago

Are the plots intended to work for factor features (like chas in bh.task)?

Partial dependency? Would be nice.

zmjones commented 9 years ago

@schiffner Yes. My implementation probably doesn't work for that now, but it will.

zmjones commented 9 years ago

@berndbischl is what I have in the gist (the second example) not worth it at all?

berndbischl commented 9 years ago

is what I have in the gist (the second example) not worth it at all?

Just saying that's what we decided. But we also said nobody is holding you back from trying it out. I looked at your gist. IMHO it looks good and could be informative, so I don't see the need to remove it now :)

zmjones commented 9 years ago

It now works with factor features, survival outcomes, and multiple features. I added a faceting argument to the plot, which works very well when the second feature is a factor and is passable when it is not. I am curious what people think about the faceting solution for plotting a 2nd feature dimension.

My original plan was to not have a separate plotPartialPrediction function but instead just plug the output of generatePartialPredictionData into plotLearnerPrediction, which could be added to. Should I instead continue to develop plotPartialPrediction?

larskotthoff commented 9 years ago

Thanks, that looks great! For the faceting, I would prefer to keep the axis ranges the same throughout to make the differences clearer (see in particular the last example).
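In ggplot2 terms that just means not using free scales on the shared axes, e.g.:

library(ggplot2)
# scales = "fixed" (the default) keeps axis ranges identical across panels,
# which makes cross-panel comparisons easier; shown on iris for illustration
ggplot(iris, aes(Petal.Length, Petal.Width)) +
  geom_point() +
  facet_wrap(~ Species, scales = "fixed")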

Regarding your question, what do you think makes more sense? I would try to keep it simple and not have several functions take the output of the data generation function unless there's a specific reason to do that.

zmjones commented 9 years ago

Good point, fixed.

So I think the plotting tasks are very similar, if not the same. In both cases you have 1-2, maybe 3 feature dimensions; you've just gotten there by different means. Separating them only makes sense if they have really different purposes. Maybe they do. plotLearnerPrediction seems to me only useful from a pedagogical perspective now.

I did think I could just give the output from generatePartialPrediction the prediction class plus some extra structure, but that doesn't make sense because there is no "truth" that we have in this case.

berndbischl commented 9 years ago

plotLearnerPrediction seems to me only useful from a pedagogical perspective now.

You mean to quickly try out a method and see what its decision boundary looks like? Actually it was invented only for that purpose, so I could show it to students in class quickly :)

zmjones commented 9 years ago

Yea that is what I meant.

Maybe we could just rename the current plotLearnerPrediction to plotExampleLearnerPrediction or similar, take some of the plotting code from that, and make a new plotLearnerPrediction. But I don't know if this is a good idea, just a proposal. If we have separate plotting functions for everything we will definitely end up with some duplication, but maybe we'd avoid a monster.

larskotthoff commented 9 years ago

Well, do you envisage that what you're implementing will eventually provide the same functionality? If so, I'd just leave the implementation alone for now and eventually drop it.

zmjones commented 9 years ago

Yes, but we'd want the name to be different to use it for anything other than output from generatePartialPredictionData. I'll just keep working on it as is, then.

zmjones commented 9 years ago

The past few days I've been working on some suggestions Bernd made prior to his presentation to some psychologists. I think I've finished them except for dropping the negative class (or should it be the less frequent class?) from a partial plot with binary classification, which I will probably do today.

I've also been reviewing Bernd's student's PR (Bernd gave me commit rights just for this; I'll still be issuing PRs in the normal fashion). I thought he was done and started cleaning it, but apparently he is still working on some things.

The rest of this week I'm going to try to finish the following.

When I get done with that I could start on a couple of things (also your suggestions).

Of course I'd like to do all of these by the end of the summer.

berndbischl commented 9 years ago

except for dropping the negative class (or should it be the less frequent class?)

You should display wrt the class marked "positive" in the task, IMHO. That is what mlr in general refers to for probabilities if you only want one value, and so on.
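E.g. for a binary task:

library(mlr)
# the class marked "positive" is stored in the task description
sonar.task$task.desc$positive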

berndbischl commented 9 years ago

when I get done with that i could start on a couple of things

I love what you list. And it also reflects my order of importance.

larskotthoff commented 9 years ago

Sounds good. I remember that we talked about a few other things as well, e.g. refactoring plotLearnerPrediction() and having 3D plots to allow showing more partial dependencies. I think it would also be great to integrate dimensionality reduction techniques (e.g. PCA) into the partial dependence stuff.

What you've done so far is great, and I think it would be good to continue along those lines for a bit to get something even more comprehensive. The other things you've mentioned would be very useful as well, but are quite different from what you've worked on so far.

Oh, and I'd add writing something for R-Bloggers or similar to the list, probably after finishing up the viz stuff but before moving on to something else.

zmjones commented 9 years ago

@berndbischl OK, that sounds good on the positive class.

@larskotthoff Yes, I definitely like the dimensionality reduction stuff. We need to talk more about it for me to get 100% what the plan is on that. I like that idea better than 3D plots, which I find sort of unwieldy.

So it sounds like the SE estimation stuff and the website rewrite should go to the bottom.

For the mlr/caret comparison, I can put it on my website (and list the authors) and get it on R-Bloggers that way (I already have RSS), unless there is a better alternative.

berndbischl commented 9 years ago

If we start to disagree on the list, we should talk on Hangouts some time.

larskotthoff commented 9 years ago

Sounds like that would be a good idea anyway. I'm pretty much free today and the rest of the week.

berndbischl commented 9 years ago

I have some time problems currently, to be honest. I could talk Thursday maybe. But I have to say I really like Zach's list in the post above.

berndbischl commented 9 years ago

I will add a few more comments to what you want above, Lars:

3D partial dep plots: I guess this is useful. We have something related already with the "conditioning" option to condition on certain values of a 2nd feature. For me it depends on how hard such a 3D plot would be to code in ggvis. If it is possible with reasonable time investment, I would like to see it.

integrate dimensionality reduction techniques (e.g. PCA) into the partial dependencies stuff

Not sure I like this so much. Why do I do a part. dep. plot? Because I want to understand the effect a certain feature has on the output. An original feature. Together with a feature selection technique I can focus my attention on the most "relevant" features. If I do a PCA, I have no idea what I end up seeing in the plot. I would like to see an explained use case (or a lit ref where they discuss this) before we do this. I currently do not understand the benefit here, I guess. (Maybe I do see some benefit, but it is not really a part. dep. plot anymore. It would maybe be easier to discuss this orally. If so, I really do not see this as a natural extension of what is there, but rather something else.)

berndbischl commented 9 years ago

Regarding the SE estimation:

For me this is really important. I would love it even more if we had something which can be generalized to other models as well.

zmjones commented 9 years ago

You can't do the 3D plot with ggvis, unless I majorly missed something. We'd have to use another package to do that; some of them use OpenGL and other sorts of things.

@berndbischl

Yes, that was my thought initially, but as @larskotthoff pointed out, with a biplot you can look at how the features loaded onto a particular PC. I've seen this before with biplots. I don't know how well it would work for this, or if it would work for methods other than PCA, though.

zmjones commented 9 years ago

@berndbischl On the SE estimation: yes, that would be nice. I have no ideas about a general method, though.

berndbischl commented 9 years ago

How about this: find the nicest way to do a 3D plot for regression, so with only 2 features. Then use this in plotLearnerPrediction. Then extend it to a 3D part dep plot. This seems very basic. And necessary.

(MATLAB has a nice 3D plot here, for inspiration maybe; I know this hint might not help much.)

All agree?
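For a first idea of the regression case, something in the spirit of base persp() (just a sketch with plain lm, not a ggvis solution):

# fitted regression surface over two features, roughly what a 3D
# plotLearnerPrediction could show
fit = lm(mpg ~ wt * hp, data = mtcars)
wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 30)
hp = seq(min(mtcars$hp), max(mtcars$hp), length.out = 30)
z = outer(wt, hp, function(w, h) predict(fit, data.frame(wt = w, hp = h)))
persp(wt, hp, z, theta = 40, phi = 25, xlab = "wt", ylab = "hp", zlab = "mpg")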

zmjones commented 9 years ago

Ok that is fair enough.

larskotthoff commented 9 years ago

Sounds good.

zmjones commented 9 years ago

I spent most of yesterday trying to make a spine plot in ggplot2 with no success. It seems that the only way to map a feature to bar width is with stat = "identity" and position = "identity", and I can't figure out how to calculate the positions for the bars correctly (this is usually handled by one of the position_* functions). If anyone knows of an obvious solution, I am all ears. I have poked around a bit in the ggplot2 codebase, but it seems like I will have to do a good bit of reading to understand how/if this is feasible.

library(mlr)
library(ggplot2)   # needed for ggplot/aes/facet_wrap/geom_bar below
library(reshape2)

iris = getTaskData(iris.task)
fit = train(makeLearner("classif.rpart", predict.type = "prob"), iris.task)
pd = generatePartialPredictionData(fit, getTaskData(iris.task), c("Petal.Width", "Petal.Length"))

obj = pd
data = obj$data
breaks = list()

## bin each feature and attach the marginal bin counts as the bar width
for (x in obj$features) {
  breaks[[x]] = hist(data[[x]], plot = FALSE)$breaks
  data[[x]] = cut(data[[x]], breaks = breaks[[x]])
  data$Width[!is.na(data[[x]])] = rep(as.integer(table(cut(iris[[x]], breaks = breaks[[x]]))),
                                      times = as.integer(table(data[[x]])))
}

## reshape to long format: one row per (feature, bin, class)
out = reshape2::melt(data, id.vars = c(obj$task.desc$class.levels, "Width"),
                     value.name = "Value", variable.name = "Feature")
out = reshape2::melt(out, id.vars = c("Feature", "Value", "Width"),
                     value.name = "Probability", variable.name = "Class")
out = unique(out[!is.na(out$Value), ])

plt = ggplot(out, aes(Value, fill = Class, width = Width, group = Class, weight = Probability)) +
  facet_wrap(~ Feature, scales = "free_x")
plt + geom_bar(position = "fill", colour = "white")

## to check
plotPartialPrediction(pd)

larskotthoff commented 9 years ago

Hmm, there seems to be a way to do it, but that requires some hackery and produces a warning.
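Something like this, perhaps (my own sketch with geom_rect, computing positions by hand; not necessarily the approach in the link):

library(ggplot2)
# spine-plot workaround: bar widths encode marginal bin counts,
# stacked heights the within-bin class proportions
tab = as.data.frame(table(bin = cut(iris$Petal.Width, 4), class = iris$Species))
widths = tapply(tab$Freq, tab$bin, sum)            # marginal count per bin
tab$xmax = cumsum(widths)[as.integer(tab$bin)]
tab$xmin = tab$xmax - widths[as.integer(tab$bin)]
tab$prop = tab$Freq / widths[as.integer(tab$bin)]  # class proportion within bin
tab$ymax = ave(tab$prop, tab$bin, FUN = cumsum)    # stack proportions per bin
tab$ymin = tab$ymax - tab$prop
ggplot(tab, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = class)) +
  geom_rect(colour = "white")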

I don't have a problem with this particular functionality using a different package though.

zmjones commented 9 years ago

Ah cool thanks. Somehow I didn't find that one.

Yeah, there is the implementation in vcd. I am not convinced yet that this really makes it easier to see how feature values affect class probabilities. In any case we have the generation function, so I am not sure how important it would be to import a new package just to write a wrapper for one function in it.

larskotthoff commented 9 years ago

If it's easy to integrate in the framework you have in mind, I'd add it.

schiffner commented 9 years ago

There is also an implementation, spineplot, in graphics, if we want to avoid importing vcd.
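E.g.:

# base graphics spine plot: bar widths encode the marginal distribution of x,
# stacked heights the conditional class distribution
spineplot(Species ~ Petal.Width, data = iris)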

berndbischl commented 9 years ago

I mentioned what we are doing in general in the news section of README.md. Can somebody pls check if we all agree? (It should not be too long; it already kinda is.)

schiffner commented 9 years ago

I think you got all the important points and explained them well.

berndbischl commented 9 years ago

thx

larskotthoff commented 9 years ago

Looks good -- I've reformatted to make it easier to read and changed some formulations.

zmjones commented 9 years ago

I've had a nightmare of a time fixing the error I reported to Lars on the mlr-tutorial repo tracker. Still not fixed. I've done everything except for reformatting my computer now.

The error is

> r = resample(learner = lrn, task = sonar.task, resampling = rdesc, show.info = FALSE)
Error in resample(learner = lrn, task = sonar.task, resampling = rdesc,  : 
  Assertion on 'xs' failed: Must be of type 'list', not 'NULL'

Which is generated of course many times during the tutorial and in lots of other places. I am using the GitHub versions of all the packages specified in the Travis config, have reinstalled things numerous times, have tried a binary version of R as well as the homebrew-compiled version I was using before, etc. Any help on this would be appreciated. Very frustrated!

larskotthoff commented 9 years ago

I don't use a Mac, so I can't really help you there, unfortunately. For the tutorial it's probably fine to rely on Travis, as it pushes the result anyway. Make sure to add your email address to the configuration so you get notified.

zmjones commented 9 years ago

I think unfortunately this has crippled my ability to do much of anything. resample fails everywhere with this.