Closed reuning closed 8 years ago
Good catch. I think I solved this in 4ce5429561289021b192835be79fcbab95d9a1c4 by just changing one line in .ivar_points. Let me know if that solves the problem. Then I'll close this.
I think it still has a problem, specifically at line 197: rng <- as.data.frame(rng)
When rng holds lists of different lengths, as.data.frame cannot combine them into a single data frame. I am not sure what the simplest way to fix this is.
I fixed it and uploaded it to my fork. I haven't run the tests yet, though, and I see that they failed last time, so it might have issues.
Also, plot_pd has issues now.
Two steps forward 5 backwards?
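For anyone following along, here is a minimal sketch of the failure described above. The contents of rng are a hypothetical stand-in (one vector of grid points per feature), not the actual object built in the package, and expand.grid is shown only as one possible option for the interaction case, not as the fix that was committed.

```r
# Hypothetical stand-in for `rng`: one vector of grid points per feature.
rng <- list(x = 1:5, z = c("a", "b"))

# as.data.frame() needs columns of equal length (or lengths that recycle
# evenly), so with lengths 5 and 2 it raises an error. Capture the message.
res <- tryCatch(as.data.frame(rng), error = function(e) conditionMessage(e))
print(res)

# When every combination of values is wanted, expand.grid() accepts a list
# of unequal-length vectors and returns one row per combination.
grid <- expand.grid(rng, stringsAsFactors = FALSE)
nrow(grid)
```

When interaction = FALSE, the grid points for each feature are presumably used separately, so keeping rng as a list may be simpler than forcing it into a data frame.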
Ah OK. I will look at this some more tonight.
I made another simple change: just drop duplicate "observations" in the prediction grid. It worked with the simple example I have (see below). As you noted, the plot is broken when interaction = FALSE. I think this should work, though. It doesn't seem unreasonable to me to request bivariate partial dependence for a set of features of mixed type. Unfortunately, I think this will require me to coerce the factor to an integer, and (also unfortunately) ggplot2 won't allow me to disable lines being drawn for unordered factors, but I could at least generate a warning for this case.
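library(randomForest)
library(edarf)  # provides partial_dependence() and plot_pd()

# simulate a numeric feature x, a two-level factor z, and an outcome y
n = 100
x = sample(1:10, n, TRUE)
z = as.factor(sample(letters[1:2], n, TRUE))
y = rowSums(model.matrix(~ x + z + x * z)) + rnorm(n)

fit = randomForest(y ~ x + z)
# bivariate partial dependence for a numeric feature and a factor
pd = partial_dependence(fit, data.frame(x, z, y), c("x", "z"), interaction = TRUE, cutoff = 5)
plot_pd(pd)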
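The idea of dropping duplicate rows from the prediction grid can be sketched in base R alone. The grid below is a made-up illustration (x_pts and z_pts are hypothetical names), not the grid that partial_dependence builds internally.

```r
# Hypothetical grid: 5 points are requested per feature, but z has only
# 2 levels, so its 5 "points" repeat the same two values.
x_pts <- seq(1, 10, length.out = 5)
z_pts <- factor(rep(c("a", "b"), length.out = 5))

grid <- expand.grid(x = x_pts, z = z_pts)
nrow(grid)         # 25 rows, many of them repeated

# unique() on a data frame drops rows that repeat an earlier row.
grid_unique <- unique(grid)
nrow(grid_unique)  # 10 rows: 5 values of x for each of the 2 levels of z
```

Predicting on grid_unique instead of grid gives the same set of distinct predictions with less work.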
ping!
I am waiting until after I have my paper drafted to get back to this. I need to finish it up and would rather not mess with things until after that :P
pishaw
this is fixed
This checks for factor variables and adjusts the cutoffs so that it doesn't create duplicate predictions, which is especially useful when checking interactions. Example: in the previous version, if you generated PD for a continuous variable Z = 1:5 and a factor X = (Yes, No) with a cutoff of 5, it would create 5^2 = 25 predictions, because it did not realize that X has only 2 proper levels and used 5 points for it anyway. This version automatically creates only 10 predictions in such a case: Z = 1:5 when X = Yes, and Z = 1:5 when X = No.
It isn't exactly pretty. But it does work with the cases I tried.
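A rough sketch of the behaviour described above, for illustration only. grid_points is a hypothetical helper written for this example and is not the code in this pull request; it assumes a factor should contribute one point per observed level and a numeric variable at most cutoff evenly spaced observed values.

```r
# Hypothetical helper: choose grid points for one variable.
grid_points <- function(v, cutoff) {
  if (is.factor(v)) {
    # A factor contributes one point per level that actually occurs.
    factor(levels(droplevels(v)), levels = levels(v))
  } else {
    u <- sort(unique(v))
    # Take at most `cutoff` evenly spaced observed values.
    idx <- unique(round(seq(1, length(u), length.out = min(cutoff, length(u)))))
    u[idx]
  }
}

Z <- 1:5
X <- factor(c("Yes", "No", "Yes", "No", "Yes"))

# 5 points for Z and 2 for X give 10 combinations instead of 5^2 = 25.
grid <- expand.grid(Z = grid_points(Z, 5), X = grid_points(X, 5))
nrow(grid)
```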