Closed exalate-issue-sync[bot] closed 1 year ago
Arno Candel commented: Step 2: add weights_column argument to h2o.quantile(). Step 3: Add test for weighted quantiles.
Arno Candel commented: Question for wtd.quantile in R as used in https://github.com/h2oai/h2o-3/blob/master/h2o-r/tests/testdir_misc/runit_NOPASS_weighted_quantile.R
Does wtd.quantile get the same results for
{code}
fr1 = parse_test_file("smalldata/junit/weights_all_twos.csv");
fr2 = parse_test_file("smalldata/junit/weights_all_ones.csv");
{code}
and for
{code}
fr1 = parse_test_file("smalldata/junit/no_weights.csv");
fr2 = parse_test_file("smalldata/junit/weights.csv");
{code}
I am referring to https://github.com/h2oai/h2o-3/blob/master/h2o-algos/src/test/java/hex/quantile/QuantileTest.java#L218 and https://github.com/h2oai/h2o-3/blob/master/h2o-algos/src/test/java/hex/quantile/QuantileTest.java#L264, which are both failing currently. Trying to understand whether R gets the exact same quantiles for those cases or not.
Arno Candel commented:
{code}
library(Hmisc)  # provides wtd.quantile
setwd("~")

# Test frames: unweighted, weighted, all weights = 1, all weights = 2
nw <- read.csv("h2o-3/smalldata/junit/no_weights.csv")
w  <- read.csv("h2o-3/smalldata/junit/weights.csv")
w1 <- read.csv("h2o-3/smalldata/junit/weights_all_ones.csv")
w2 <- read.csv("h2o-3/smalldata/junit/weights_all_twos.csv")

quantile(nw$f1, probs = seq(0,1,.05))
wtd.quantile(w$f1, w$weight, probs = seq(0,1,.05))
wtd.quantile(w$f1, w$weight, probs = seq(0,1,.05), normwt = TRUE)
wtd.quantile(w1$f1, w1$weight, probs = seq(0,1,.05))
wtd.quantile(w2$f1, w2$weight, probs = seq(0,1,.05))
wtd.quantile(w2$f1, w2$weight, probs = seq(0,1,.05), normwt = TRUE)
{code}
This shows that even wtd.quantile doesn't behave the same way for all w=1 vs all w=2.
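One plausible reading of this difference, assuming the weights are treated as frequency counts (an assumption, not something confirmed in this thread), is that doubling every weight doubles the effective sample size, which shifts the interpolation position. A minimal base-R sketch on made-up data, using row replication in place of wtd.quantile:

```r
# Assumption: weights act as frequency counts, so w = 2 means "count each row twice".
x <- c(1, 2, 3, 4, 5)            # illustrative data, not from the test files
probs <- c(0, 0.1, 0.5, 0.9, 1)

q_ones <- quantile(x, probs = probs)                  # each value counted once
q_twos <- quantile(rep(x, times = 2), probs = probs)  # each value counted twice

# The default (type 7) interpolation position is 1 + (n - 1) * p,
# so it moves when n doubles and the interpolated quantiles change.
print(rbind(ones = q_ones, twos = q_twos))
```

Here the 10% and 90% quantiles differ between the two rows while the median and the extremes agree.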
Arno Candel commented: h3. simple test for weights < 1, not quite right as of dbef536dc48fa
{code}
library(Hmisc)  # provides wtd.quantile
probs <- c(0,0.25,0.5,0.75,1)

# Reference results in R
x <- c(1,2,3,4,5)
w <- c(0.5,0.4,0.3,0.2,0.1)
wtd.quantile(x,weights=w,probs=probs)
wtd.quantile(x,weights=w,normwt = TRUE,probs=probs)
wtd.quantile(x, 5:1,probs=probs)
y <- c(1,1,1,1,1,2,2,2,2,3,3,3,4,4,5)
quantile(y,probs=probs)

# Same data in H2O, fractional weights
library(h2o)
h2o.init()
x <- c(1,2,3,4,5)
w <- c(0.5,0.4,0.3,0.2,0.1)
df <- as.h2o(x)
df$w <- as.h2o(w)
h2o.quantile(df,weights_column="w",probs=probs)

# Weights rescaled to sum to the number of rows
df$w <- df$w*nrow(df)/sum(df$w)
h2o.quantile(df,weights_column="w",probs=probs)

# Integer weights 5:1
df <- as.h2o(x)
df$w <- as.h2o(5:1)
h2o.quantile(df,weights_column="w",probs=probs)

# Expanded vector, no weights
y <- c(1,1,1,1,1,2,2,2,2,3,3,3,4,4,5)
df <- as.h2o(y)
h2o.quantile(df, probs=probs)
{code}
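The three weightings in this test are proportional to each other, which is presumably why they are compared. The base-R sketch below only checks that relationship and the unweighted reference on the expanded vector; it assumes integer weights are read as counts and does not call H2O or wtd.quantile.

```r
x <- c(1, 2, 3, 4, 5)
w <- c(0.5, 0.4, 0.3, 0.2, 0.1)
probs <- c(0, 0.25, 0.5, 0.75, 1)

# Integer weights 5:1 read as counts reproduce the expanded vector y.
y <- rep(x, times = 5:1)
q_expanded <- quantile(y, probs = probs)

# Rescale w so it sums to the number of rows (the same step applied to df$w).
w_norm <- w * length(x) / sum(w)

print(q_expanded)
print(w_norm)
```

The rescaled weights equal (5:1)/3, so they keep the same proportions as both w and 5:1 while summing to 5.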
JIRA Issue Migration Info
Jira Issue: PUBDEV-2402
Assignee: Arno Candel
Reporter: Arno Candel
State: Resolved
Fix Version: N/A
Attachments: N/A
Development PRs: N/A
Arno Candel commented: Step 1: Confirm that quantiles are correct in comparison to R:
{code}
library(testthat)
library(h2o)
h2o.init()
df <- h2o.createFrame(missing_fraction = 0, seed=1234)
df
for (i in c(1,4,5,6,7,9,10)) {
  h<-h2o.quantile(df[,i], probs=c(0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0))
  r<-quantile(x=as.matrix(as.data.frame(df[,i])), probs=c(0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0))
  expect_equal(r,h)
}
{code}
Passes.
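For reference, the R side of this comparison uses quantile() with its default type 7 (linear interpolation between order statistics). The sketch below writes that rule out by hand and checks it against quantile() on random data. The helper name type7_quantile is hypothetical, and the claim that H2O's default interpolation corresponds to type 7 is an assumption suggested by the passing test, not something stated in this thread.

```r
# Hypothetical helper: hand-written type-7 quantile (R's default).
type7_quantile <- function(x, probs) {
  x <- sort(x)
  n <- length(x)
  h <- (n - 1) * probs + 1   # fractional position in the sorted data
  lo <- floor(h)
  hi <- ceiling(h)
  # Linear interpolation between the two neighbouring order statistics
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

set.seed(1234)
x <- rnorm(100)                      # illustrative data
probs <- seq(0, 1, by = 0.1)

manual <- type7_quantile(x, probs)
builtin <- unname(quantile(x, probs = probs))
print(all.equal(manual, builtin))
```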