ryantibs / quantgen

Tools for generalized quantile modeling
https://ryantibs.github.io/quantgen
14 stars 9 forks source link

Investigate fit quality (and speed) vs. quantreg on particular data set #12

Open brookslogan opened 2 years ago

brookslogan commented 2 years ago

On my machine, quantgen with gurobi appears to produce fits with notably worse in-sample quantile loss on the highest and lowest quantile levels in the attached data set, with a 34% increase in loss on one and a 43% increase in loss in the other, relative to quantreg, with the following common settings:

Additionally, quantreg was much faster to produce results in this configuration, which, if I recall correctly, was not the case in the vignette example. quantgen with glpk was running the slowest, so I did not include it in the loss comparison above.

debug-quantgen-data-deparsed.txt

library("pipeR")

## data = readRDS("debug-quantgen-data.RDS")
source("debug-quantgen-data-deparsed.txt")

print("LICENSE & ATTRIBUTION:")
print(data[["LICENSE"]])

quantgen.gurobi.fit = quantgen::quantile_lasso(data[["x"]], data[["y"]], data[["taus"]], lambda=0, intercept=TRUE, standardize=FALSE, noncross=FALSE, lp_solver="gurobi", verbose=TRUE)
## quantgen.glpk.fit = quantgen::quantile_lasso(data[["x"]], data[["y"]], data[["taus"]], lambda=0, intercept=TRUE, standardize=FALSE, noncross=FALSE, lp_solver="glpk", verbose=TRUE)

stopifnot(all(quantgen.gurobi.fit[["status"]] == "OPTIMAL"))

x.with.intercept = cbind("(Intercept)"=rep(1, nrow(data[["x"]])), data[["x"]])
quantreg.fits = lapply(data[["taus"]], function(tau) {
  quantreg::rq.fit(x.with.intercept, data[["y"]], tau)
})

quantgen.gurobi.yhats.via.predict = unname(predict(quantgen.gurobi.fit, data[["x"]]))
quantgen.gurobi.yhats.via.multiplication = unname(as.matrix(x.with.intercept %*% quantgen.gurobi.fit[["beta"]]))

quantreg.yhats.via.multiplication = lapply(quantreg.fits, function(quantreg.fit, tau) {
  x.with.intercept %*% as.matrix(quantreg.fit[["coefficients"]])
}) %>>%
  {do.call(cbind, .)}

stopifnot(all.equal(quantgen.gurobi.yhats.via.predict, quantgen.gurobi.yhats.via.multiplication))
all.equal(quantgen.gurobi.yhats.via.predict, quantreg.yhats.via.multiplication, check.attributes=FALSE)
## [1] "Mean relative difference: 0.05030153"

pmax0 = function(x) {
  `[<-`(x, x<0, 0)
}

quantile_losses = function(residuals, taus) {
  if (dim(residuals)[[2L]] != length(taus)) {
    stop ('dimension mismatch')
  }
  t.residuals = t(residuals)
  rowMeans(taus*pmax0(t.residuals) + (1-taus)*pmax0(-t.residuals))
}

quantgen.gurobi.losses = quantile_losses(data[["y"]] - quantgen.gurobi.yhats.via.multiplication, data[["taus"]])
quantreg.losses = quantile_losses(data[["y"]] - quantreg.yhats.via.multiplication, data[["taus"]])

100 * (quantgen.gurobi.losses - quantreg.losses) / quantreg.losses
## [1] 34.466310673  0.091520742  0.043563195  0.168333151  0.284732772
## [6]  0.500149404  0.447703928  0.125777531  0.029489879  0.913990222
## [11]  0.112511111  0.122463906  0.057082111  0.244793884  0.030920394
## [16]  0.129422499  0.053487628  0.004508362  0.103481194  0.029261431
## [21]  0.247275783  0.160596869 42.578259363