vignette about asymptotic complexity of line search

tdhock commented 1 year ago

after going through a few iterations of the first for loop in https://github.com/tdhock/aum/blob/main/vignettes/line-search.Rmd I executed this code

  N.seq <- as.integer(10^seq(2,log10(max(diff.list$subtrain$example)),l=10))
  N.seq <- as.integer(10^seq(log10(100),log10(1600),l=10))
  atime.list <- atime::atime(
    N=N.seq,
    setup={
      maxIterations <- N*(N-1)/2
      X.subtrain <- X.keep[index.list$subtrain,]
      X.sub <- X.subtrain[1:N,]
      diff.sub <- diff.list$subtrain[example %in% seq(0,N-1)]
    },
    result=TRUE,
    seconds.limit=1,
    times=1,
    aum_line_search={
      nb.weight.search <- aum::aum_line_search(
        diff.sub,
        maxIterations=maxIterations,
        feature.mat=X.sub,
        weight.vec=weight.vec)
      list(
        total.iterations=nrow(nb.weight.search$line_search_result),
        iterations.to.min=nb.weight.search$line_search_result[,which.min(aum)])
    })
  atime.list$measurements[, total.iterations := sapply(
    result, function(L)L$total.iterations)]
  atime.list$measurements[, iterations.to.min := sapply(
    result, function(L)L$iterations.to.min)]

  best.list <- atime::references_best(atime.list, unit.col.vec=c("total.iterations","iterations.to.min"))
best.ref <- best.list$ref[each.sign.rank==1]
  library(ggplot2)
  gg <- ggplot()+
    theme_bw()+
    facet_grid(unit ~ ., scales="free")+
    geom_line(aes(
      N, empirical, color=expr.name),
      data=best.list$meas)+
    scale_x_log10()+
    scale_y_log10("median line, min/max band")
  gg.show <- gg+
     directlabels::geom_dl(aes(
       N, empirical, color=expr.name, label=expr.name),
       method="right.polygons",
       data=best.list$meas)+
     theme(legend.position="none")+
     coord_cartesian(xlim=c(min(N.seq),max(best.list$meas$N)*5))
  gg.ref <- gg.show+
    geom_line(aes(
      N, reference, group=paste(fun.name, expr.name)),
      color="grey",
      data=best.ref)
    gg.ref+
      directlabels::geom_dl(aes(
        N, reference,
        label.group=paste(fun.name, expr.name),
        label=fun.name),
        data=best.ref,
        color="grey",
        method="left.polygons")

and I got this plot which suggests that the number of iterations in line search is quadratic, and so is the number of iterations to get to the min. @phase Would be nice to have a vignette that explores this more systematically,

do this asymptotic analysis for every step of gradient descent on full data?
do gradient descent on different data sizes, and keep track of these metrics at each step?

tdhock commented 1 year ago

I did an analysis of all the neuroblastoma-data using the new code in #6 (keep doing more line search iterations until AUM increases) and I observed that the number of iterations that takes is quadratic in the number of input breakpoints/lines. So probably too slow for a vignette on CRAN, closing. source: https://github.com/tdhock/max-generalized-auc/blob/master/figure-line-search-complexity.R

tdhock commented 1 year ago

previous plot was "keep doing more iterations of line search while subtrain aum is decreasing." what would the plot look like if we did validation aum instead of subtrain?

tdhock commented 1 year ago

data for "keep doing more iterations of line search while subtrain aum is decreasing" here https://github.com/tdhock/max-generalized-auc/blob/master/figure-line-search-complexity.csv can we add the total number of iterations of approx/constant line search? (rather than keep going line search) it should be linear (smaller slope)

tdhock commented 1 year ago

actually, even the approx line search (exactL, linear number of iterations of exact line search algorithm) does a quadratic number of iterations, same as min.aum (keep doing more iterations while subtrain aum is decreasing), see below: To explain the result above, we can examine the number of steps of gradient descent, which is larger for exactL and smaller for exactQ (quadratic number of iterations, full exact line search algorithm), and smaller for min.aum, see below: The overall timings (including overhead of R memory allocation etc) are shown below, and suggest that the aum.min method is slightly faster, but all three methods are about the same, source code: https://github.com/tdhock/max-generalized-auc/blob/9574892ed8204771cef360d06756a5aacecd5e99/figure-line-search-complexity-compare.R Also max validation AUC is about the same between methods, see below, There is a slight increase of AUC for min.aum/exactQ over exactL.

tdhock commented 1 year ago

On this data set, init=zero gets larger valid AUC than init=IRCV. And for IRCV we see that maxIterations=min.aum is consistently better than grid search.

phase commented 1 year ago

Here are some graphs from my tests. I think I've reproduced exactL taking a large amount of steps of gradient descent. This makes me wonder if doing the full quadratic amount of iterations and then checking a few grid points would improve hybrid.

tdhock commented 1 year ago

THanks for sharing, those results look consistent. "This makes me wonder if doing the full quadratic amount of iterations and then checking a few grid points would improve hybrid." -> checking a few grid points would not help quadratic because the quadratic already checks all possible step sizes.

phase commented 1 year ago

oh yes - not sure what I was thinking!

I ran some more tests with different hybrid variants and got similar results

tdhock / aum

vignette about asymptotic complexity of line search #5