Open tdhock opened 1 year ago
I did an analysis of all the neuroblastoma-data using the new code in #6 (keep doing more line search iterations until AUM increases), and I observed that the number of iterations it takes is quadratic in the number of input breakpoints/lines. So it is probably too slow for a vignette on CRAN; closing. Source: https://github.com/tdhock/max-generalized-auc/blob/master/figure-line-search-complexity.R
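A hedged sketch of how the "quadratic" claim could be checked empirically: fit a log-log slope to (breakpoints, iterations) pairs. The data frame below is simulated for illustration; the real measurements live in figure-line-search-complexity.csv and its column names may differ.

```r
# Simulated (breakpoints, iterations) data with quadratic growth built in;
# replace with the real CSV columns to check the observed trend.
complexity.df <- data.frame(n.breakpoints = c(100, 200, 400, 800, 1600))
complexity.df$iterations <- complexity.df$n.breakpoints^2 / 100
# Fit log10(iterations) ~ log10(n.breakpoints); the slope estimates the
# polynomial degree of the growth.
fit <- lm(log10(iterations) ~ log10(n.breakpoints), data = complexity.df)
slope <- coef(fit)[["log10(n.breakpoints)"]]
round(slope, 2)  # a slope near 2 indicates quadratic growth
```

On real data the slope would be noisy, but a value close to 2 across several problem sizes supports the quadratic interpretation.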
The previous plot was for "keep doing more iterations of line search while subtrain AUM is decreasing." What would the plot look like if we used validation AUM instead of subtrain AUM?
Data for "keep doing more iterations of line search while subtrain AUM is decreasing" are here: https://github.com/tdhock/max-generalized-auc/blob/master/figure-line-search-complexity.csv Can we add the total number of iterations of the approx/constant line search (rather than the keep-going line search)? It should be linear (smaller slope).
Actually, even the approx line search (exactL, a linear number of iterations of the exact line search algorithm) does a quadratic number of iterations overall, same as min.aum (keep doing more iterations while subtrain AUM is decreasing); see below.

To explain the result above, we can examine the number of gradient descent steps, which is larger for exactL, and smaller for exactQ (quadratic number of iterations, the full exact line search algorithm) and for min.aum; see below.

The overall timings (including overhead of R memory allocation etc.) are shown below, and suggest that the min.aum method is slightly faster, but all three methods are about the same. Source code: https://github.com/tdhock/max-generalized-auc/blob/9574892ed8204771cef360d06756a5aacecd5e99/figure-line-search-complexity-compare.R

Also, max validation AUC is about the same between methods; see below. There is a slight increase of AUC for min.aum/exactQ over exactL.
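A back-of-envelope cost model (numbers assumed, not measured) for why the total timings can come out similar: exactQ takes few gradient descent steps but each line search costs on the order of B^2 iterations, while exactL takes many more steps with a roughly linear line search each.

```r
# Toy cost model only: step counts and per-step costs below are
# illustrative assumptions, not measurements from the linked scripts.
B <- 100                                   # number of breakpoints (assumed)
total.work <- function(n.steps, per.step.cost) n.steps * per.step.cost
c(exactQ = total.work(n.steps = 10,   per.step.cost = B^2),  # few expensive steps
  exactL = total.work(n.steps = 1000, per.step.cost = B))    # many cheap steps
```

With these assumed constants both totals are 1e5, which is consistent with the observation that the methods are about the same overall despite very different step counts.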
On this data set, init=zero gets larger validation AUC than init=IRCV. And for IRCV, we see that maxIterations=min.aum is consistently better than grid search.
Here are some graphs from my tests. I think I've reproduced exactL taking a large number of gradient descent steps. This makes me wonder if doing the full quadratic number of iterations and then checking a few grid points would improve hybrid.
Thanks for sharing; those results look consistent. "This makes me wonder if doing the full quadratic number of iterations and then checking a few grid points would improve hybrid." -> Checking a few grid points would not help quadratic, because the quadratic already checks all possible step sizes.
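A toy illustration of that point, assuming (as in the AUM setting) that the objective is piecewise linear in the step size: its minimum over an interval is attained at one of the kink (breakpoint) positions, so an exact search that evaluates every kink already finds the global minimum, and no extra grid point can do better.

```r
# f is a made-up piecewise linear function of the step size s, with kinks
# at 0.7 and 1.2; not the actual AUM objective, just the same shape class.
f <- function(s) abs(s - 0.7) + 0.5 * abs(s - 1.2)
kinks <- c(0, 0.3, 0.7, 1.2, 2.0)          # candidate step sizes at kinks
best.kink <- kinks[which.min(sapply(kinks, f))]
grid <- seq(0, 2, by = 0.01)               # a fine grid of extra step sizes
best.grid <- grid[which.min(sapply(grid, f))]
c(kink = f(best.kink), grid = f(best.grid))  # the grid cannot beat the kinks
```

Here both searches find the same minimum value, because the minimizer of a piecewise linear function is always one of its kinks.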
oh yes - not sure what I was thinking!
I ran some more tests with different hybrid variants and got similar results.
After going through a few iterations of the first for loop in https://github.com/tdhock/aum/blob/main/vignettes/line-search.Rmd I executed this code, and I got this plot, which suggests that the number of iterations in line search is quadratic, and so is the number of iterations to get to the min. @phase It would be nice to have a vignette that explores this more systematically.
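A hypothetical skeleton for that more systematic vignette experiment: vary the problem size, run the line search, and record iteration counts for later plotting. `line.search.iterations()` is a placeholder, not a real aum function; here it just returns the quadratic trend the plot suggests.

```r
# Placeholder measurement function; in the vignette this would run the
# actual line search from the aum package and count its iterations.
line.search.iterations <- function(n) n^2
size.vec <- c(10, 20, 40, 80)
result.list <- lapply(size.vec, function(n) {
  data.frame(n.data = n, iterations = line.search.iterations(n))
})
result.df <- do.call(rbind, result.list)
result.df  # one row per problem size, ready for ggplot on a log-log scale
```

Plotting `iterations` against `n.data` on log-log axes would make the polynomial degree visible as the slope.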