mlr-org / mlr

Machine Learning in R
https://mlr.mlr-org.com

h2o learner runs out of memory during prolonged tuning #2633

Closed: missuse closed this issue 5 years ago

missuse commented 5 years ago

I have recently started tuning h2o learners using mlr.

After prolonged tuning, even with quite small data sets (1000 rows and 400 columns), I receive out-of-memory errors from h2o. I can get through more tuning iterations if I initialize an h2o instance with fewer threads and a relatively large min_mem_size:

h2o.init(min_mem_size = "8G", nthreads = 1)

However, this prolongs the training time and can still cause out-of-memory errors when I call resample on a TuneWrapper.

Example:

library(mlr)
library(h2o)

# Two h2o deep learning base learners, combined below via a model multiplexer
base_lrn <- list(
  makeLearner("classif.h2o.deeplearning",
              id = "h20_2",
              predict.type = "prob"),
  makeLearner("classif.h2o.deeplearning",
              id = "h20_3",
              predict.type = "prob"))

mm_lrn <- makeModelMultiplexer(base_lrn)

# Joint parameter set over both learners
par_set <- makeParamSet(
  makeDiscreteParam("selected.learner", values = extractSubList(base_lrn, "id")),
  makeDiscreteParam("h20_2.hidden", values = list(a = c(32L, 32L),
                                                  b = c(64L, 64L),
                                                  c = c(128L, 128L))),
  makeDiscreteParam("h20_3.hidden", values = list(a = c(16L, 16L, 16L),
                                                  b = c(32L, 32L, 32L),
                                                  c = c(64L, 64L, 64L),
                                                  d = c(128L, 128L, 128L))),
  makeDiscreteParam("h20_2.activation", values = "RectifierWithDropout", tunable = FALSE),
  makeDiscreteParam("h20_3.activation", values = "RectifierWithDropout", tunable = FALSE),
  makeNumericParam("h20_2.input_dropout_ratio", lower = 0, upper = 0.4, default = 0.1),
  makeNumericParam("h20_3.input_dropout_ratio", lower = 0, upper = 0.4, default = 0.1),
  makeNumericVectorParam("h20_2.hidden_dropout_ratios", len = 2, lower = 0, upper = 0.6, default = rep(0.3, 2),
                         requires = quote(selected.learner == "h20_2")),
  makeNumericVectorParam("h20_3.hidden_dropout_ratios", len = 3, lower = 0, upper = 0.6, default = rep(0.3, 3),
                         requires = quote(selected.learner == "h20_3")))

# irace tuning with a budget of 500 evaluations
ctrl <- makeTuneControlIrace(budget = 500L,
                             n.instances = 200L)

# Inner 3-fold CV for tuning
tw <- makeTuneWrapper(mm_lrn,
                      resampling = cv3,
                      control = ctrl,
                      par.set = par_set,
                      show.info = TRUE,
                      measures = list(auc,
                                      bac))

# Outer 5-fold CV (nested resampling) on the spam task
perf_tw <- resample(tw,
                    task = spam.task,
                    resampling = cv5,
                    extract = getTuneResult,
                    models = TRUE,
                    show.info = TRUE,
                    measures = list(auc,
                                    bac))

The error occurs after a couple of hours. When I reduce the number of resampling instances (n.instances) in ctrl, the tuning is able to finish.

I have 16 GB of RAM on this machine, and while I understand that getting more RAM would probably let me do what I want, I was hoping the mlr team would consider adding code to the h2o learners that cleans the h2o instance after each train/predict iteration.
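
To be concrete, the kind of cleanup I have in mind is roughly the following (only a sketch built on h2o's own API; clean_h2o is an illustrative name, not an existing mlr or h2o function):

library(h2o)

# Free everything the h2o cluster is holding once a train/predict pair is done,
# so the Java heap does not fill up over hundreds of tuning iterations.
clean_h2o <- function() {
  h2o.removeAll()   # drops all frames and models currently stored on the cluster
  invisible(gc())   # also collect the stale R-side handles
}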

These questions on SO seem related:
https://stackoverflow.com/questions/44281612/r-h2o-connection-memory-issue
https://stackoverflow.com/questions/28703121/r-h2o-memory-management?noredirect=1&lq=1

Thank you.

larskotthoff commented 5 years ago

This is an h2o problem that we can do little about here. It sounds like the only way to avoid this completely would be to periodically restart the h2o server or explicitly manage memory, which would require custom code and hooks implemented on our side to work around a limitation of h2o. This would likely be significant effort and we have no plans to implement anything like this.

Sorry for the negative answer.

missuse commented 5 years ago

Thank you for the fast reply, larskotthoff. I understand.

pat-s commented 5 years ago

I've never needed to use h2o in my work so far. Is there anything you get from h2o that you would not get by using mlr directly? (Just asking out of curiosity.)

missuse commented 5 years ago

I really like using mlr. Not only that, I already have everything set up to use mlr the way I want. However, when using h2o learners via mlr I simply cannot run an evaluation that matches the other learners I am testing, due to the memory problems mentioned above. I think the fix might not be that hard to implement. I will attempt to make a custom learner; if I am successful, I will report back.

If these memory problems were not there, I would say mlr extends h2o quite nicely; for instance, I can use irace with no effort at all. However, when I use h2o directly I can work around the memory problems simply by cleaning the h2o instance after each training iteration, or by shutting it down and starting a new one, roughly as sketched below.
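
Concretely, when driving h2o directly I do something along these lines between batches of trained models (a rough sketch; the memory size and thread count are just the settings I happen to use):

library(h2o)

# Tear down the current h2o instance and start a fresh one, so the Java heap
# is handed back to the OS before the next batch of models is trained.
h2o.shutdown(prompt = FALSE)
Sys.sleep(5)                                 # give the JVM a moment to exit
h2o.init(min_mem_size = "8G", nthreads = 1)  # same settings as above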

I think the mlr team should integrate keras, since all the currently integrated deep learning learners are more or less inferior to it.

pat-s commented 5 years ago

My question was more along the lines of "what does h2o give you that mlr does not?", i.e. why do you need the extra layer? If it is a learner, then fair enough. Otherwise, I usually see people struggling with h2o and haven't seen its advantage yet.

We won't add any new features/learners to mlr from our side anymore; all development goes into mlr3.

missuse commented 5 years ago

You are correct about the struggle. I intended to use h2o since it seemed to have the most advanced deep net implementation among the mlr-integrated learners.

pat-s commented 5 years ago

You can also have a look at mlr3. We have not yet integrated a deep learning learner there and some things are still unstable, but this will be the future of machine learning in R (at least as we see it). Right now we're in a transition period, so everything is a bit complicated.