ledell commented 6 years ago

Hi there, Thanks for releasing this code. There are a few pieces of information missing that prevent the benchmarks from being reproducible.

Which version of H2O and auto-sklearn were used in this comparison?
What are the specs of the hardware (or AWS instance, etc) was used to run these benchmarks? Specifically, how many cores (assuming a single machine) and how much RAM?
If I were to run the MLJar benchmarks, would the free plan (plus some credits) suffice, computationally?
I don't see the script that shows the actual run of the benchmark (including dataset ids and seeds).

pplonski commented 6 years ago

Hi Erin!

h2o-3.17.0.4115 and auto-sklearn 0.2.1
AWS instances with 8 CPU and 15GB RAM (c4.2xlarge)
I was using plan with 12 parallel jobs (highest plan)
the script was running on seed values from 0 to 9 (10 repetition of train/validation split), on all datasets which are described in the table.

Are you able to reproduce results now? In case any questions, I'm happy to help.

ledell commented 6 years ago

Hi, Thanks for the info! I have not had a chance to run the benchmark code yet. Is there a reason that you used a nightly/dev version of H2O instead of a stable version? I am not sure it will change the results that much, but it's easier to document the benchmarks if you use a stable/released version.

pplonski commented 6 years ago

The only reason I used nightly build is to have to most recent version to not miss any feature/fix.

I was using 1 hour as a limit for model training time, but I'm very curious what will be results for 2 or 3 hours of model training.

don-lab-dc commented 6 years ago

I'd also love to see the results on stable releases over longer run times. (@ledell, have you done any work on this?) Also, for the time-limited version of this, it seems like you might want to use 8 parallel jobs rather than 12, given that your AWS instance only has 8 CPUs. Or, you could do something like select the best result from t-33% set of results. Also curious -- any reason you did not include TPOT?

ledell commented 6 years ago

@don-lab-dc I am working on a large benchmark right now with a few other open source AutoML package authors (right now includes auto-sklearn, H2O AutoML, TPOT, AutoWeka and a few others). We will publish our findings in the next 2 months or so.

felipeportella commented 5 years ago

@ledell, you already published your findings?

ledell commented 5 years ago

@felipeportella Not yet (we had to take a break from the work for a few months), but we are planning to submit a paper to a workshop this spring.

felipeportella commented 5 years ago

Thanks, @ledell ... let us know (in this thread) when published.

Just as an additional reference, on 17/08/2018 the guys from NCSU published this benchmarking: https://arxiv.org/abs/1808.06492v1

ledell commented 5 years ago

@felipeportella I've seen that benchmark -- the H2O benchmarks are all wrong. This paper (and how bad it was) was the motivation for our work.

ajoeajoe commented 5 years ago

I config API key and run use !python main.py -p h2o -d 24 -s 2, errors:

Exception: [Errno 2] File b'./data/24.csv' does not exist: b'./data/24.csv'

pplonski commented 5 years ago

Please run the following commands before:

mkdir data
python get_data.py

ajoeajoe commented 5 years ago

I have changed some codes with get_data.py file, for I use python3, as following:

dataset.get_data method seems has no this following:

return_categorical_indicator=True

return_attribute_names=True

codes like:


import os
import openml
import pandas as pd
import numpy as np

openml.config.apikey = 'mykey'

dataset_ids = [3, 24, 31, 38, 44, 179, 715, 718, 720, 722, 723, 727, 728, 734, 735, 737, 740, 741,
               819, 821, 822, 823, 833, 837, 843, 845, 846, 847]

for dataset_id in dataset_ids:
    print ('Get dataset id', dataset_id)
    dataset = openml.datasets.get_dataset(dataset_id)
    (X, y, categorical, names) = dataset.get_data(target=dataset.default_target_attribute)#, \
                                        #return_categorical_indicator=True, \
                                        #return_attribute_names=True)
    if len(np.unique(y)) != 2:
        print ('Not binary classification')
        continue
    vals = {}
    for i, name in enumerate(names):
        #vals[name] = X[:,i]
        vals[name] = X[name]
    vals['target'] = y
    df = pd.DataFrame(vals)
    df.to_csv('./data/{0}.csv'.format(dataset_id), index=False)

I have download all data file , but remain problems when I try "!python main.py -p auto-sklearn -d 3 -s 2"： I use python3，I don’t know why you first download as csv file ,it seems there is no need to do this job, can you try it in python3 before, it seems something happened in data format.

/usr/local/lib/python3.6/dist-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
  from numpy.core.umath_tests import inner1d
Exception: could not convert string to float: 'f'

ajoeajoe commented 5 years ago

@ledell you already published your findings? why the H2O benchmarks in that paper are all wrong?

ledell commented 5 years ago

@ajoeajoe There were a number of issues in those benchmarks but the main one related to H2O was how they used Java memory inside of docker (they set the heap size to 100G on very small EC2 instances) which caused H2O to fail. They also prevented H2O from using all available cores (ncores was limited to 2, artificially crippling the performance).

Yes, we published our benchmarks at ICML this year: https://arxiv.org/abs/1907.00909

ajoeajoe commented 5 years ago

Thanks , good job, I think I should also use https://github.com/openml/automlbenchmark for experiment works.

mljar / automl_comparison

Missing information (not reproducible) #1

return_categorical_indicator=True

return_attribute_names=True