h2oai / h2o-tutorials

Tutorials and training material for the H2O Machine Learning Platform
http://h2o.ai

Adding short tutorials showing how to use the new h2o.sklearn module #123

Closed sebhrusen closed 4 years ago

sebhrusen commented 4 years ago

The two Jupyter notebooks can be viewed under https://github.com/h2oai/h2o-tutorials/tree/sklearn-support/tutorials/sklearn-integration

The first is a generic notebook that explains in detail how the new h2o.sklearn wrappers can be used in combination with Scikit-learn components, and summarizes how this works.

The second one is dedicated more specifically to AutoML and adds some examples.

hannah-tillman commented 4 years ago

@sebhrusen Just an FYI: when I run the first example ("mixing sklearn with h2o.sklearn components") in the "H2O-3 integration with Scikit-Learn" file with the Python 3 kernel, line "In [2]" raises the error below. The error does not occur when I switch to the Python 2 kernel and run the same code.

...
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py in do_open(self, http_class, req, **http_conn_args)
   1317                           encode_chunked=req.has_header('Transfer-encoding'))
   1318             except OSError as err: # timeout error
-> 1319                 raise URLError(err)
   1320             r = h.getresponse()
   1321         except:

URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)>

sebhrusen commented 4 years ago

@hannah-tillman thanks for the review, I will fix the mistakes. Regarding the URLError, I was using Python 3 and didn't get the error. As it's using iris though, I'll remove the dependency on our dataset stored on S3 and use the iris dataset provided by sklearn instead. It's just too bad that the latter is already encoded: I also wanted to show that, contrary to sklearn, H2O can still use non-encoded datasets...
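For anyone hitting the same `CERTIFICATE_VERIFY_FAILED` message, here is a minimal, standard-library-only sketch for checking whether the Python interpreter can locate a CA certificate source at all. The thread does not confirm that a missing CA bundle was the cause here, so treat this as one possible diagnostic rather than the fix; the helper name `describe_ca_locations` is made up for illustration.

```python
import ssl


def describe_ca_locations():
    """Report where this interpreter looks for trusted CA certificates.

    Hypothetical helper for illustration; it only inspects local
    configuration and makes no network request.
    """
    paths = ssl.get_default_verify_paths()
    return {
        # Resolved locations; None when the file or directory does not exist.
        "cafile": paths.cafile,
        "capath": paths.capath,
        # Locations compiled into the OpenSSL build, whether or not they exist.
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
        # True when at least one resolved CA source is available.
        "has_ca_source": paths.cafile is not None or paths.capath is not None,
    }


if __name__ == "__main__":
    for key, value in describe_ca_locations().items():
        print(f"{key}: {value}")
```

If `has_ca_source` is false under the Python 3 kernel, the interpreter has no default CA file or directory to verify HTTPS certificates against, which could produce an error like the one in the traceback.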

UPDATE: I decided to specify a pandas requirement instead, as it seems you may be using an old version of pandas in Python 3 (probably < 0.19.2).
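As a rough illustration of that version threshold, the sketch below compares a pandas version string against 0.19.2 using only the standard library. The threshold is taken from the "probably < 0.19.2" guess above, not from a confirmed requirement, and the helper names are hypothetical.

```python
import re

# Threshold mentioned in the thread as a guess; an assumption, not a
# confirmed minimum requirement.
MIN_PANDAS = (0, 19, 2)


def version_tuple(version):
    """Turn a string such as '0.19.2' or '1.0.0rc1' into a tuple of ints.

    Only the leading digits of each dot-separated piece are kept, so
    suffixes like 'rc1' are ignored.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def pandas_is_recent_enough(installed_version, minimum=MIN_PANDAS):
    """Return True when the installed version is at least the minimum."""
    return version_tuple(installed_version) >= minimum


# Typical use inside a notebook (requires pandas to be installed):
#   import pandas
#   pandas_is_recent_enough(pandas.__version__)
```

Declaring the minimum version in the tutorial's requirements lets the installer enforce it up front; a runtime check like this one is only useful for diagnosing an existing environment.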

sebhrusen commented 4 years ago

@hannah-tillman, @ledell are we OK with merging this?