h2oai / h2o4gpu

H2Oai GPU Edition
Apache License 2.0

Add util to prep data for h2o4gpu python package #514

Closed: navdeep-G closed this issue 6 years ago

ledell commented 6 years ago

@navdeep-G What kind of data prep? I assume you mean things like label encoding, one-hot encoding and imputation? Can we just expose the scikit-learn methods for this, or do we need to write new methods from scratch?
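For reference, the three operations mentioned map directly onto existing scikit-learn classes (`LabelEncoder`, `OneHotEncoder`, `SimpleImputer`). The sketch below runs each one on small hypothetical toy arrays; nothing here is h2o4gpu-specific, it only shows what "exposing the scikit-learn methods" would cover.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical toy data: one categorical feature, one numeric feature
# with a missing value, and a string target label.
colors = np.array([["red"], ["green"], ["red"], ["blue"]])
ages = np.array([[25.0], [np.nan], [40.0], [31.0]])
labels = np.array(["yes", "no", "yes", "no"])

# Label encoding: map each target class to an integer (classes are sorted,
# so "no" -> 0 and "yes" -> 1).
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(labels)

# One-hot encoding: one binary column per category (sorted: blue, green, red).
# The default output is sparse, so convert to a dense array for printing.
one_hot = OneHotEncoder(handle_unknown="ignore")
colors_encoded = one_hot.fit_transform(colors).toarray()

# Imputation: replace missing values with the column mean of observed values.
imputer = SimpleImputer(strategy="mean")
ages_imputed = imputer.fit_transform(ages)

print(y)                      # [1 0 1 0]
print(colors_encoded)         # rows: red, green, red, blue
print(ages_imputed.ravel())   # NaN replaced by (25 + 40 + 31) / 3 = 32.0
```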

navdeep-G commented 6 years ago

I was thinking of utilizing py datatable somehow.

ledell commented 6 years ago

@navdeep-G It would be ideal to use py datatable on the backend while keeping the scikit-learn API. Here's a list of all the scikit-learn preprocessing methods that we could use (falling back to scikit-learn when the GPU is not supported?): http://scikit-learn.org/stable/modules/classes.html#module-sklearn.preprocessing

navdeep-G commented 6 years ago

Yes, but both run on the CPU, so there is no need to fall back.

ledell commented 6 years ago

@navdeep-G I thought py datatable was GPU capable?

navdeep-G commented 6 years ago

@ledell py datatable is pure CPU as of today.

navdeep-G commented 6 years ago

After researching more, I think sklearn's capabilities cover what is needed, and most users of Python machine learning libraries will already know how to use these methods.
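A minimal sketch of the "just use sklearn" route: the preprocessing steps are composed with `ColumnTransformer` and chained to an estimator in a `Pipeline`. The data is hypothetical, and `LogisticRegression` is only a placeholder; that an h2o4gpu estimator with a scikit-learn-style interface could take its place unchanged is an assumption, not something confirmed in this thread.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data: column 0 is an integer-coded category (e.g. a store ID),
# column 1 is a numeric feature with one missing value.
X = np.array([
    [0, 25.0],
    [1, np.nan],
    [0, 40.0],
    [2, 31.0],
    [1, 28.0],
    [2, 35.0],
])
y = np.array([1, 0, 1, 0, 0, 1])

# Route each column to its own preprocessing: one-hot encode the category,
# median-impute and then standardize the numeric feature.
preprocess = ColumnTransformer([
    ("category", OneHotEncoder(handle_unknown="ignore"), [0]),
    ("numeric", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), [1]),
])

# Chain preprocessing with an estimator; LogisticRegression is a stand-in
# for any estimator that follows the scikit-learn fit/predict API.
model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", LogisticRegression()),
])
model.fit(X, y)

# Inspect the fitted preprocessing step and get predictions from the pipeline.
X_transformed = model.named_steps["preprocess"].transform(X)
predictions = model.predict(X)
print(X_transformed.shape)  # (6, 4): 3 one-hot columns + 1 scaled numeric column
```

Because the fitted pipeline stores the learned categories, median and scaling parameters, the same transformations are applied consistently when `model.predict` is called on new data.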