mljar / mljar-supervised

Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
https://mljar.com
MIT License

Have you considered GPU training, say for the boosting algorithms and also for the neural net? #316

Open Data-drone opened 3 years ago

Data-drone commented 3 years ago

Seems like it might be one way to get more performance on larger problems.

pplonski commented 3 years ago

Hi @Data-drone! Sure, that would be nice to have, but in the future. Right now, I'm focused on making AutoML work nicely with good results on the CPU. Most tabular datasets that I've seen are small (< 4 GB). Working on AutoML while at the same time fighting with constantly changing GPU drivers might be a nightmare.

For Neural Networks, at first I was using Keras+TF, but I had some problems with them.

That's why I switched to the MLP from scikit-learn.

Anyway, in the future, it will be nice to have support for GPU.

mglowacki100 commented 3 years ago

I think you can consider fast.ai's tabular NN with GPU support. As far as I remember, for XGBoost enabling the GPU is about a parameter or two in the configuration. It's probably similar for CatBoost and LightGBM.
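
For reference, a minimal sketch of what those one or two parameters look like in XGBoost and LightGBM. This assumes GPU-enabled builds of both libraries and a CUDA device; the data here is synthetic:

```python
# Sketch: enabling GPU training in XGBoost and LightGBM.
# Assumes GPU-enabled builds of both libraries and an NVIDIA GPU.
import numpy as np
import xgboost as xgb
import lightgbm as lgb

X = np.random.rand(1000, 20)
y = np.random.randint(0, 2, 1000)

# XGBoost: one parameter selects the GPU histogram algorithm
xgb_model = xgb.XGBClassifier(tree_method="gpu_hist")
xgb_model.fit(X, y)

# LightGBM: one parameter switches the training device
lgb_model = lgb.LGBMClassifier(device="gpu")
lgb_model.fit(X, y)
```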

pplonski commented 3 years ago

@mglowacki100 have you tested the fast.ai tabular NN? Does it have good performance?

I don't like the fact that I need to transform a pandas DataFrame into their own table format before training. It looks like a huge framework, so I'm a little skeptical.

Yes, switching the GBM algorithms to GPU should be straightforward, but there are additional constraints that need to be handled. For example, not all parameters/metrics work with the GPU. Would you like to take a look at it?

mglowacki100 commented 3 years ago

@pplonski I'll have some time in the next month for the GBM algos. I think the easiest and most flexible approach is just to clone e.g. Xgboost as a separate algorithm with GPU-adjusted params and call it Xgboost_GPU in the algorithms parameter.
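
A rough sketch of the idea. The names `xgboost_cpu_params` and the registration mechanism are hypothetical, not real mljar-supervised internals; only the XGBoost settings and the `algorithms` argument of `AutoML` are real:

```python
# Hypothetical sketch of the proposed clone: take an existing CPU
# parameter set for Xgboost and derive a GPU variant under a new name.
xgboost_cpu_params = {"objective": "binary:logistic", "eta": 0.1}  # assumed CPU defaults

xgboost_gpu_params = {
    **xgboost_cpu_params,
    "tree_method": "gpu_hist",      # real XGBoost setting: GPU histogram training
    "predictor": "gpu_predictor",   # real XGBoost setting: GPU prediction
}

# Once registered, the GPU variant would be selectable by name:
# AutoML(algorithms=["Xgboost_GPU"])  # "Xgboost_GPU" is a hypothetical name
```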

Data-drone commented 3 years ago

Yeah, there is https://github.com/rapidsai/cuml for some of the other algorithms as well.
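
For context, cuML exposes scikit-learn-style estimators that run on the GPU, so swapping an algorithm in can look like a one-line import change. A sketch, assuming a RAPIDS/cuML installation and an NVIDIA GPU:

```python
# Sketch: cuML mirrors the scikit-learn API but trains on the GPU.
import numpy as np
from cuml.ensemble import RandomForestClassifier  # GPU counterpart of sklearn's

X = np.random.rand(1000, 20).astype(np.float32)  # cuML prefers float32
y = np.random.randint(0, 2, 1000).astype(np.int32)

clf = RandomForestClassifier(n_estimators=100)
clf.fit(X, y)
preds = clf.predict(X)
```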

huanvo88 commented 3 years ago

I have problems with TensorFlow on my server as well (it seems to be related to the GPU + CUDA version). However, the PyTorch installation is a breeze. Maybe we can use PyTorch instead of TensorFlow? Plus, the transformers from the Hugging Face library are written in PyTorch anyway.
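
For illustration, a minimal device-agnostic tabular MLP in PyTorch. This is a sketch only, not how mljar-supervised implements its NN; the architecture and hyperparameters are arbitrary:

```python
# Sketch: a tabular MLP in PyTorch that uses the GPU when available.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(
    nn.Linear(20, 64),  # 20 input features, arbitrary hidden size
    nn.ReLU(),
    nn.Linear(64, 2),   # binary classification head
).to(device)

X = torch.rand(1000, 20, device=device)
y = torch.randint(0, 2, (1000,), device=device)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for _ in range(10):  # a few training steps for illustration
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
```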

Karlheinzniebuhr commented 1 year ago

@pplonski any updates on this? CatBoost supports the GPU nicely out of the box, so I think at least some of the training process could be greatly sped up in MLJAR through the GPU.
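
For reference, the out-of-the-box GPU switch in CatBoost is a single constructor argument. A sketch, assuming an NVIDIA GPU with CUDA drivers installed:

```python
# Sketch: CatBoost selects GPU training with one parameter.
import numpy as np
from catboost import CatBoostClassifier

X = np.random.rand(1000, 20)
y = np.random.randint(0, 2, 1000)

# task_type="GPU" enables GPU training; devices="0" picks the first card.
model = CatBoostClassifier(task_type="GPU", devices="0", verbose=False)
model.fit(X, y)
```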

yanivc-jfrog commented 1 week ago

Any updates?