Closed: @salomonMuriel closed this issue 3 years ago.
Thank you @salomonMuriel for reporting. Have you used Stacking? I will dig into it.
I think that if you add the attribute `self._loaded = False` in the `BaseAutoML` class, and change the following lines in the `_base_predict()` method:

```python
# old
self._check_is_fitted()

# new
if not self._loaded:
    self._check_is_fitted()
    self._loaded = True
```
the prediction time decreases, because the models are only loaded on the first call. I called `.predict()` twice on the same X, and the results are the following:
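The guard described above can be sketched with a minimal toy class; everything here except the `_loaded` attribute and the `_check_is_fitted()` / `_base_predict()` method names is illustrative instrumentation, not mljar-supervised's actual internals:

```python
# Minimal sketch of the proposed first-call-only guard, assuming
# _check_is_fitted() is the expensive step that loads models from disk.
class BaseAutoMLSketch:
    def __init__(self):
        self._loaded = False   # proposed new attribute
        self.load_count = 0    # instrumentation for this sketch only

    def _check_is_fitted(self):
        # stands in for loading all fitted models from results_path
        self.load_count += 1

    def _base_predict(self, X):
        if not self._loaded:          # load only on the first call
            self._check_is_fitted()
            self._loaded = True
        return [0 for _ in X]         # dummy prediction

automl = BaseAutoMLSketch()
automl._base_predict([1, 2, 3])
automl._base_predict([4, 5, 6])
print(automl.load_count)  # → 1, models were loaded only once
```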
If you want, I can put up a PR with the fix.
Thank you, @RafaD5 good catch! Please give me some time to check if that's the only issue and I will let you know if ready to PR.
Some more questions:
I found another fix that can help decrease the first prediction time. I changed the `load()` method of the `BaseAutoML` class so that it only loads the models necessary to predict, not all the trained models. I trained with `total_time_limit=60*60*16`. The train dataset shape is (88643, 23).

```python
@decorators.timer
def predict(model, X):
    return model.predict(X)

automl = AutoML(
    results_path=str(model_dir)
)

print('Cold prediction')
y = predict(automl, X_test.head(1))
print('')

print('Number of samples:', 1)
y = predict(automl, X_test.head(1))

for i in range(5, 200, 20):
    print('Number of samples:', i)
    y = predict(automl, X_test.head(i))
```
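The `decorators.timer` used in the snippet above is a project-local helper that isn't shown in the thread; a minimal equivalent could look like this (only the name `timer` is taken from the snippet, the implementation is an assumption):

```python
import functools
import time

def timer(func):
    """Print how long each call to the wrapped function takes."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.3f} s")
        return result
    return wrapper

@timer
def predict(model, X):
    return model.predict(X)
```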
Thank you @RafaD5, I've found one more thing that slows down the prediction time: I'm always loading out-of-fold predictions when loading models. They are needed when interrupted training is resumed (oofs are required for ensemble and stacking). There will be 3 steps to speed up predictions:
I'm working on this right now, so the fix should be today/tomorrow.
@RafaD5 @salomonMuriel I've pushed fixes to the `dev` branch. To install the package with the newest changes, please run:

```shell
pip install -U git+https://github.com/mljar/mljar-supervised.git@dev
```
I checked the solution by running a few examples, and it was working fine.
What I did:

- `predict()`
- `predict()`
- For other models, only parameters are loaded (from *.json files).

The changes are not backward compatible, so there is a need to retrain models from scratch.
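The "only parameters are loaded" idea can be sketched as a thin wrapper that reads the *.json parameters eagerly and defers deserializing the heavy model artifact until the first prediction. All names here (`LazyModel`, `loader`, `loaded`) are illustrative, not the actual mljar-supervised internals:

```python
import json
from pathlib import Path

class LazyModel:
    """Illustrative lazy wrapper: parameters are read eagerly from a
    *.json file; the heavy model artifact is loaded only on the first
    predict() call."""

    def __init__(self, params_path, loader):
        # loader is a callable that performs the expensive deserialization
        self.params = json.loads(Path(params_path).read_text())
        self._loader = loader
        self._model = None

    @property
    def loaded(self):
        return self._model is not None

    def predict(self, X):
        if self._model is None:   # first call pays the loading cost
            self._model = self._loader()
        return self._model.predict(X)
```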
That's great @pplonski! Much appreciated. Our work with @RafaD5, and I believe that of many users of the package in production environments, is very much affected by response times, so every msec counts.
Another possible change would be to pre-load the models needed for prediction at class init when a pre-trained set of models is provided (i.e., when a folder path is given), so that they would already be loaded when `.predict()` is called. Maybe it could be added as an optional boolean parameter?
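From the caller's side, the suggested opt-in could look something like the sketch below. The class, the parameter name `preload_models`, and all internals are hypothetical; nothing in the thread fixes the real name or signature:

```python
# Toy stand-in for an AutoML-like class, illustrating an optional
# boolean flag that moves the model-loading cost from predict() to init.
class AutoMLSketch:
    def __init__(self, results_path, preload_models=False):
        self.results_path = results_path
        self._models = None
        if preload_models:        # pay the loading cost at init time
            self._load_models()

    def _load_models(self):
        # stands in for deserializing model files from results_path
        self._models = ["model from " + self.results_path]

    def predict(self, X):
        if self._models is None:  # otherwise fall back to lazy loading
            self._load_models()
        return [0 for _ in X]

eager = AutoMLSketch("results/", preload_models=True)  # loaded at init
lazy = AutoMLSketch("results/")                        # loaded on first predict
```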
@salomonMuriel I agree, the parameter to load models can be a way to go, I need to think about how this can be effectively implemented. I will let you know at the beginning of the next week.
@salomonMuriel I thought about this and it will be implemented in the following way:

All code updates are in the `dev` branch. The changes:

- Instead of the `best_model.txt` file, all information about the best model is stored in the `params.json` file.
- The `params.json` file has a `load_on_predict` key.
- Models are loaded during AutoML initialization. Only models needed to run the best model are loaded. If the training is not finished, then all models are loaded, but with lazy loading.

The changes will go to the 0.9.0 release.
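Based on the description above, selecting the models to load at init could look roughly like this. Only the file name `params.json` and the key `load_on_predict` come from the thread; the assumption that the key holds a list of model names is mine:

```python
import json
from pathlib import Path

def models_to_load(results_path):
    """Return the entries stored under the load_on_predict key of
    params.json, falling back to an empty list if the key is absent."""
    params = json.loads((Path(results_path) / "params.json").read_text())
    return params.get("load_on_predict", [])
```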
Amazing @pplonski !! Thanks so much, this is extremely helpful.
Hi! I've banged my head against the wall for a couple of days and can't solve this. Prediction times are much longer than expected: running an AutoML model for a regression task takes upwards of 3 seconds for a single prediction.

I believe this is because the `predict()` method of the `AutoML` class loads every saved model each time you ask for a prediction. It would be much more optimal to load all models at class init, and call their prediction methods without having to reload them every time.