dvirginz opened this issue 2 years ago
I didn't think about it. Why do you have several runs? Why can't you do it in one run? Do you have different features in each run? How many runs do you have? Sorry for so many questions, but this is quite unexpected usage for me.
Thank you very much for the fast response :) There are many use cases. Before naming a few, let me point out that pycaret, h2o, tpot and others support logging frameworks, which suggests that many people face this problem. And, as a last disclaimer: after using all of the above, I find the mljar pipeline easy to work with and straightforward, and I like it :)
During research (and also in production environments with real data), we find ourselves making many manipulations and tweaks to the data in an ongoing process. It may be that today I think of a good new feature, and tomorrow I decide to discard it.
Having a "single source of truth" where I can see all models and runs that were optimized on a specific task (I.e regress_future_salary) is needed in all my past use-cases.
Hope that makes sense. Thanks!
@pplonski would you be willing to accept such a contribution?
@adrienpacifico yes, which framework would you like to support?
MLflow, mostly.
@adrienpacifico do you think you would be able to provide a general implementation that can be further extended to other services?
I do not know. I will probably have some time in the coming months, and I might try to tackle some open-source issues, but I am not sure how capable I am of doing it.
@adrienpacifico ok, got it. You could start with a very simple, minimal feature that allows logging the model's final score, or its score during training.
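Even before any built-in integration, something like this is possible from the user side. A minimal sketch, assuming scikit-learn's diabetes dataset as stand-in data and illustrative parameter values; the experiment name `regress_future_salary` is just the example from this thread:

```python
# Sketch: wrap an mljar-supervised AutoML run in an MLflow run,
# logging the search configuration and the final held-out score.
import mlflow
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from supervised.automl import AutoML

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

mlflow.set_experiment("regress_future_salary")  # illustrative experiment name

with mlflow.start_run():
    params = {"mode": "Compete", "eval_metric": "rmse", "total_time_limit": 300}
    mlflow.log_params(params)  # record the AutoML configuration

    automl = AutoML(**params)
    automl.fit(X_train, y_train)

    # Log the final score on held-out data so runs can be compared later.
    rmse = mean_squared_error(y_test, automl.predict(X_test), squared=False)
    mlflow.log_metric("test_rmse", rmse)
```

A built-in version would presumably log intermediate scores from inside the training loop as well, but logging parameters plus the final metric already makes runs searchable in the MLflow UI.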
Hi! Do you have any plans to support logging frameworks such as MLflow? Currently, trying to find the best model across multiple different runs is almost impossible :)
The ability to easily find and filter by hyper-parameters and evaluation metrics would be great (something like the sketch below).
Thanks!
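For context, this is the kind of filtering MLflow makes possible once runs are logged. A hedged sketch, assuming a recent MLflow and that runs were logged with a `mode` parameter and a `test_rmse` metric (both names are assumptions, matching the logging sketch above):

```python
# Sketch: find the best logged runs for one task across many sessions.
import mlflow

runs = mlflow.search_runs(
    experiment_names=["regress_future_salary"],  # illustrative name
    filter_string="params.mode = 'Compete' and metrics.test_rmse < 60",
    order_by=["metrics.test_rmse ASC"],
    max_results=5,
)
# search_runs returns a pandas DataFrame; columns are prefixed
# with "params." and "metrics." for logged values.
print(runs[["run_id", "params.mode", "metrics.test_rmse"]])
```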