dvirginz opened this issue 2 years ago
I didn't think about it. Why do you have several runs? Why can't you do it in one run? Do you have different features in each run? How many runs do you have? Sorry for so many questions, but this is quite unexpected usage for me.
Thank you very much for the fast response :) There are many use cases. Before naming a few, let me point out that pycaret, h2o, tpot and others support logging frameworks, which suggests that many people face this problem. And, as a last disclaimer: after using all of the above, I find the mljar pipeline easy to work with and straightforward, and I like it :)
During research (and also in production environments with real data), we find ourselves making many manipulations and tweaks to the data in an ongoing process. It may be that today I think of a good new feature, and tomorrow I decide to discard it.
Having a "single source of truth" where I can see all models and runs that were optimized on a specific task (I.e regress_future_salary) is needed in all my past use-cases.
Hope that makes sense. Thanks!
@pplonski would you be willing to accept such a contribution?
@adrienpacifico yes, which framework would you like to support?
MLflow, mostly.
@adrienpacifico do you think you would be able to provide a general implementation that can be further extended to other services?
I do not know. I will probably have some time in the coming months, and I might try to tackle some open-source issues, but I am not sure how capable I am of doing it.
@adrienpacifico ok, got it. You could start with a very simple, minimal feature that allows logging the model's final score, or its score during training.
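Even before any built-in integration, something like this is possible from the user side. A minimal sketch, assuming scikit-learn's diabetes dataset as stand-in data and illustrative parameter values; the experiment name `regress_future_salary` is just the example from this thread:

```python
# Sketch: wrap an mljar-supervised AutoML run in an MLflow run,
# logging the search configuration and the final held-out score.
import mlflow
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from supervised.automl import AutoML

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

mlflow.set_experiment("regress_future_salary")  # illustrative experiment name

with mlflow.start_run():
    params = {"mode": "Compete", "eval_metric": "rmse", "total_time_limit": 300}
    mlflow.log_params(params)  # record the AutoML configuration

    automl = AutoML(**params)
    automl.fit(X_train, y_train)

    # Log the final score on held-out data so runs can be compared later.
    rmse = mean_squared_error(y_test, automl.predict(X_test), squared=False)
    mlflow.log_metric("test_rmse", rmse)
```

A built-in version would presumably log intermediate scores from inside the training loop as well, but logging parameters plus the final metric already makes runs searchable in the MLflow UI.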
Hi! Do you have any plans to support logging frameworks such as MLflow? Currently, trying to find the best model across multiple different runs is almost impossible :)
The ability to easily find and filter by hyper-parameters and evaluation metrics would be great (something like the sketch below).
Thanks!
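For context, this is the kind of filtering MLflow makes possible once runs are logged. A hedged sketch, assuming a recent MLflow and that runs were logged with a `mode` parameter and a `test_rmse` metric (both names are assumptions, matching the logging sketch above):

```python
# Sketch: find the best logged runs for one task across many sessions.
import mlflow

runs = mlflow.search_runs(
    experiment_names=["regress_future_salary"],  # illustrative name
    filter_string="params.mode = 'Compete' and metrics.test_rmse < 60",
    order_by=["metrics.test_rmse ASC"],
    max_results=5,
)
# search_runs returns a pandas DataFrame; columns are prefixed
# with "params." and "metrics." for logged values.
print(runs[["run_id", "params.mode", "metrics.test_rmse"]])
```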