openml-labs / gama

An automated machine learning tool aimed to facilitate AutoML research.
https://openml-labs.github.io/gama/master/
Apache License 2.0
93 stars, 30 forks

OpenML integration #32

Open PGijsbers opened 5 years ago

PGijsbers commented 5 years ago

Automatically run GAMA on OpenML tasks by adding an optional dependency on the OpenML API. Specifics need to be decided on, e.g.:

simonprovost commented 10 months ago

As a side note to your feature idea: I have no doubt you know Mihaela of the Van Der Schaar Lab. She recently shared a post on LinkedIn about a new LLM developed in their lab. This LLM is designed for AutoML projects in healthcare, allowing users (in their case, health practitioners) to apply AutoML to their data without writing a main.py script. The model sets up all the necessary parameters (data, metric, etc.) based on the user's request, and much more!

Given the extensive resources available at OpenML, including a variety of datasets and metrics, a similar approach seems worth considering. Using an LLM together with GAMA as the primary AutoML engine could be a significant advancement. The crux would be whether such a system can reliably generate a main.py script tailored to a user's specific needs. That would mean combining user-supplied data, which could come from OpenML's own datasets or from elsewhere, with the preferred metric and other critical parameters. Integrating these elements with the capabilities of the LLM and GAMA may not only streamline the process but also make projects considerably easier to manage for any user of OpenML resources. I believe this concept holds great promise for improving the utility and efficiency of OpenML-related tools!

Here's the link to Mihaela's post for more information: https://www.linkedin.com/posts/mihaela-van-der-schaar_healthcareinnovation-aiinhealthcare-largelanguagemodels-activity-7136291751480160256-L11p/

This might not be directly relevant, given that this issue dates from 2019, but I thought I would share it to show that this has already been done elsewhere and could be useful for your work at OpenML. Hope this is helpful!

Cheers,

PGijsbers commented 10 months ago

One of the main benefits of the issue as I intended it here was to also be able to upload information about the internal optimization GAMA performs to OpenML. As such, this discussion is largely unrelated to the topic.

That said, for the kind of system you proposed I would prefer a separate package, I think. It would be much easier to manage and doesn't necessarily need a tight integration with GAMA's code-base (it just needs to understand the public interface).
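
To illustrate the "separate package" idea, here is a minimal sketch of a script generator that only relies on GAMA's public interface. It is an assumption-laden example, not a proposed design: `RunRequest`, `render_main_script`, and the template itself are hypothetical names made up for this sketch, and the LLM part is left out entirely (the request is filled in by hand). The generated script uses `openml.datasets.get_dataset`, `GamaClassifier(max_total_time=..., scoring=...)`, `fit`, and `predict`.

```python
from dataclasses import dataclass

# Hypothetical template for the generated main.py. It only touches the public
# interfaces of openml-python, GAMA, and scikit-learn.
SCRIPT_TEMPLATE = '''\
import openml
from gama import GamaClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

dataset = openml.datasets.get_dataset({dataset_id})
X, y, _, _ = dataset.get_data(target=dataset.default_target_attribute)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

automl = GamaClassifier(max_total_time={max_total_time}, scoring={scoring!r})
automl.fit(X_train, y_train)
print("hold-out accuracy:", accuracy_score(y_test, automl.predict(X_test)))
'''


@dataclass(frozen=True)
class RunRequest:
    """What a user (or an LLM front end) would have to supply. Hypothetical."""

    dataset_id: int               # OpenML dataset id
    scoring: str = "accuracy"     # metric name passed to GamaClassifier
    max_total_time: int = 180     # search budget in seconds


def render_main_script(request: RunRequest) -> str:
    """Return the source of a main.py for the given request."""
    if request.dataset_id <= 0:
        raise ValueError("dataset_id must be a positive integer")
    if request.max_total_time <= 0:
        raise ValueError("max_total_time must be positive")
    # Only allow identifier-like metric names (e.g. "roc_auc") so that
    # arbitrary text cannot end up in the generated code.
    if not request.scoring.isidentifier():
        raise ValueError("scoring must be a plain metric name")
    return SCRIPT_TEMPLATE.format(
        dataset_id=request.dataset_id,
        max_total_time=request.max_total_time,
        scoring=request.scoring,
    )


if __name__ == "__main__":
    # Print the script instead of running it: executing it would need
    # openml and gama installed plus network access.
    print(render_main_script(RunRequest(dataset_id=31, scoring="roc_auc")))
```

Because the generator only emits text, it can live in its own package and be tested without importing GAMA at all, which is the loose coupling described above.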