intel / dffml

The easiest way to use Machine Learning. Mix and match underlying ML libraries and data set sources. Generate new datasets or modify existing ones with ease.
https://intel.github.io/dffml/main/
MIT License

model: automl: Implement AutoML #968

Open pdxjohnny opened 3 years ago

pdxjohnny commented 3 years ago

Project Description

AutoML, or Automated Machine Learning, automates the process of solving problems with Machine Learning, as the name suggests. It is generally helpful for people who are not familiar with either Machine Learning or the programming involved. AutoML aims to improve the efficiency of any task involving Machine Learning.

The primary objective is to create a model whose config takes, as one property, a set of models to be used for hyperparameter tuning. Another property of its config is the set of models which we should attempt to tune (via the first set). The default values for these properties result in using all installed models to try to tune all installed model plugins.
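
The config shape described above can be sketched in plain Python. This is only an illustration under stated assumptions, not DFFML's API: `AutoMLConfig`, `TUNERS`, `CANDIDATES`, `automl`, and the toy models are all hypothetical names, and the two dictionaries stand in for "all installed" plugins.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List


def mse(predict, data):
    """Mean squared error of a predict(x) function over (x, y) pairs."""
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)


def fit_ridge_1d(data, alpha):
    """Toy model: single-feature ridge regression without intercept."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    weight = sxy / (sxx + alpha)
    return lambda x: weight * x


def fit_mean(data, shrink):
    """Toy baseline: predicts a shrunken mean of the training targets."""
    mean = sum(y for _, y in data) / len(data)
    return lambda x: (1 - shrink) * mean


def grid_search(fit, grid, train, valid):
    """Try every hyperparameter combination, keep the lowest validation MSE."""
    names = sorted(grid)
    best = None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = mse(fit(train, **params), valid)
        if best is None or score < best[1]:
            best = (params, score)
    return best


# Hypothetical registries standing in for installed plugins.
TUNERS = {"grid_search": grid_search}
CANDIDATES = {
    "ridge_1d": (fit_ridge_1d, {"alpha": [0.0, 1.0, 10.0]}),
    "mean": (fit_mean, {"shrink": [0.0, 0.5]}),
}


@dataclass
class AutoMLConfig:
    # Models used to do the tuning; defaults to everything registered.
    tuners: List[str] = field(default_factory=lambda: sorted(TUNERS))
    # Models to be tuned; defaults to everything registered.
    candidates: List[str] = field(default_factory=lambda: sorted(CANDIDATES))


def automl(config, train, valid):
    """Tune every candidate with every tuner and return the best result."""
    best = None
    for tuner_name, model_name in product(config.tuners, config.candidates):
        fit, grid = CANDIDATES[model_name]
        params, score = TUNERS[tuner_name](fit, grid, train, valid)
        if best is None or score < best["score"]:
            best = {"tuner": tuner_name, "model": model_name,
                    "params": params, "score": score}
    return best


train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
valid = [(4.0, 8.0), (5.0, 10.0)]
result = automl(AutoMLConfig(), train, valid)
print(result)
```

Leaving both config properties at their defaults tries every registered tuner against every registered candidate. Passing `AutoMLConfig(candidates=["ridge_1d"])` would restrict the search to one model.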

Due to the shortened GSoC cycle, we may not complete all of these phases. Which phase we get to will be decided as we approach the selection process.

Skills

Difficulty

Intermediate to Difficult

Related Readings

Getting Started

Potential Mentors

TejasKarkera10 commented 3 years ago

Hey, I want to contribute to this. Can you suggest a starting point and how I should proceed?

rajpratyush commented 3 years ago

@pdxjohnny I have experience with Python and some working experience with AutoML as well (I did a guided project on Coursera, AutoML for computer vision). Furthermore, I am currently a Kaggle contributor. Can you provide some insights on how to begin working on this issue?

pdxjohnny commented 3 years ago

@TejasKarkera10 @rajpratyush Updated description

rajpratyush commented 3 years ago

@pdxjohnny We can begin this phase with basic linear regression, followed by ridge and lasso regression, and then move on to other types of algorithms. I would generally go with supervised learning algorithms first.
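
As a rough illustration of that progression (a sketch only, not tied to any DFFML model plugin), the three regressions can be compared on a single feature with no intercept, where each has a closed form. The function names and the objective scalings noted in the comments are assumptions made for this example.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, clamping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_weights(data, alpha):
    """Closed-form weights for y ~ w * x on (x, y) pairs.

    Assumed objectives:
      linear: sum((y - w*x)**2)
      ridge:  sum((y - w*x)**2) + alpha * w**2
      lasso:  0.5 * sum((y - w*x)**2) + alpha * abs(w)
    """
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return {
        "linear": sxy / sxx,
        "ridge": sxy / (sxx + alpha),
        "lasso": soft_threshold(sxy, alpha) / sxx,
    }


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
weights = fit_weights(data, alpha=7.0)
print(weights)
```

Both penalties pull the weight below the unpenalized value of 2.0; the lasso penalty can drive it to exactly zero once `alpha` is at least as large as `sum(x * y)`.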

pdxjohnny commented 3 years ago

@rajpratyush How you choose to begin is up to you :)

Experimenting with approaches here would be wise, as it would produce material to inform one's proposal.

This issue will remain open for everyone to attempt until the GSoC proposal milestone.

rajpratyush commented 3 years ago

@pdxjohnny I would like to take it up as a project under GSoC 2021.

pdxjohnny commented 3 years ago

@rajpratyush Anyone may submit a proposal asking to work on this project during GSoC 2021.

See

rajpratyush commented 3 years ago

OK sir @pdxjohnny. By the way, I had asked about some issues to work on. Can you assign them?

pdxjohnny commented 3 years ago

@rajpratyush Unfortunately, because of the way GitHub org permissions work, I cannot use the issue assignment functionality to assign issues to users who are not members of the org a repo is in or who have not been added as collaborators. I also do not have permission to add users as collaborators. But you've read the contributing guidelines since your last comment, so you know how we communicate to work around those limitations.

rajpratyush commented 3 years ago

@pdxjohnny Yes sir, I have pushed a few commits and need guidance on 3 more issues so that I know where to begin.

pdxjohnny commented 2 years ago

It would be good to look at https://github.com/AutoViML/featurewiz when exploring this project. See also https://towardsdatascience.com/automate-your-feature-selection-workflow-in-one-line-of-python-code-3d4f23b7e2c4