kubeflow / katib

Automated Machine Learning on Kubernetes
https://www.kubeflow.org/docs/components/katib
Apache License 2.0

[GSoC] Project 4: Hyperparameter Optimization API in Katib for LLMs #2339

Open helenxie-bit opened 5 months ago

helenxie-bit commented 5 months ago

Motivation

The rapid advancements and growing popularity of Large Language Models (LLMs) have driven an increased need for effective LLMOps in Kubernetes environments. To address this, we developed a train API within the Training Python SDK, simplifying the process of fine-tuning LLMs using distributed PyTorchJob workers. However, hyperparameter optimization remains a crucial yet labor-intensive task for enhancing model performance.

Goal

This project aims to develop a high-level API for tuning hyperparameters of LLMs that automates the process of hyperparameter optimization in Kubernetes.

By leveraging the capabilities of Katib and the Training Operator, this API lets users either define a custom objective function or import pretrained models and datasets from external platforms such as HuggingFace and Amazon S3. Users also specify the objective metrics, optimization algorithm, optimization goal, resource configuration, and so on. The API then automates the creation and execution of the Experiment and its Trials to find the best hyperparameters. Abstracting away the complexity of the Kubernetes infrastructure in this way will enable data scientists to optimize hyperparameters efficiently and effectively.
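To make the two input styles concrete, here is a minimal sketch in plain Python. It is not the Katib SDK: `TuneSpec`, `run_local_search`, and every field name are hypothetical, chosen only to mirror the inputs listed above. Trials run in-process with a simple random search instead of being scheduled as Kubernetes Trials, and the external-model path is only validated, not executed.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

Params = Dict[str, float]


@dataclass
class TuneSpec:
    """Hypothetical bundle of the user inputs described above (not the Katib SDK)."""

    name: str
    search_space: Dict[str, Tuple[float, float]]  # hyperparameter -> (min, max)
    objective: Optional[Callable[[Params], float]] = None  # option 1: custom function
    model_uri: Optional[str] = None    # option 2: external pretrained model
    dataset_uri: Optional[str] = None  # option 2: external dataset
    objective_metric_name: str = "loss"
    objective_type: str = "minimize"   # optimization goal: "minimize" or "maximize"
    algorithm_name: str = "random"
    max_trial_count: int = 10
    resources_per_trial: Dict[str, str] = field(default_factory=dict)

    def mode(self) -> str:
        """Return the input style in use; reject ambiguous or incomplete specs."""
        uses_objective = self.objective is not None
        uses_external = self.model_uri is not None or self.dataset_uri is not None
        if uses_objective == uses_external:
            raise ValueError("choose exactly one: objective, or model_uri plus dataset_uri")
        if uses_external and None in (self.model_uri, self.dataset_uri):
            raise ValueError("external mode needs both model_uri and dataset_uri")
        return "custom-objective" if uses_objective else "external-model"


def run_local_search(spec: TuneSpec, seed: int = 0) -> Tuple[Params, float]:
    """Stand-in for Experiment/Trial execution: in-process random search."""
    if spec.mode() != "custom-objective":
        raise NotImplementedError("this sketch only runs custom objective functions")
    if spec.algorithm_name != "random":
        raise ValueError("this sketch only implements random search")
    if spec.objective_type not in ("minimize", "maximize"):
        raise ValueError("objective_type must be 'minimize' or 'maximize'")

    rng = random.Random(seed)
    best_params: Optional[Params] = None
    best_value: Optional[float] = None
    for _ in range(spec.max_trial_count):
        # One "trial": sample each hyperparameter uniformly from its range.
        params = {k: rng.uniform(lo, hi) for k, (lo, hi) in spec.search_space.items()}
        value = spec.objective(params)
        if best_value is None:
            improved = True
        elif spec.objective_type == "minimize":
            improved = value < best_value
        else:
            improved = value > best_value
        if improved:
            best_params, best_value = params, value
    return best_params, best_value


def toy_objective(params: Params) -> float:
    # Toy metric whose minimum sits at learning_rate = 0.01.
    return (params["learning_rate"] - 0.01) ** 2


spec = TuneSpec(
    name="demo",
    search_space={"learning_rate": (0.001, 0.1)},
    objective=toy_objective,
    max_trial_count=20,
)
best_params, best_value = run_local_search(spec)
print(best_params, best_value)
```

In the sketch, a spec must carry either an objective function or a model and dataset pair, never both and never neither. That mutual exclusion is an assumption made for illustration, not a statement about how the API itself validates its inputs.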

[Image: design_tune_api_20240906]

What I Did in GSoC Project & Ongoing Works

  1. Prepare

  2. Development

  3. Wrapping up code and documentation

    • [ ] Create documentation for the API, including usage instructions, code examples, etc.
  4. Other PRs

What I Learned from This Project

This is my first experience with open source, and as a beginner with Docker and Kubernetes, I gained significant knowledge throughout this project. Beyond understanding containers, Kubernetes, API development, and CI/CD pipelines, I’ve learned valuable lessons that will benefit my future studies and work:

Think from the User's Perspective: One key lesson was the importance of considering the user’s needs. Discussing API design with my mentors taught me to focus on what functionalities users need and how they prefer to use them. Listening to users’ feedback is crucial for effective product design.

Don't Fear Bugs: I used to be flustered by bugs and unsure how to address them. My mentor guided me through the debugging process, showing me how to understand and trace bugs. The key is to approach debugging methodically and think through the problem.

Communication is Important: Communication is essential to collaboration, especially in open-source projects, which offer several channels for it: GitHub issues and PRs, Slack, and community meetings. I’m grateful to my mentor for discussing my challenges during our weekly meetings and providing invaluable guidance.

Every Contribution Counts: Initially, I thought contributing to open source was complex. I learned that every contribution, no matter how small, is valuable and appreciated. For example, contributing to documentation is crucial, especially for newcomers.

In The End

Thank you to Google for this invaluable opportunity. I’m deeply grateful to everyone who supported me throughout this project: @andreyvelich @johnugeorge @deepanker13 @tenzen-y @nsingl00 @Electronic-Waste. Your suggestions, advice, and help were essential to completing my work.

I also want to say a huge thank you to my mentor, @andreyvelich. I'm impressed by your deep knowledge of the project and the industry, and by your willingness to help. Your encouragement during our first meeting, when you shared that you also found Kubernetes challenging at first, gave me great confidence. I appreciate the time and effort you invested in guiding me through this project, from the overall design of the API to the details of code formatting. I’ve learned a lot from your guidance.

I believe that anyone contributing to open source in their spare time has a passion for coding, and I’m glad to have worked with such a dedicated group. I will continue contributing and hope to support other beginners in the future.

andreyvelich commented 5 months ago

/area gsoc

helenxie-bit commented 5 months ago

/assign