neulab / prompt2model

prompt2model - Generate Deployable Models from Natural Language Instructions
Apache License 2.0

Optuna integration for automated hyperparameter search #315

Closed: Anindyadeep closed this 11 months ago

Anindyadeep commented 1 year ago

Description

This PR adds a new feature: automated hyperparameter search using Optuna. It also introduces a new spec that supports hyperparameter search in three ways.

This PR solves issue #313

This is how the train_model() call changes on the client side:

from prompt2model.model_trainer import GenerationModelTrainer
from pathlib import Path

# pre_train_model_name, train_datasets, and val_datasets are assumed to be
# defined earlier in the client script.
trainer = GenerationModelTrainer(
    pre_train_model_name,
    has_encoder=True,
    executor_batch_size=8,
    tokenizer_max_length=1024,
    sequence_max_length=1280,
    device="CPU"
)

args_output_root = Path("result/training_output")
args_output_root.mkdir(parents=True, exist_ok=True)

# hyperparameter_search_mode="optuna" opts in to the new Optuna-backed search.
trained_model, trained_tokenizer = trainer.train_model(
    training_datasets=train_datasets,
    validation_datasets=val_datasets,
    hyperparameter_search_mode="optuna"
)

Some additional changes, such as supporting default hyperparameters as an option, are also included; however, that still needs to be discussed.
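
For context, this is roughly what a standalone Optuna search loop looks like in general. It is only an illustrative sketch: the hyperparameter names and the dummy objective below are assumptions, and this PR wires Optuna into the trainer rather than calling it standalone like this.

import optuna


def objective(trial: optuna.Trial) -> float:
    # Illustrative search space; in a real objective these values would be
    # used to train the model, and the validation loss would be returned.
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    batch_size = trial.suggest_categorical("batch_size", [4, 8, 16])
    num_epochs = trial.suggest_int("num_epochs", 1, 5)
    return learning_rate * batch_size / num_epochs  # dummy score


study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=10)
print(study.best_params)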

neubig commented 1 year ago

Wow, thanks for the contribution @Anindyadeep!

First, a few initial comments:

  1. You listed three different things that could be done for hyperparameter search. I would definitely suggest that we split those into three separate PRs (for ease of reviewing), so we can just review the Optuna hyperparameter search in this PR.
  2. Just to clarify, would you like us to start reviewing this now, or is it still WIP?
  3. It seems that this is not passing formatting checks. I would suggest that you run pre-commit checks, as detailed here.
neubig commented 1 year ago

Sounds great, I'll take a look when I have a chance.

viswavi commented 1 year ago

Hi @Anindyadeep, I made a quick pass through this and generally it looks very good. Thank you for the quick work! I've left two minor comments in the PR. After addressing those, can you clean up the code a little bit?

From the repo root directory, run pre-commit run --all-files, and also run pytest to make sure this change has not broken any other tests.

After doing this, I will make an in-depth review of the PR.

viswavi commented 1 year ago

Also, it looks like there may be merge conflicts with neulab:main

viswavi commented 1 year ago

Comment I made to Anindyadeep over DM (copying here for visibility):

""" I think that this pattern makes sense, but the way I was originally thinking of this was a little different; have a ParamSelector class that wraps the model trainer (rather than being embedded in the model trainer) so you would pass the model trainer into the ParamSelector class, and then the parameter selector will run this trainer on a bunch of different configurations before ultimately returning a single trained model

I feel that this provides a little more modularity, but I'm open to changing my mind if you can convince me that the other pattern is better 🙂 """
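
A minimal sketch of that wrapper pattern, under stated assumptions: the wrapped trainer exposes a train_model method that accepts a hyperparameters dict and a separate evaluate hook. Neither of these is confirmed to match the repository's actual API; they are placeholders to show the shape of the design.

from typing import Any, Dict, List


class ParamSelector:
    """Wraps a trainer and keeps the best configuration by validation score."""

    def __init__(self, trainer: Any, candidate_configs: List[Dict]):
        self.trainer = trainer
        self.candidate_configs = candidate_configs

    def select_and_train(self, training_datasets, validation_datasets):
        best_score, best_model, best_tokenizer = float("inf"), None, None
        for config in self.candidate_configs:
            # Hypothetical signature: train_model takes a hyperparameters dict.
            model, tokenizer = self.trainer.train_model(
                training_datasets=training_datasets,
                validation_datasets=validation_datasets,
                hyperparameters=config,
            )
            # Hypothetical evaluation hook returning a loss to minimize.
            score = self.trainer.evaluate(model, validation_datasets)
            if score < best_score:
                best_score, best_model, best_tokenizer = score, model, tokenizer
        return best_model, best_tokenizer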

Anindyadeep commented 1 year ago

Yeah, @viswavi, that makes sense. I will look into that and push an update with that change and the pre-commit checks all passing.

Anindyadeep commented 1 year ago

Hey @viswavi, I added some more changes from our previous discussion. The pre-commit checks are currently passing; however, because of some of the changes, tests are breaking. We might need to discuss this, and then I can iterate on the commits.

zhaochenyang20 commented 1 year ago

@Anindyadeep Thanks so much for your contribution. I am wondering how you control the training device: I searched through the whole Trainer, but it seems that self.device is never used? 🤔

I hope that I am wrong; if we can assign the training device, then #317 is fixed.
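
For illustration only, wiring a device attribute through a trainer would look roughly like the minimal sketch below (hypothetical toy code, not the repository's Trainer):

import torch
import torch.nn as nn


class TinyTrainer:
    """Toy trainer showing where a device argument needs to be applied."""

    def __init__(self, model: nn.Module, device: str = "cpu"):
        self.device = torch.device(device)
        # Moving the model here is what makes self.device actually take effect.
        self.model = model.to(self.device)

    def train_step(self, inputs: torch.Tensor, targets: torch.Tensor) -> float:
        # Batches must be moved to the same device as the model.
        inputs, targets = inputs.to(self.device), targets.to(self.device)
        loss = nn.functional.mse_loss(self.model(inputs), targets)
        loss.backward()
        return loss.item()


trainer = TinyTrainer(nn.Linear(4, 1), device="cuda" if torch.cuda.is_available() else "cpu")
print(trainer.train_step(torch.randn(2, 4), torch.randn(2, 1)))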

neubig commented 1 year ago

Hi @viswavi and @Anindyadeep, thanks a lot for working on this! I was wondering if we were still working on this?

Anindyadeep commented 1 year ago

Hi @viswavi and @Anindyadeep, thanks a lot for working on this! I was wondering if we were still working on this?

Yes @neubig, I am working on this right now. However, I am blocked on some cases and have paused the work, but I am going to roll out the first iteration soon.

neubig commented 1 year ago

OK, great! Please tell us if there's anything we can do to help.

Anindyadeep commented 1 year ago

OK, great! Please tell us if there's anything we can do to help.

So right now the problems are:

  1. Optuna seems to have some internal problems with GPU memory management, and hence I was not able to run the code properly.
  2. For this I needed @viswavi's assistance; however, both of us got busy with other commitments.

What could be the possible solutions:

  1. Research where the bottlenecks are and see whether they can be solved. (Status: I am blocked on this.)
  2. See whether there are alternatives we could swap in for Optuna. Here I have been exploring Ray Tune, a library by Anyscale, which seems to be quite optimized. I am still running some trials on it; if things work out, I was thinking of replacing Optuna with Ray Tune.

Update:

Here is the complete list of issues I am facing, with documentation:

  1. Ray Tune (by Anyscale) has not updated parts of its docs and code base. Here is the relevant issue: https://github.com/ray-project/ray/issues/39763

  2. Optuna: https://discuss.huggingface.co/t/unusal-pattern-of-cuda-out-of-error-when-using-hyperparameter-search-optuna-backend/54619 (see the sketch after this list for the API this refers to)

  3. SigOpt: it is a paid platform.
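
For reference, the Hugging Face Trainer hyperparameter search that the Optuna thread above concerns looks roughly like the following. This is a general sketch of the transformers API with a placeholder model and dataset, not this PR's code; backend="ray" would be the drop-in alternative if we switch to Ray Tune.

from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Tiny placeholder model and dataset, just to make the sketch self-contained.
model_name = "prajjwal1/bert-tiny"
tokenizer = AutoTokenizer.from_pretrained(model_name)
texts = ["good", "bad", "great", "awful"] * 8
labels = [1, 0, 1, 0] * 8
dataset = Dataset.from_dict(dict(tokenizer(texts, padding=True))).add_column("labels", labels)


def model_init():
    # hyperparameter_search re-instantiates the model for every trial.
    return AutoModelForSequenceClassification.from_pretrained(model_name)


def hp_space(trial):
    # The trial object comes from the Optuna backend; this space is illustrative.
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [4, 8]
        ),
    }


trainer = Trainer(
    model_init=model_init,
    args=TrainingArguments(output_dir="hp_search_output", num_train_epochs=1, report_to="none"),
    train_dataset=dataset,
    eval_dataset=dataset,
)

best_run = trainer.hyperparameter_search(
    hp_space=hp_space, backend="optuna", direction="minimize", n_trials=2
)
print(best_run.hyperparameters)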

Anindyadeep commented 1 year ago

Hi @neubig, sorry, I was busy for some time. Here is the latest push to the PR. I added a test for now; I may need to discuss with @viswavi what more tests need to be added. I also added the integration for the hyperparameter selector in the CLI, and things are working fine there. Let me know what more needs to be added.

Thanks

Anindyadeep commented 11 months ago

@viswavi I had to remove the select_from_base function from the base; otherwise it was failing the tests.

Anindyadeep commented 11 months ago

Thank you so much @viswavi and Professor @neubig for mentoring me throughout this project. I learned a lot in the process, specifically about striving for perfection and structured approaches. I am looking forward to making some more PRs on this amazing project.

Next, I would like to take on the CLI issue and try to make it better :)

Thanks once again.