neulab / prompt2model

prompt2model - Generate Deployable Models from Natural Language Instructions
Apache License 2.0

Optuna integration for automated hyperparameter search #315

Closed: Anindyadeep closed this 11 months ago

Anindyadeep commented 1 year ago

Description

This PR adds a new feature: automated hyperparameter search using Optuna. It also introduces a new spec that supports hyperparameter search in three ways.

This PR solves issue #313

This is how the train_model() call changes on the client side:

from prompt2model.model_trainer import GenerationModelTrainer
from pathlib import Path

# pre_train_model_name, train_datasets, and val_datasets are assumed to be
# defined earlier in the client script.
trainer = GenerationModelTrainer(
    pre_train_model_name,
    has_encoder=True,
    executor_batch_size=8,
    tokenizer_max_length=1024,
    sequence_max_length=1280,
    device="CPU"
)

args_output_root = Path("result/training_output")
args_output_root.mkdir(parents=True, exist_ok=True)

# hyperparameter_search_mode="optuna" opts in to the new Optuna-backed search.
trained_model, trained_tokenizer = trainer.train_model(
    training_datasets=train_datasets,
    validation_datasets=val_datasets,
    hyperparameter_search_mode="optuna"
)

Some additional changes, such as supporting default hyperparameters as an option, are also included; however, that still needs to be discussed.
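
For context, this is roughly what a standalone Optuna search loop looks like in general. It is only an illustrative sketch: the hyperparameter names and the dummy objective below are assumptions, and this PR wires Optuna into the trainer rather than calling it standalone like this.

import optuna


def objective(trial: optuna.Trial) -> float:
    # Illustrative search space; in a real objective these values would be
    # used to train the model, and the validation loss would be returned.
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    batch_size = trial.suggest_categorical("batch_size", [4, 8, 16])
    num_epochs = trial.suggest_int("num_epochs", 1, 5)
    return learning_rate * batch_size / num_epochs  # dummy score


study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=10)
print(study.best_params)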

neubig commented 1 year ago

Wow, thanks for the contribution @Anindyadeep!

First, a few initial comments:

  1. You listed three different things that could be done for hyperparameter search. I would definitely suggest that we split those into three separate PRs (for ease of reviewing), so we can just review the Optuna hyperparameter search in this PR.
  2. Just to clarify, would you like us to start reviewing this now, or is it still WIP?
  3. It seems that this is not passing formatting checks. I would suggest that you run pre-commit checks, as detailed here.
neubig commented 1 year ago

Sounds great, I'll take a look when I have a chance.

viswavi commented 1 year ago

Hi @Anindyadeep, I made a quick pass through this and generally it looks very good. Thank you for the quick work! I've left two minor comments in the PR. After addressing those, can you clean up the code a little bit?

From the repo root directory, run pre-commit run --all-files, and also run pytest to make sure this change has not broken any other tests.

After doing this, I will make an in-depth review of the PR.

viswavi commented 1 year ago

Also, it looks like there may be merge conflicts with neulab:main

viswavi commented 1 year ago

Comment I made to Anindyadeep over DM (copying here for visibility):

""" I think that this pattern makes sense, but the way I was originally thinking of this was a little different; have a ParamSelector class that wraps the model trainer (rather than being embedded in the model trainer) so you would pass the model trainer into the ParamSelector class, and then the parameter selector will run this trainer on a bunch of different configurations before ultimately returning a single trained model

I feel that this provides a little more modularity, but I'm open to changing my mind if you can convince me that the other pattern is better 🙂 """
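
A minimal sketch of that wrapper pattern, under stated assumptions: the wrapped trainer exposes a train_model method that accepts a hyperparameters dict and a separate evaluate hook. Neither of these is confirmed to match the repository's actual API; they are placeholders to show the shape of the design.

from typing import Any, Dict, List


class ParamSelector:
    """Wraps a trainer and keeps the best configuration by validation score."""

    def __init__(self, trainer: Any, candidate_configs: List[Dict]):
        self.trainer = trainer
        self.candidate_configs = candidate_configs

    def select_and_train(self, training_datasets, validation_datasets):
        best_score, best_model, best_tokenizer = float("inf"), None, None
        for config in self.candidate_configs:
            # Hypothetical signature: train_model takes a hyperparameters dict.
            model, tokenizer = self.trainer.train_model(
                training_datasets=training_datasets,
                validation_datasets=validation_datasets,
                hyperparameters=config,
            )
            # Hypothetical evaluation hook returning a loss to minimize.
            score = self.trainer.evaluate(model, validation_datasets)
            if score < best_score:
                best_score, best_model, best_tokenizer = score, model, tokenizer
        return best_model, best_tokenizer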

Anindyadeep commented 1 year ago

Yeah, @viswavi, that makes sense. I will look into that and push an update with that change and the pre-commit checks all passing.

Anindyadeep commented 1 year ago

Hey @viswavi, I added some more changes from our previous discussion. The pre-commit checks are currently passing; however, because of some of the changes, tests are breaking. We might need to discuss this, and then I can iterate on the commits.

zhaochenyang20 commented 1 year ago

@Anindyadeep Thanks so much for your contribution. I am wondering how you control the training device: I searched through the whole Trainer, but it seems that self.device is never used? 🤔

I hope that I am wrong; if we can assign the training device, then #317 is fixed.
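
For illustration only, wiring a device attribute through a trainer would look roughly like the minimal sketch below (hypothetical toy code, not the repository's Trainer):

import torch
import torch.nn as nn


class TinyTrainer:
    """Toy trainer showing where a device argument needs to be applied."""

    def __init__(self, model: nn.Module, device: str = "cpu"):
        self.device = torch.device(device)
        # Moving the model here is what makes self.device actually take effect.
        self.model = model.to(self.device)

    def train_step(self, inputs: torch.Tensor, targets: torch.Tensor) -> float:
        # Batches must be moved to the same device as the model.
        inputs, targets = inputs.to(self.device), targets.to(self.device)
        loss = nn.functional.mse_loss(self.model(inputs), targets)
        loss.backward()
        return loss.item()


trainer = TinyTrainer(nn.Linear(4, 1), device="cuda" if torch.cuda.is_available() else "cpu")
print(trainer.train_step(torch.randn(2, 4), torch.randn(2, 1)))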

neubig commented 1 year ago

Hi @viswavi and @Anindyadeep, thanks a lot for working on this! I was wondering if we were still working on this?

Anindyadeep commented 1 year ago

Hi @viswavi and @Anindyadeep, thanks a lot for working on this! I was wondering if we were still working on this?

Yes @neubig, I am working on this right now. However, I am blocked on some cases and have paused the work, but I am going to roll out the first iteration soon.

neubig commented 1 year ago

OK, great! Please tell us if there's anything we can do to help.

Anindyadeep commented 1 year ago

OK, great! Please tell us if there's anything we can do to help.

So right now the problems are:

  1. Optuna seems to have some internal problems with GPU memory management, and hence I was not able to run the code properly.
  2. For this I needed @viswavi's assistance; however, both of us got busy with other commitments.

What could be the possible solutions:

  1. Research where the bottlenecks are and see whether they can be solved. (Status: I am blocked on this.)
  2. See whether there are alternatives we could swap in for Optuna. Here I have been exploring Ray Tune, a library by Anyscale, which seems to be quite optimized. I am still running some trials on it; if things work out, I was thinking of replacing Optuna with Ray Tune.

Update:

Here is the complete list of issues I am facing, with documentation:

  1. Ray Tune (by Anyscale) has not updated parts of its docs and code base. Here is the relevant issue: https://github.com/ray-project/ray/issues/39763

  2. Optuna: https://discuss.huggingface.co/t/unusal-pattern-of-cuda-out-of-error-when-using-hyperparameter-search-optuna-backend/54619 (see the sketch after this list for the API this refers to)

  3. SigOpt: it is a paid platform.
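
For reference, the Hugging Face Trainer hyperparameter search that the Optuna thread above concerns looks roughly like the following. This is a general sketch of the transformers API with a placeholder model and dataset, not this PR's code; backend="ray" would be the drop-in alternative if we switch to Ray Tune.

from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Tiny placeholder model and dataset, just to make the sketch self-contained.
model_name = "prajjwal1/bert-tiny"
tokenizer = AutoTokenizer.from_pretrained(model_name)
texts = ["good", "bad", "great", "awful"] * 8
labels = [1, 0, 1, 0] * 8
dataset = Dataset.from_dict(dict(tokenizer(texts, padding=True))).add_column("labels", labels)


def model_init():
    # hyperparameter_search re-instantiates the model for every trial.
    return AutoModelForSequenceClassification.from_pretrained(model_name)


def hp_space(trial):
    # The trial object comes from the Optuna backend; this space is illustrative.
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [4, 8]
        ),
    }


trainer = Trainer(
    model_init=model_init,
    args=TrainingArguments(output_dir="hp_search_output", num_train_epochs=1, report_to="none"),
    train_dataset=dataset,
    eval_dataset=dataset,
)

best_run = trainer.hyperparameter_search(
    hp_space=hp_space, backend="optuna", direction="minimize", n_trials=2
)
print(best_run.hyperparameters)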

Anindyadeep commented 1 year ago

Hi @neubig, sorry, I was busy for some time. Here is the latest push to the PR. I added a test for now; I may need to discuss with @viswavi what more tests need to be added. I also added the integration for the hyperparameter selector in the CLI, and things are working fine there. Let me know what more needs to be added.

Thanks

Anindyadeep commented 11 months ago

@viswavi I had to remove the select_from_base function from the base; otherwise it was failing the tests.

Anindyadeep commented 11 months ago

Thank you so much @viswavi and Professor @neubig for mentoring me throughout this project. I learned a lot in the process, specifically about striving for perfection and structured approaches. I am looking forward to making some more PRs on this amazing project.

Next, I would like to take on the CLI issue and try to make it better :)

Thanks once again.