Closed Anindyadeep closed 11 months ago
Wow, thanks for the contribution @Anindyadeep!
First, a few initial comments:
- You listed three different things that could be done for hyperparameter search. I would definitely suggest that we split those into three separate PRs (for ease of reviewing). So we can just review the optuna hyperparameter search in this PR.
- Just to clarify, would you like us to start reviewing this now, or is it still WIP?
- It seems that this is not passing formatting checks. I would suggest that you run pre-commit checks, as detailed here.
Yes, I intend to split that into three different PRs; I just mentioned it here and will remove that TODO eventually.
I also mentioned @viswavi (as he has been mentoring me on this) to start a quick review, and I would appreciate it if you also provided a high-level review. (The work is still in progress; I have yet to test the hyperparameter search, since my device does not have that much GPU.)
As I mentioned, there is currently a gap between making changes and testing them, so I just wanted to open a PR to get your views as well as run the branch inside Colab. We can make this a draft PR for now. However, please do share your thoughts on the changes I am making eventually.
Sounds great, I'll take a look when I have a chance.
Hi @Anindyadeep, I made a quick pass through this and generally it looks very good. Thank you for the quick work! I've left 2 minor comments in the PR. After addressing those, can you potentially clean up the code a little bit?
From the repo root directory, run `pre-commit run --all-files`, and also run `pytest` to make sure this change has not broken any other tests.
After doing this, I will make an in-depth review of the PR.
Also, it looks like there may be merge conflicts with neulab:main
Comment I made to Anindyadeep over DM (copying here for visibility):
""" I think that this pattern makes sense, but the way I was originally thinking of this was a little different; have a ParamSelector class that wraps the model trainer (rather than being embedded in the model trainer) so you would pass the model trainer into the ParamSelector class, and then the parameter selector will run this trainer on a bunch of different configurations before ultimately returning a single trained model
I feel that this provides a little more modularity, but I'm open to changing my mind if you can convince me that the other pattern is better 🙂 """
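The wrapper pattern described in that DM can be sketched as follows. This is a minimal, hypothetical illustration of the design (the class and method names, and the toy score standing in for real training, are assumptions for illustration, not prompt2model's actual API):

```python
from dataclasses import dataclass

@dataclass
class TrainedModel:
    params: dict
    score: float

class ModelTrainer:
    """Stand-in trainer: trains one model for a given hyperparameter config."""
    def train(self, params: dict) -> TrainedModel:
        # Toy objective standing in for real validation performance:
        # peaks at lr == 0.01, so the selector should pick that config.
        score = -(params["lr"] - 0.01) ** 2
        return TrainedModel(params=params, score=score)

class ParamSelector:
    """Wraps a trainer (rather than being embedded in it): the trainer is
    passed in, run over candidate configurations, and a single trained
    model is ultimately returned."""
    def __init__(self, trainer: ModelTrainer):
        self.trainer = trainer

    def select(self, candidates: list[dict]) -> TrainedModel:
        results = [self.trainer.train(p) for p in candidates]
        return max(results, key=lambda m: m.score)

selector = ParamSelector(ModelTrainer())
best = selector.select([{"lr": 0.1}, {"lr": 0.01}, {"lr": 0.001}])
```

The modularity argument is that the trainer stays a plain "train once with this config" component, and any search backend (Optuna, grid search, etc.) can sit in the wrapper.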
Yeah @viswavi, that makes sense. I will look into it and push another PR with that change and pre-commit checks all done.
Hey @viswavi, I added some more changes from our previous discussion. The pre-commit checks are now passing. However, due to certain changes, some tests are breaking. We might need to discuss this, and then I can iterate on the commits.
@Anindyadeep Thanks so much for your contribution. I am wondering how you can control the training device: I searched through the whole Trainer, but found that `self.device` is never used? 🤔
I hope I am wrong; I guess once we can assign the training device, #317 is fixed.
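To illustrate the concern above: a `device` attribute that is stored but never applied is dead state, and training silently runs on the framework's default device. A hypothetical sketch (the `Trainer` here is illustrative, not prompt2model's actual class):

```python
class Trainer:
    def __init__(self, device: str = "cpu"):
        self.device = device  # e.g. "cpu" or "cuda:0"

    def train(self, model):
        # Without a call like this, self.device is never consulted and
        # has no effect on where training actually happens.
        return model.to(self.device)

class DummyModel:
    """Minimal stand-in with a PyTorch-like .to() method."""
    def __init__(self):
        self.device = None
    def to(self, device):
        self.device = device
        return self

trained = Trainer(device="cpu").train(DummyModel())
```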
Hi @viswavi and @Anindyadeep , thanks a lot for working on this! I was wondering if we were still working on this?
Yes @neubig, I am working on this right now. However, I am blocked on some cases, so I paused the work. I am going to roll out the first iteration soon.
OK, great! Please tell us if there's anything we can do to help.
So right now the problem is:
What could be the possible solutions? One candidate is `ray-tune`, a library by AnyScale; they seem to have a pretty optimized implementation. However, I am still running some trials on it. If things work out, I was thinking of replacing optuna with `ray-tune`.
Update:
Here is the complete list of issues I am facing, documented:
- RayTune by AnyScale hasn't updated their docs or code base. Here is the issue: https://github.com/ray-project/ray/issues/39763
- SigOpt: it is a paid platform.
Hi @neubig, sorry, I was busy for some time. Here is the latest push on this PR. I added a test for now; I may need to discuss with @viswavi what more tests should be added. I also added the integration for the hyperparameter selector in the CLI, and things are working fine there. Let me know what more needs to be added.
Thanks
@viswavi I had to remove the `select_from_base` function from the base class; otherwise it was failing the tests.
Thank you so much @viswavi and professor @neubig for mentoring me throughout the project. I learned a lot in this process, specifically about attention to detail and structured approaches. Looking forward to making some more PRs on this amazing project.
Next, I would like to take on the CLI issue and try to make it better :)
Thanks once again.
Description
This PR adds a new feature: automated hyperparameter search using Optuna. It also introduces a new spec for performing a hyperparameter search in three ways.
This PR solves issue #313.
This is how the `train_model()` function changes from the client's side.
Some additional changes, like supporting default hyperparameters as an option, are also provided. However, that is something that needs to be discussed.
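To give a flavor of what a client-side hyperparameter search spec could look like, here is a minimal random-search sketch using only the standard library. The `search_space` shape and all names below are hypothetical illustrations, not the actual spec introduced by this PR:

```python
import random

# Hypothetical spec: each entry is either a (low, high) continuous range
# or a list of categorical choices.
search_space = {
    "learning_rate": (1e-5, 1e-3),   # continuous range
    "batch_size": [8, 16, 32],       # categorical choices
}

def sample(space: dict, rng: random.Random) -> dict:
    """Draw one hyperparameter configuration from the spec."""
    config = {}
    for name, spec in space.items():
        if isinstance(spec, tuple):
            low, high = spec
            config[name] = rng.uniform(low, high)
        else:
            config[name] = rng.choice(spec)
    return config

rng = random.Random(0)
trials = [sample(search_space, rng) for _ in range(5)]
```

A backend like Optuna would replace the naive `sample` with guided suggestions (e.g. TPE), but the client-facing spec idea is the same: declare the space once, let the selector draw and evaluate configurations.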