kubeflow / katib

Automated Machine Learning on Kubernetes
https://www.kubeflow.org/docs/components/katib
Apache License 2.0
1.51k stars, 443 forks

[GSOC] `hyperopt` suggestion service logic update #2412

Open shashank-iitbhu opened 3 months ago

shashank-iitbhu commented 3 months ago

What this PR does / why we need it:

Which issue(s) this PR fixes _(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)_:

Fixes #2374

Checklist:

google-oss-prow[bot] commented 3 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

Once this PR has been reviewed and has the lgtm label, please assign tenzen-y for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/kubeflow/katib/blob/master/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

tenzen-y commented 3 months ago

/area gsoc

shashank-iitbhu commented 2 months ago

@tenzen-y I have added two new parameters, `weight_decay` and `dropout_rate`, to the Hyperopt example and passed them to `mnist.py`, but they are not yet used in the `Net` class or in the train and test functions. If you check the logs for this e2e test, the maximum value of the loss metric is an enormously large number. I can't figure out what I'm missing. I also tested this locally.
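
For reference, a minimal sketch of how a training script could accept the two new parameters without using them yet. The flag spellings, defaults, and the helper names `build_parser` and `parse_trial_args` are assumptions for illustration and are not copied from the actual `mnist.py` in this PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the two new hyperparameters.

    Flag names and defaults are hypothetical; the real script may differ.
    """
    parser = argparse.ArgumentParser(description="Hypothetical trial entry point")
    parser.add_argument("--weight_decay", type=float, default=0.0,
                        help="L2 penalty coefficient (assumed meaning)")
    parser.add_argument("--dropout_rate", type=float, default=0.0,
                        help="Dropout probability (assumed meaning)")
    return parser


def parse_trial_args(argv):
    """Parse an explicit argument list so the function is easy to test."""
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    args = parse_trial_args(["--weight_decay", "0.001", "--dropout_rate", "0.25"])
    # The values are parsed and available, but nothing consumes them yet,
    # which mirrors the state described in the comment above.
    print(args.weight_decay, args.dropout_rate)
```

Parsing the values this way only makes them available to the script. They have no effect on training until the model or optimizer actually reads them.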

shashank-iitbhu commented 2 months ago

@tenzen-y https://github.com/kubeflow/katib/blob/867c40a1b0669446c774cd6e770a5b7bbf1eb2f1/pkg/suggestion/v1beta1/hyperopt/base_service.py#L265-L266 Here, the float values sampled from the distribution are converted to `int` for the INT parameter type.
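
To illustrate the effect described here, a small standalone sketch of casting a float sample to the declared parameter type. It is not the code from `base_service.py`: the function name `to_parameter_value` and the string labels `"INT"` and `"DOUBLE"` are stand-ins chosen for this example.

```python
import random


def to_parameter_value(sample: float, parameter_type: str):
    """Cast a raw float sample to the declared parameter type.

    "INT" and "DOUBLE" are hypothetical labels used only in this sketch.
    """
    if parameter_type == "INT":
        # int() truncates toward zero, so the fractional part is discarded
        # rather than rounded to the nearest integer.
        return int(sample)
    return float(sample)


if __name__ == "__main__":
    rng = random.Random(0)
    raw = rng.uniform(1.0, 10.0)  # hypothetical continuous sample
    print(raw, to_parameter_value(raw, "INT"))
```

Because `int()` truncates instead of rounding, a sample just below an integer boundary maps to the lower integer. That may matter when the sampled distribution is continuous.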