Closed Electronic-Waste closed 2 months ago
ref issue: #2340
PTAL👀 @andreyvelich @johnugeorge
@andreyvelich I add my code here: https://github.com/kubeflow/katib/blob/db17214cf02c8d4eb9dafc21a15ffb7af4596015/pkg/webhook/v1beta1/pod/inject_webhook.go#L140-L142
// Pass env variable KATIB_TRIAL_NAME to training containers using fieldPath.
for idx := range mutatedPod.Spec.Containers {
if mutatedPod.Spec.Containers[idx].Env == nil {
mutatedPod.Spec.Containers[idx].Env = []v1.EnvVar{}
}
mutatedPod.Spec.Containers[idx].Env = append(
mutatedPod.Spec.Containers[idx].Env,
v1.EnvVar{
Name: consts.EnvTrialName,
ValueFrom: &v1.EnvVarSource{
FieldRef: &v1.ObjectFieldSelector{
FieldPath: fmt.Sprintf("metadata.labels['%s']", consts.LabelTrialName),
},
},
},
)
}
PTAL👀. Thank you for your review!
/area gsoc
I don't think, we should mutate KATIB_TRIAL_NAME
env to all containers in the Pod.
So maybe, similar as in wrapWorkerContainer
function, we should only add this environment to the primary container.
Please rebase your PR as well @Electronic-Waste
@Electronic-Waste Additionally, I think we forgot to add this line to the post-gen script of Katib SDK:
# Import Katib report metrics functions
from kubeflow.katib.api.report_metrics import report_metrics
Similar to how we include other imports to the__init__.py
.
Since this file is auto-generated by code-gen, we need to update it via post-gen script.
@Electronic-Waste Please update this script, otherwise after we re-generate the SDK, your import will be deleted.
@andreyvelich Done! I've rebased the branch, modified the code to pass env variable only to the primary container and added the importing code to the post-gen script of Katib SDK.
@andreyvelich Thank you for your detailed review!
@andreyvelich I've wrapped the code into mutatePodEnv
and added some unit tests for it. PTAL👀
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: andreyvelich
The full list of commands accepted by this bot can be found here.
The pull request process is described here
What this PR does / why we need it:
I add a new parameter
metrics_collector_config
totune
function. Design details: https://github.com/kubeflow/katib/blob/cc95ef03cb25df3d86a1a1c20c9c69bad17fce92/docs/proposals/push-based-metrics-collection.md#add-new-parameter-in-tuneWhich issue(s) this PR fixes _(optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged)_: Fixes #Checklist: