kubeflow / training-operator

Distributed ML Training and Fine-Tuning on Kubernetes
https://www.kubeflow.org/docs/components/training
Apache License 2.0

Migrate to controller-runtime logger #2048

Open tenzen-y opened 5 months ago

tenzen-y commented 5 months ago

Currently, the training-operator mixes several different loggers (for example, logrus and the controller-runtime logr logger):

So, based on this discussion, I'd like to suggest using the controller-runtime logger everywhere to improve maintainability and consistency.

champon1020 commented 5 months ago

Could I work on this task?

tenzen-y commented 5 months ago

@champon1020 Sure, feel free to assign yourself with a /assign comment.

champon1020 commented 5 months ago

/assign

champon1020 commented 3 months ago

Sorry for the late action, but I'm now going to tackle this issue. I have two questions about the design of the implementation.

The logger, which is initialized by calling ctrl.SetLogger, can be referenced via "sigs.k8s.io/controller-runtime/pkg/log".Log, and each Kubernetes controller keeps a reference to the logger in its struct: https://github.com/kubeflow/training-operator/blob/f3792b08bdcc08c7b394336d2a2c0cd3356bb5dd/pkg/controller.v1/mpi/mpijob_controller.go#L109

So, I think the LoggerForXXX functions in the commonutil package should accept a logger and attach key-values to it, instead of returning a new logger:

func LoggerForUnstructured(logger logr.Logger, obj *metav1unstructured.Unstructured, kind string) logr.Logger {
  return logger.WithValues(...)
}

Also, the Warning method of logrus.Logger is used in some places, but logr.Logger only has Info and Error methods. While the log levels could be distinguished via logr.Logger's V method, I think it would be better to set rules for the verbosity numbers. (Alternatively, Info and Warning logs could be unified into Info.)

Please let me know your opinions. (cc: @tenzen-y)

tenzen-y commented 3 months ago

> Sorry for the late action, but I'm now going to tackle this issue. I have two questions about the design of the implementation.
>
> The logger, which is initialized by calling ctrl.SetLogger, can be referenced via "sigs.k8s.io/controller-runtime/pkg/log".Log, and each Kubernetes controller keeps a reference to the logger in its struct:
>
> https://github.com/kubeflow/training-operator/blob/f3792b08bdcc08c7b394336d2a2c0cd3356bb5dd/pkg/controller.v1/mpi/mpijob_controller.go#L109
>
> So, I think the LoggerForXXX functions in the commonutil package should accept a logger and attach key-values to it, instead of returning a new logger:
>
> func LoggerForUnstructured(logger logr.Logger, obj *metav1unstructured.Unstructured, kind string) logr.Logger {
>   return logger.WithValues(...)
> }
>
> Also, the Warning method of logrus.Logger is used in some places, but logr.Logger only has Info and Error methods. While the log levels could be distinguished via logr.Logger's V method, I think it would be better to set rules for the verbosity numbers. (Alternatively, Info and Warning logs could be unified into Info.)
>
> Please let me know your opinions. (cc: @tenzen-y)

Actually, the original logger is no longer needed. In other words, we can remove this file entirely.

Because we can define a controller-runtime logger for each framework controller with the framework name, and then pass that logger to the common controllers (such as the pod and job controllers) via the context, using these helpers: https://github.com/kubernetes-sigs/controller-runtime/blob/1f5b39fa59d15fae78e521c9c9f2acabbbb3ea17/alias.go#L135-L147

champon1020 commented 2 months ago

@tenzen-y Thank you for the explanation. I understand it now :+1: