[Proposal] Support JSON format for `file-metrics-collector`

tenzen-y commented 2 years ago

/kind feature

Describe the solution you'd like [A clear and concise description of what you want to happen.]

Motivation

Currently, it is difficult to parse JSON format files by file-metrics-collector using regexp filter since file-metrics-collector is designed to use TEXT format files. I believe if file-metrics-collector supports JSON format files, we can be further made Katib powerful because we can make use of JSON format metrics files without regexp more easily. Therefore, I would like to support JSON format in file-metrics-collector, such as the following example, which is split by newlines.

{"foo": “bar", “fiz": “buz"…}
{“foo": “bar", “fiz": “buz"…}
{“foo": “bar", “fiz": “buz"…}
{“foo": “bar", “fiz": “buz"…}
…

This JSON format is also used in cloudml-hypertune recommended for use in GCP AI Platform or Vertex AI.

If you use a custom container for training or if you want to perform hyperparameter tuning with a framework other than TensorFlow, then you must use the cloudml-hypertune Python package to report your hyperparameter metric to AI Platform Training.

https://cloud.google.com/ai-platform/training/docs/using-hyperparameter-tuning#other_machine_learning_frameworks_or_custom_containers

Design

I'm thinking of the following Kubernetes API and webhook. Also, file-metrics-collector collects values whoose key is spec.objective.objectiveMetricName and spec.objective.additionalMetricNames from the metrcs file if FileSystemFileFormat is set Json.

common_types.go

+ type FileSystemFileFormat string
+
+ const (
+   TextFormat    FileSystemFileFormat = "Text"
+   JsonFormat    FileSystemFileFormat = "Json"
+ )

type FileSystemPath struct {
  Path string                     `json:"path,omitempty"`
  Kind FileSystemKind             `json:"kind,omitempty"`
+ FileFormat FileSystemFileFormat `json:"fileFormat,omitempty"`
}

validator.go

func (g *DefaultValidator) validateMetricsCollector(inst *experimentsv1beta1.Experiment) error {
  mcSpec := inst.Spec.MetricsCollectorSpec
  mcKind := mcSpec.Collector.Kind
  ...
  switch mcKind {
  ...
  case commonapiv1beta1.FileCollector:
    ...
+     fileFormat := mcSpec.Source.FileSystemPath.FileSytemFileFormat
+     if fileFormat == "" {
+       fileFormat = commonapiv1beta1.TextFormat
+     } else if fileFormat != commonapiv1beta1.TextFormat && fileFormat != commonapiv1beta1.JsonFormat {
+         return return fmt.Errorf("The format of the metrics file is required by .spec.metricsCollectorSpec.source.fileSystemPath.fileFormat.")
+     }
  ...

Does it sound good to you? @kubeflow/wg-automl-leads

Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]

tenzen-y commented 2 years ago

/kind feature

andreyvelich commented 2 years ago

Thank you for submitting this proposal @tenzen-y! I have few questions.

because we can make use of JSON format metrics files without regexp more easily

Do you mean it will be easier for user to create such Experiments ? E.g. they don't need to manually add regex to Metrics Collector spec.

How we will change the parsing process in File Metrics Collector ?

Will we have special regex for JSON that we are going to use automatically, when user selects fileFormat=JSON ?

What do you think @gaocegege @johnugeorge ?

tenzen-y commented 2 years ago

Sorry for the late reply. @andreyvelich

Do you mean it will be easier for user to create such Experiments ? E.g. they don't need to manually add regex to Metrics Collector spec.

Yes, users do not need to specify regexp using fileFormat=JSON.

Will we have special regex for JSON that we are going to use automatically, when user selects fileFormat=JSON ?

I think we can parse JSON files of the format shown in the proposal in the following script.

log.json

{"checkpoint_path": "", "global_step": "0", "loss": "0.22082142531871796", "timestamp": 1638422847.28721, "trial": "0"}
{"acc": "0.9349666833877563", "checkpoint_path": "", "global_step": "0", "timestamp": 1638422847.287801, "trial": "0"}
{"checkpoint_path": "", "global_step": "1", "loss": "0.1414974331855774", "timestamp": 1638422870.035161, "trial": "0"}
{"acc": "0.9586416482925415", "checkpoint_path": "", "global_step": "1", "timestamp": 1638422870.037459, "trial": "0"}
{"checkpoint_path": "", "global_step": "2", "loss": "0.10683439671993256", "timestamp": 1638422900.152162, "trial": "0"}
{"acc": "0.9688166379928589", "checkpoint_path": "", "global_step": "2", "timestamp": 1638422900.1529338, "trial": "0"}
{"checkpoint_path": "", "global_step": "3", "loss": "0.08619903773069382", "timestamp": 1638422927.9910269, "trial": "0"}
{"acc": "0.9747458100318909", "checkpoint_path": "", "global_step": "3", "timestamp": 1638422927.991675, "trial": "0"}
{"checkpoint_path": "", "global_step": "4", "loss": "0.07176543772220612", "timestamp": 1638422952.8473432, "trial": "0"}
{"acc": "0.9790566563606262", "checkpoint_path": "", "global_step": "4", "timestamp": 1638422952.848325, "trial": "0"}

parse.go

package main

import (
    "encoding/json"
    "fmt"

    "github.com/nxadm/tail"
)

func main() {
    t, _ := tail.TailFile("./log.json", tail.Config{Follow: true})
    metrics := []string{"loss"}

    for line := range t.Lines {

        logText := line.Text
        var jsonObj map[string]interface{}
        json.Unmarshal([]byte(logText), &jsonObj)

        for _, metric := range metrics {
            if _, exist := jsonObj[metric]; !exist {
                continue
            }
            fmt.Printf("%v\n", jsonObj[metric])
        }
    }
}

andreyvelich commented 2 years ago

Sounds good. Probably we should also refactor watchMetricsFile to support EarlyStopping in such Metrics Collector format.

WDYT about this proposal @gaocegege @johnugeorge ?

tenzen-y commented 2 years ago

Probably we should also refactor watchMetricsFile to support EarlyStopping in such Metrics Collector format.

It makes sense. @andreyvelich

tenzen-y commented 2 years ago

I'll wait for feedback from the community until December 10 (UTC+9).

andreyvelich commented 2 years ago

@gaocegege @johnugeorge Please give your feedback on this proposal

tenzen-y commented 2 years ago

I'll wait for feedback from the community until December 10 (UTC+9).

I'll postpone starting the implementation of this feature after December 18 (UTC+9) and wait for feedback from @gaocegege @johnugeorge

gaocegege commented 2 years ago

Sorry for the late reply.

I am wondering how to use the JSON feature from the user side. Could you please give us an example, e.g. Experiment YAML?

tenzen-y commented 2 years ago

Thanks for your reply! @gaocegege I think users can use the below YAML, does it sound good?

https://github.com/kubeflow/katib/blob/7be1f0ad5730ab6216570d2179f5b23f0b39d26d/examples/v1beta1/metrics-collector/file-metrics-collector.yaml

apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  namespace: kubeflow
  name: file-metrics-collector
spec:
  objective:
    type: maximize
    goal: 0.99
    objectiveMetricName: accuracy
    additionalMetricNames:
      - loss
  metricsCollectorSpec:
    source:
-     filter:
-       metricsFormat:
-         - "{metricName: ([\\w|-]+), metricValue: ((-?\\d+)(\\.\\d+)?)}"
      fileSystemPath:
        path: "/katib/mnist.log"
        kind: File
# omitempty; default=Text
+       fileFormat: Json
    collector:
      kind: File
...

gaocegege commented 2 years ago

Gotcha, LGTM.

BTW, the variable should be JSON instead of Json.

I think we can go ahead! 🎉

tenzen-y commented 2 years ago

Thank you for giving me feedback! @gaocegege

BTW, the variable should be JSON instead of Json.

It makes sense.

tenzen-y commented 2 years ago

I'll wait for feedback from the community until December 10 (UTC+9).

I'll postpone starting the implementation of this feature after December 18 (UTC+9) and wait for feedback from @gaocegege @johnugeorge

I'd like to start the implementation of this feature!

/assign

gaocegege commented 2 years ago

Please assign me when the PR is ready to review

Thanks for your contribution! :tada: :+1:

tenzen-y commented 2 years ago

Please assign me when the PR is ready to review

Thanks for your contribution! 🎉 👍

Sure, Thanks for your review! @gaocegege

kubeflow / katib

[Proposal] Support JSON format for `file-metrics-collector` #1744

Motivation

Design