kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0
3.63k stars · 1.63k forks

Improvements to metadata pages in UI #2086

Closed · neuromage closed this issue 5 years ago

neuromage commented 5 years ago

Now that metadata pages are in KFP's UI thanks to Riley's work, a few polish items remain:

Assigning to Yuan to start work on this. I'll update this issue with any other outstanding items I find. Yuan, we can also chat in person to clarify these items as required. Thanks!

/assign @Bobgy

/cc @jessiezcc /cc @paveldournov /cc @dushyanthsc /cc @gaoning777

neuromage commented 5 years ago

For reference, here's a simple pipeline you can run (it's using TFX DSL) which will output some basic metadata in your cluster:

import argparse
import os
import tensorflow as tf

from typing import Text

import kfp
from tfx.components.evaluator.component import Evaluator
from tfx.components.example_gen.csv_example_gen.component import CsvExampleGen
from tfx.components.example_validator.component import ExampleValidator
from tfx.components.model_validator.component import ModelValidator
from tfx.components.pusher.component import Pusher
from tfx.components.schema_gen.component import SchemaGen
from tfx.components.statistics_gen.component import StatisticsGen
from tfx.components.trainer.component import Trainer
from tfx.components.transform.component import Transform
from tfx.orchestration import metadata
from tfx.orchestration import pipeline
from tfx.orchestration.kubeflow import kubeflow_dag_runner
from tfx.proto import evaluator_pb2
from tfx.utils.dsl_utils import csv_input
from tfx.proto import pusher_pb2
from tfx.proto import trainer_pb2
from tfx.extensions.google_cloud_ai_platform.trainer import executor as ai_platform_trainer_executor

_output_bucket = 'gs://your-bucket-here'

def _create_test_pipeline(pipeline_name: Text, pipeline_root: Text,
                          csv_input_location: Text, taxi_module_file: Text):
  """Creates a simple Kubeflow-based Chicago Taxi TFX pipeline for testing.

  Args:
    pipeline_name: The name of the pipeline.
    pipeline_root: The root of the pipeline output.
    csv_input_location: The location of the input data directory.
    taxi_module_file: The location of the module file for Transform/Trainer.

  Returns:
    A logical TFX pipeline.Pipeline object.
  """
  examples = csv_input(csv_input_location)

  example_gen = CsvExampleGen(input_base=examples)
  statistics_gen = StatisticsGen(input_data=example_gen.outputs.examples)
  infer_schema = SchemaGen(
      stats=statistics_gen.outputs.output, infer_feature_shape=False)
  validate_stats = ExampleValidator(
      stats=statistics_gen.outputs.output, schema=infer_schema.outputs.output)
  transform = Transform(
      input_data=example_gen.outputs.examples,
      schema=infer_schema.outputs.output,
      module_file=taxi_module_file)
  trainer = Trainer(
      module_file=taxi_module_file,
      transformed_examples=transform.outputs.transformed_examples,
      schema=infer_schema.outputs.output,
      transform_output=transform.outputs.transform_output,
      train_args=trainer_pb2.TrainArgs(num_steps=10000),
      eval_args=trainer_pb2.EvalArgs(num_steps=5000))
  model_analyzer = Evaluator(
      examples=example_gen.outputs.examples,
      model_exports=trainer.outputs.output,
      feature_slicing_spec=evaluator_pb2.FeatureSlicingSpec(specs=[
          evaluator_pb2.SingleSlicingSpec(
              column_for_slicing=['trip_start_hour'])
      ]))
  model_validator = ModelValidator(
      examples=example_gen.outputs.examples, model=trainer.outputs.output)
  pusher = Pusher(
      model_export=trainer.outputs.output,
      model_blessing=model_validator.outputs.blessing,
      push_destination=pusher_pb2.PushDestination(
          filesystem=pusher_pb2.PushDestination.Filesystem(
              base_directory=os.path.join(pipeline_root, 'model_serving'))))

  return pipeline.Pipeline(
      pipeline_name=pipeline_name,
      pipeline_root=pipeline_root,
      components=[
          example_gen, statistics_gen, infer_schema, validate_stats, transform,
          trainer, model_analyzer, model_validator, pusher
      ],
      enable_cache=False,  # Or True to use cache
  )

if __name__ == '__main__':
  # Copy sample CSV file from chicago taxi pipeline example to this location
  data_root = 'gs://your-bucket/data' 
  taxi_module_file = 'gs://your-bucket/taxi_utils.py'

  pipeline_name = 'kubeflow-simple-taxi-metadata'
  pipeline_root = 'gs://your-bucket/test'
  pipeline = _create_test_pipeline(pipeline_name, pipeline_root, data_root,
                                   taxi_module_file)
  config = kubeflow_dag_runner.KubeflowDagRunnerConfig()

  kubeflow_dag_runner.KubeflowDagRunner(config=config).run(pipeline)
Bobgy commented 5 years ago

Thanks @neuromage! I'm taking a day off today and will start on these tomorrow.

A few questions on context:

Bobgy commented 5 years ago

/priority p0

k8s-ci-robot commented 5 years ago

@Bobgy: The label(s) area/frontend cannot be applied. These labels are supported: api-review, community/discussion, community/maintenance, community/question, cuj/build-train-deploy, cuj/multi-user, platform/aws, platform/azure, platform/gcp, platform/minikube, platform/other

In response to [this](https://github.com/kubeflow/pipelines/issues/2086#issuecomment-530291498):

> /area frontend

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
Bobgy commented 5 years ago

/area front-end

rmgogogo commented 5 years ago

/cc @rmgogogo

Bobgy commented 5 years ago

@neuromage How do you deploy pipelines with metadata enabled? I tried the KFP lite deployment, but the metadata server fails on startup with errors about a missing 'mysql-credential'. Should I use Helm to deploy the marketplace one?

dushyanthsc commented 5 years ago

@Bobgy The MySQL credentials are picked up from a Kubernetes Secret object. Create a Secret named "mysql-credential" with the keys "username" and "password"; the rest should be taken care of automatically.
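For reference, the Secret described above could be created from a manifest along these lines. The Secret name and key names come from the comment; the namespace and the credential values are placeholders you would replace for your own deployment:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: mysql-credential
  namespace: kubeflow   # adjust to the namespace where the metadata server runs
type: Opaque
stringData:
  username: root        # placeholder value
  password: ""          # placeholder value
```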

neuromage commented 5 years ago

Thanks @Bobgy!

  • Can you send me a reference to MLMD api?

Yes, here it is: https://github.com/google/ml-metadata/blob/master/ml_metadata/proto/metadata_store_service.proto

It's in the KFP repo (kubeflow/pipelines) under /frontend

Bobgy commented 5 years ago

@dushyanthsc Thanks, I got the servers up.

Bobgy commented 5 years ago

@neuromage I'm trying to run the TFX sample you provided, but I'm stuck on how to get it running.

env:

Here's what I tried:

  1. Copy the code sample and name it metadata_sample.py
  2. Follow https://www.kubeflow.org/docs/pipelines/sdk/install-sdk/ to install kfp sdk
  3. Also install tensorflow, tfx by pip in that conda environment
  4. Copy taxi data and utils from https://github.com/tensorflow/tfx/tree/master/tfx/examples/chicago_taxi_pipeline to my own bucket
  5. Change config values in metadata_sample.py to my own bucket
  6. python metadata_sample.py
    • I got some errors first, so I made two small changes:
      • Added a ":" after def _create_test_pipeline(...)
      • Changed KubeflowRunnerConfig to KubeflowDagRunnerConfig, because it seems to have been renamed recently.

Here's what I got after fixing the obvious problems. There are a lot of warnings, but I didn't see any errors. Can you give me a pointer on how to run it?

/Users/gongyuan/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/Users/gongyuan/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/Users/gongyuan/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/Users/gongyuan/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/Users/gongyuan/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/Users/gongyuan/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/Users/gongyuan/miniconda3/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/Users/gongyuan/miniconda3/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/Users/gongyuan/miniconda3/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/Users/gongyuan/miniconda3/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/Users/gongyuan/miniconda3/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/Users/gongyuan/miniconda3/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/Users/gongyuan/miniconda3/lib/python3.7/site-packages/apache_beam/__init__.py:84: UserWarning: Some syntactic constructs of Python 3 are not yet fully supported by Apache Beam.
  'Some syntactic constructs of Python 3 are not yet fully supported by '
WARNING:tensorflow:From /Users/gongyuan/miniconda3/lib/python3.7/site-packages/tfx/components/transform/executor.py:57: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.

WARNING:tensorflow:From /Users/gongyuan/miniconda3/lib/python3.7/site-packages/tfx/components/transform/executor.py:57: from_feature_spec (from tensorflow_transform.tf_metadata.dataset_schema) is deprecated and will be removed in a future version.
Instructions for updating:
from_feature_spec is a deprecated, use schema_utils.schema_from_feature_spec
WARNING:tensorflow:From /Users/gongyuan/miniconda3/lib/python3.7/site-packages/tfx/orchestration/pipeline.py:131: The name tf.logging.warning is deprecated. Please use tf.compat.v1.logging.warning instead.

WARNING:tensorflow:metadata_db_root is deprecated, metadata_connection_config will be required in next release
WARNING:tensorflow:From /Users/gongyuan/miniconda3/lib/python3.7/site-packages/tfx/orchestration/kubeflow/base_component.py:125: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.
neuromage commented 5 years ago

You can ignore the warnings. You should get a compiled pipeline file, just like when using the KFP SDK. Then upload it and run it as before.

Bobgy commented 5 years ago

Thanks, I got the pipeline file successfully.

Bobgy commented 5 years ago

@neuromage which tfx version do you use?

I first tried 0.14.0 and hit this issue: https://github.com/tensorflow/tfx/issues/603. Then I tried 0.13.0, but the features I need aren't there yet. Finally I tried 0.14.0rc1 and got the following error when running the pipeline:

/opt/venv/lib/python3.6/site-packages/apache_beam/__init__.py:84: UserWarning: Some syntactic constructs of Python 3 are not yet fully supported by Apache Beam.
  'Some syntactic constructs of Python 3 are not yet fully supported by '
Traceback (most recent call last):
  File "/tfx-src/tfx/orchestration/kubeflow/container_entrypoint.py", line 200, in <module>
    main()
  File "/tfx-src/tfx/orchestration/kubeflow/container_entrypoint.py", line 171, in main
    connection_config = _get_metadata_connection_config(kubeflow_metadata_config)
  File "/tfx-src/tfx/orchestration/kubeflow/container_entrypoint.py", line 68, in _get_metadata_connection_config
    kubeflow_metadata_config.mysql_db_service_host)
TypeError: None has type NoneType, but expected one of: bytes, unicode

I am using a KFP lite deployment; how should I configure kubeflow_metadata_config?

Bobgy commented 5 years ago

Never mind, I used the following config and it seems to work.

def _get_metadata_config():
    config = kubeflow_pb2.KubeflowMetadataConfig()
    config.mysql_db_service_host.environment_variable = 'MYSQL_SERVICE_HOST'
    config.mysql_db_service_port.environment_variable = 'MYSQL_SERVICE_PORT'
    config.mysql_db_name.value = 'metadb'
    config.mysql_db_user.value = 'root'
    config.mysql_db_password.value = ''

    return config
Bobgy commented 5 years ago

The executions list page seems a little buggy. When I click on an execution, the first one in each group works (they are grouped by pipeline), but the following items don't seem to respond.

@neuromage Can you explain what is expected behavior of execution list page? This is what I can see now: https://drive.google.com/file/d/1LJbth1bK-_ZCTe5d60M8nDRzrFjgux-n/view

For each execution, besides its properties, we should also show the inputs and outputs that went into it. It would also be nice to be able to link to those inputs and outputs.

Do we need this in execution list page or detail page, (or both)? Do we have a UX mock I can refer to?

neuromage commented 5 years ago

Thanks @Bobgy !

Bobgy commented 5 years ago

@neuromage thanks a lot!

neuromage commented 5 years ago

@Bobgy I have a few more requests :-)

Stretch goal, which I think we can discuss and track in a separate issue if needed: show a preview for each artifact type. How we preview would depend on the type of the artifact. For example, for a SchemaPath we could show the schema text proto as JSON; for an ExamplesPath we could show the first 10 rows. This could use ajchili's visualization server. It may need some in-depth discussion, so feel free to schedule something on my calendar.
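One way that type-dependent preview dispatch could be sketched (Python for illustration only; the actual KFP UI is TypeScript, and everything here except the SchemaPath/ExamplesPath type names is hypothetical):

```python
import json

# Hypothetical registry mapping artifact type names to preview functions.
PREVIEWERS = {}

def previewer(artifact_type):
    """Register a preview renderer for one artifact type."""
    def register(fn):
        PREVIEWERS[artifact_type] = fn
        return fn
    return register

@previewer('ExamplesPath')
def preview_examples(rows):
    # Show only the first 10 rows, as suggested above.
    return rows[:10]

@previewer('SchemaPath')
def preview_schema(schema_dict):
    # Render the schema as pretty-printed JSON.
    return json.dumps(schema_dict, indent=2, sort_keys=True)

def preview(artifact_type, payload):
    fn = PREVIEWERS.get(artifact_type)
    return fn(payload) if fn else '(no preview available)'
```

Unknown types fall back to a placeholder rather than failing, so new artifact types degrade gracefully until a previewer is added.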

neuromage commented 5 years ago

/cc @paveldournov

Bobgy commented 5 years ago

@neuromage

Can we show URIs in the artifact detail page?

SG, will do so

Can we make GCS URIs clickable, in both artifact detail page and artifact listings page?

I need to investigate. Which page should it link to? A page on the Google Cloud console?

If a field has serialized json, can we attempt to parse and pretty print this?

SG, will do so.
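A minimal sketch of that parse-or-fall-back logic (Python for illustration; the UI itself is TypeScript):

```python
import json

def pretty_print_field(value):
    """Pretty-print a property value if it holds serialized JSON.

    Returns the value unchanged when it is not valid JSON, so ordinary
    string properties render exactly as before.
    """
    try:
        parsed = json.loads(value)
    except (ValueError, TypeError):
        return value
    # Bare numbers and strings are valid JSON but gain nothing from reformatting.
    if not isinstance(parsed, (dict, list)):
        return value
    return json.dumps(parsed, indent=2, sort_keys=True)
```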

The execution list still does not show the name of each execution, and I still can't click on any execution except the first one (ignore this if it's already fixed).

Already fixed in https://github.com/kubeflow/pipelines/pull/2135, I think it didn't make it to the version you tested.

Bobgy commented 5 years ago

@neuromage regarding the stretch goal, can you create a separate issue for it? What would its priority be? I have other p0 issues at hand, so I will only be able to take a look after those.

neuromage commented 5 years ago

I need to investigate, which page should it link to? A page on google cloud console?

Yes, a page showing the bucket on Pantheon would be great. Thanks!
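The mapping from a gs:// URI to a console link could look roughly like this (the /storage/browser/ path is an assumption about the Cloud Console's URL scheme):

```python
def gcs_console_url(uri):
    """Turn a gs://bucket/path URI into a Cloud Console storage browser link."""
    prefix = 'gs://'
    if not uri.startswith(prefix):
        raise ValueError('not a GCS URI: %s' % uri)
    return 'https://console.cloud.google.com/storage/browser/' + uri[len(prefix):]
```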

Bobgy commented 5 years ago

@neuromage Do you think there are further UI gaps that should be p0? Shall we close this and open a dedicated issue to track the stretch goal?

neuromage commented 5 years ago

Yes, this looks great now, thanks @Bobgy!