Improvements to metadata pages in UI

neuromage commented 5 years ago

Now that metadata pages are in KFP's UI thanks to Riley's work, there are still a few more items to take care of in terms of polish:

We should add a column showing the creation time of each artifact. In order to get this, we'll need to query the metadata API server for events, and find the associated event that produced any given artifact. The event will have a timestamp.
The list of executions page seems a little buggy. When I click on an execution, the first one in the group (they are grouped by pipeline) works, but the following items don't seem to be working.
For each execution, other than properties, we should also show the inputs and outputs that went into it. It would also be nice to be able to link to the said input and output.

Assigning to Yuan to start work on this. I'll update this issue with any other outstanding items I find. Yuan, we can also chat in person to clarify these items as required. Thanks!

/assign @Bobgy

/cc @jessiezcc /cc @paveldournov /cc @dushyanthsc /cc @gaoning777

neuromage commented 5 years ago

For reference, here's a simple pipeline you can run (it's using TFX DSL) which will output some basic metadata in your cluster:

import argparse
import os
import tensorflow as tf

from typing import Text

import kfp
from tfx.components.evaluator.component import Evaluator
from tfx.components.example_gen.csv_example_gen.component import CsvExampleGen
from tfx.components.example_validator.component import ExampleValidator
from tfx.components.model_validator.component import ModelValidator
from tfx.components.pusher.component import Pusher
from tfx.components.schema_gen.component import SchemaGen
from tfx.components.statistics_gen.component import StatisticsGen
from tfx.components.trainer.component import Trainer
from tfx.components.transform.component import Transform
from tfx.orchestration import metadata
from tfx.orchestration import pipeline
from tfx.orchestration.kubeflow import kubeflow_dag_runner
from tfx.proto import evaluator_pb2
from tfx.utils.dsl_utils import csv_input
from tfx.proto import pusher_pb2
from tfx.proto import trainer_pb2
from tfx.extensions.google_cloud_ai_platform.trainer import executor as ai_platform_trainer_executor

_output_bucket = 'gs://your-bucket-here'

def _create_test_pipeline(pipeline_name: Text, pipeline_root: Text,
                          csv_input_location: Text, taxi_module_file: Text)
  """Creates a simple Kubeflow-based Chicago Taxi TFX pipeline for testing.

  Args:
    pipeline_name: The name of the pipeline.
    pipeline_root: The root of the pipeline output.
    csv_input_location: The location of the input data directory.
    taxi_module_file: The location of the module file for Transform/Trainer.
    container_image: The container image to use.

  Returns:
    A logical TFX pipeline.Pipeline object.
  """
  examples = csv_input(csv_input_location)

  example_gen = CsvExampleGen(input_base=examples)
  statistics_gen = StatisticsGen(input_data=example_gen.outputs.examples)
  infer_schema = SchemaGen(
      stats=statistics_gen.outputs.output, infer_feature_shape=False)
  validate_stats = ExampleValidator(
      stats=statistics_gen.outputs.output, schema=infer_schema.outputs.output)
  transform = Transform(
      input_data=example_gen.outputs.examples,
      schema=infer_schema.outputs.output,
      module_file=taxi_module_file)
  trainer = Trainer(
      module_file=taxi_module_file,
      transformed_examples=transform.outputs.transformed_examples,
      schema=infer_schema.outputs.output,
      transform_output=transform.outputs.transform_output,
      train_args=trainer_pb2.TrainArgs(num_steps=10000),
      eval_args=trainer_pb2.EvalArgs(num_steps=5000))
  model_analyzer = Evaluator(
      examples=example_gen.outputs.examples,
      model_exports=trainer.outputs.output,
      feature_slicing_spec=evaluator_pb2.FeatureSlicingSpec(specs=[
          evaluator_pb2.SingleSlicingSpec(
              column_for_slicing=['trip_start_hour'])
      ]))
  model_validator = ModelValidator(
      examples=example_gen.outputs.examples, model=trainer.outputs.output)
  pusher = Pusher(
      model_export=trainer.outputs.output,
      model_blessing=model_validator.outputs.blessing,
      push_destination=pusher_pb2.PushDestination(
          filesystem=pusher_pb2.PushDestination.Filesystem(
              base_directory=os.path.join(pipeline_root, 'model_serving'))))

  return pipeline.Pipeline(
      pipeline_name=pipeline_name,
      pipeline_root=pipeline_root,
      components=[
          example_gen, statistics_gen, infer_schema, validate_stats, transform,
          trainer, model_analyzer, model_validator, pusher
      ],
      enable_cache=False,  # Or True to use cache
  )

if __name__ == '__main__':
  # Copy sample CSV file from chicago taxi pipeline example to this location
  data_root = 'gs://your-bucket/data' 
  taxi_module_file = 'gs://your-bucket/taxi_utils.py'

  pipeline_name = 'kubeflow-simple-taxi-metadata'
  pipeline_root = 'gs://your-bucket/test'
  pipeline = _create_test_pipeline(pipeline_name, pipeline_root, data_root,                                  taxi_module_file)
  config = kubeflow_dag_runner.KubeflowRunnerConfig()

  kubeflow_dag_runner.KubeflowDagRunner(config=config).run(pipeline)

Bobgy commented 5 years ago

Thanks @neuromage! I'm taking a day off today and will start on these tomorrow.

A few questions on context:

Can you send me a reference to MLMD api?
Is frontend codebase here: https://github.com/kubeflow/metadata/tree/master/frontend? or already in KFP repo?

Bobgy commented 5 years ago

/priority p0

k8s-ci-robot commented 5 years ago

@Bobgy: The label(s) area/frontend cannot be appled. These labels are supported: api-review, community/discussion, community/maintenance, community/question, cuj/build-train-deploy, cuj/multi-user, platform/aws, platform/azure, platform/gcp, platform/minikube, platform/other

In response to [this](https://github.com/kubeflow/pipelines/issues/2086#issuecomment-530291498): >/area frontend Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

k8s-ci-robot commented 5 years ago

@Bobgy: The label(s) area/frontend cannot be appled. These labels are supported: api-review, community/discussion, community/maintenance, community/question, cuj/build-train-deploy, cuj/multi-user, platform/aws, platform/azure, platform/gcp, platform/minikube, platform/other

In response to [this](https://github.com/kubeflow/pipelines/issues/2086#issuecomment-530291498): >/area frontend Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

Bobgy commented 5 years ago

/area front-end

rmgogogo commented 5 years ago

/cc @rmgogogo

Bobgy commented 5 years ago

@neuromage How do you deploy pipeline with metadata? I tried kfp lite, it has some errors of missing 'mysql-credential' when starting up metadata server. Should I use helm to deploy the marketplace one?

dushyanthsc commented 5 years ago

@Bobgy The mysql credentials are picked up using K8 secret object. Basically create a Kubernetes Secret object named - "mysql-credential" with keys- "username" and "password" rest should be automatically be taken care of

neuromage commented 5 years ago

Thanks @Bobgy!

Can you send me a reference to MLMD api?

Yes, here it is: https://github.com/google/ml-metadata/blob/master/ml_metadata/proto/metadata_store_service.proto

Is frontend codebase here: https://github.com/kubeflow/metadata/tree/master/frontend? or already in KFP repo?

It's in the KFP repo (kubeflow/pipelines) under /frontend

Bobgy commented 5 years ago

@dushyanthsc Thanks, I got the servers up.

Bobgy commented 5 years ago

@neuromage I'm trying to run the tfx sample you provided, but I'm stuck with how to get it running.

env:

tensorflow: 1.14.0
tfx: 0.14.0
kfp: 0.1.29

Here's what I tried:

Copy the code sample and name it metadata_sample.py
Follow https://www.kubeflow.org/docs/pipelines/sdk/install-sdk/ to install kfp sdk
Also install tensorflow, tfx by pip in that conda environment
Copy taxi data and utils from https://github.com/tensorflow/tfx/tree/master/tfx/examples/chicago_taxi_pipeline to my own bucket
Change config values in metadata_sample.py to my own bucket
python metadata_sample.py
- I got some errors first, so I changed a little:
  - Add a ":" after def _create_test_pipeline(...):
  - Changed KubeflowRunnerConfig to KubeflowDagRunnerConfig because it seems to be renamed recently.

Here's what I got after fixing obvious problems. It has a lot of warnings, but I didn't see any errors. Can you give me some reference of how to run it?

/Users/gongyuan/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/Users/gongyuan/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/Users/gongyuan/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/Users/gongyuan/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/Users/gongyuan/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/Users/gongyuan/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/Users/gongyuan/miniconda3/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/Users/gongyuan/miniconda3/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/Users/gongyuan/miniconda3/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/Users/gongyuan/miniconda3/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/Users/gongyuan/miniconda3/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/Users/gongyuan/miniconda3/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/Users/gongyuan/miniconda3/lib/python3.7/site-packages/apache_beam/__init__.py:84: UserWarning: Some syntactic constructs of Python 3 are not yet fully supported by Apache Beam.
  'Some syntactic constructs of Python 3 are not yet fully supported by '
WARNING:tensorflow:From /Users/gongyuan/miniconda3/lib/python3.7/site-packages/tfx/components/transform/executor.py:57: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.

WARNING:tensorflow:From /Users/gongyuan/miniconda3/lib/python3.7/site-packages/tfx/components/transform/executor.py:57: from_feature_spec (from tensorflow_transform.tf_metadata.dataset_schema) is deprecated and will be removed in a future version.
Instructions for updating:
from_feature_spec is a deprecated, use schema_utils.schema_from_feature_spec
WARNING:tensorflow:From /Users/gongyuan/miniconda3/lib/python3.7/site-packages/tfx/orchestration/pipeline.py:131: The name tf.logging.warning is deprecated. Please use tf.compat.v1.logging.warning instead.

WARNING:tensorflow:metadata_db_root is deprecated, metadata_connection_config will be required in next release
WARNING:tensorflow:From /Users/gongyuan/miniconda3/lib/python3.7/site-packages/tfx/orchestration/kubeflow/base_component.py:125: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.

neuromage commented 5 years ago

You can ignore the warnings. You should get a compiled pipeline file, just like when using KFP SDK. Then you'll need to upload that and run it as before.

Bobgy commented 5 years ago

Thanks, I got the pipeline file successfully.

Bobgy commented 5 years ago

@neuromage which tfx version do you use?

I first tried 0.14.0, and met this issue: https://github.com/tensorflow/tfx/issues/603 Then I tried 0.13.0, and it seems needed features are not there yet. Then I tried 0.14.0rc1 and I got the following error when running the pipeline

/opt/venv/lib/python3.6/site-packages/apache_beam/__init__.py:84: UserWarning: Some syntactic constructs of Python 3 are not yet fully supported by Apache Beam.
  'Some syntactic constructs of Python 3 are not yet fully supported by '
Traceback (most recent call last):
  File "/tfx-src/tfx/orchestration/kubeflow/container_entrypoint.py", line 200, in <module>
    main()
  File "/tfx-src/tfx/orchestration/kubeflow/container_entrypoint.py", line 171, in main
    connection_config = _get_metadata_connection_config(kubeflow_metadata_config)
  File "/tfx-src/tfx/orchestration/kubeflow/container_entrypoint.py", line 68, in _get_metadata_connection_config
    kubeflow_metadata_config.mysql_db_service_host)
TypeError: None has type NoneType, but expected one of: bytes, unicode

I am using a KFP lite deployment, how should I config kubeflow_metadata_config?

Bobgy commented 5 years ago

Never mind, I used the following config and it seems to work.

def _get_metadata_config():
    config = kubeflow_pb2.KubeflowMetadataConfig()
    config.mysql_db_service_host.environment_variable = 'MYSQL_SERVICE_HOST'
    config.mysql_db_service_port.environment_variable = 'MYSQL_SERVICE_PORT'
    config.mysql_db_name.value = 'metadb'
    config.mysql_db_user.value = 'root'
    config.mysql_db_password.value = ''

    return config

Bobgy commented 5 years ago

The list of executions page seems a little buggy. When I click on an execution, the first one in the group (they are grouped by pipeline) works, but the following items don't seem to be working.

@neuromage Can you explain what is expected behavior of execution list page? This is what I can see now: https://drive.google.com/file/d/1LJbth1bK-_ZCTe5d60M8nDRzrFjgux-n/view

What should happen when we click on expanded rows?
~When we click on the first row, it should open execution detail page. Is that right? I will try to fix this.~ UPDATE: I fixed this in: https://github.com/kubeflow/pipelines/pull/2127

Should names of the executions be nonempty? Where is data?

I tried to debug this, UI is expecting NAME property in execution, but it's not present in the response sent from backend. @neuromage, is this an issue of backend? (I'm using image: gcr.io/kubeflowtryouts/ml_metadata_store_server:latest) An example response for a single execution I get is

{
"id": 2,
"typeId": 5,
"propertiesMap": [
["component_id", {
"stringValue": "StatisticsGen"
}],
["pipeline_name", {
"stringValue": "kubeflow-simple-taxi-metadata"
}],
["pipeline_root", {
"stringValue": "gs://gongyuan-test/kfp-test"
}],
["run_id", {
"stringValue": "kubeflow-simple-taxi-metadata-t5t59"
}],
["state", {
"stringValue": "complete"
}]
],
"customPropertiesMap": []
}"

For each execution, other than properties, we should also show the inputs and outputs that went into it. It would also be nice to be able to link to the said input and output.

Do we need this in execution list page or detail page, (or both)? Do we have a UX mock I can refer to?

neuromage commented 5 years ago

Thanks @Bobgy !

For name, yeah, that looks wrong. Can we use component_id for the name instead?
For execution inputs/outputs, I think it's ok to put it in the execution details page. Unfortunately, there is no mock for this, so I'll leave it up to you on how best to implement. I imagine a simple section for inputs, with a listing of Artifact name and id would be great. If the id is deep-linked to the artifact detail page, that would be nice too. Similarly for output.

Bobgy commented 5 years ago

@neuromage thanks a lot!

Sure, I will use component_id instead
Then I will use my own judgement about that.

neuromage commented 5 years ago

@Bobgy I have a few more requests :-)

Can we show URIs in the artifact detail page?
Can we make GCS URIs clickable, in both artifact detail page and artifact listings page?
If a field has serialized json, can we attempt to parse and pretty print this?
The execution list still does not show the name of the execution, and I'm unable to click on any execution except the first one still (ignore if you already fixed this)

Stretch goal, which I think we can discuss and track in a separate issue if needed: show a preview for each artifact type. How we preview would be based on the type of the artifact. For example, if it's a SchemaPath, we can show the schema text proto as JSON or something. If it's ExamplesPath, we can show the first 10 rows maybe. This could use ajchili's visualization server. This may need some in depth discussion, so feel free to schedule something on my calendar.

neuromage commented 5 years ago

/cc @paveldournov

Bobgy commented 5 years ago

@neuromage

Can we show URIs in the artifact detail page?

SG, will do so

Can we make GCS URIs clickable, in both artifact detail page and artifact listings page?

I need to investigate, which page should it link to? A page on google cloud console?

If a field has serialized json, can we attempt to parse and pretty print this?

SG, will do so.

The execution list still does not show the name of the execution, and I'm unable to click on any execution except the first one still (ignore if you already fixed this)

Already fixed in https://github.com/kubeflow/pipelines/pull/2135, I think it didn't make it to the version you tested.

Bobgy commented 5 years ago

@neuromage regarding the stretch goal, can you create a separate issue for this? What would be the priority? I have other p0 issues at hand, so I will only be able to take a look after other things.

neuromage commented 5 years ago

I need to investigate, which page should it link to? A page on google cloud console?

Yes, a page showing the bucket on Pantheon would be great. Thanks!

Bobgy commented 5 years ago

@neuromage Do you think if there are further gaps in UI that should be p0? Shall we close this and make another dedicated issue for tracking the stretch goal?

neuromage commented 5 years ago

Yes, this looks great now, thanks @Bobgy!

kubeflow / pipelines

Improvements to metadata pages in UI #2086