kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0

[backend] OutputArtifact path is None #5560

Closed wilbry closed 4 months ago

wilbry commented 3 years ago

Environment

Steps to reproduce

Create a lightweight component that uses OutputArtifact, compile it with the V1 compiler in V2-compatible mode, and run it through the UI. The following error appears, I think while the executor is trying to create the output artifact's directory.

Traceback (most recent call last):
  File "/tmp/tmp.VtSD5WzDcx", line 594, in <module>
    executor_main()
  File "/tmp/tmp.VtSD5WzDcx", line 588, in executor_main
    function_to_execute=function_to_execute)
  File "/tmp/tmp.VtSD5WzDcx", line 353, in __init__
    artifacts_list[0])
  File "/tmp/tmp.VtSD5WzDcx", line 366, in _make_output_artifact
    return OutputArtifact(artifact_type=type(artifact), artifact=artifact)
  File "/tmp/tmp.VtSD5WzDcx", line 296, in __init__
    os.makedirs(self.path, exist_ok=True)
  File "/usr/local/lib/python3.7/os.py", line 208, in makedirs
    head, tail = path.split(name)
  File "/usr/local/lib/python3.7/posixpath.py", line 107, in split
    p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not NoneType
F0428 16:41:59.832704      37 main.go:56] Failed to successfuly execute component: exit status 1
goroutine 1 [running]:
github.com/golang/glog.stacks(0xc000c80300, 0xc000652000, 0x61, 0xb6)
    /go/pkg/mod/github.com/golang/glog@v0.0.0-20160126235308-23def4e6c14b/glog.go:769 +0xb9
github.com/golang/glog.(*loggingT).output(0x2890520, 0xc000000003, 0xc0000fe000, 0x27d6a2a, 0x7, 0x38, 0x0)
    /go/pkg/mod/github.com/golang/glog@v0.0.0-20160126235308-23def4e6c14b/glog.go:720 +0x3b3
github.com/golang/glog.(*loggingT).printf(0x2890520, 0x3, 0x1b2c9c8, 0x2b, 0xc000a65f58, 0x1, 0x1)
    /go/pkg/mod/github.com/golang/glog@v0.0.0-20160126235308-23def4e6c14b/glog.go:655 +0x153
github.com/golang/glog.Fatalf(...)
    /go/pkg/mod/github.com/golang/glog@v0.0.0-20160126235308-23def4e6c14b/glog.go:1148
main.main()
    /build/cmd/launch/main.go:56 +0x357
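
The Python half of this traceback can be reproduced outside the pipeline with the standard library alone. The following is only a minimal sketch, not KFP code: `FakeArtifact` and `make_output_dir` are hypothetical names standing in for the artifact object and for the `os.makedirs(self.path, exist_ok=True)` call shown at line 296 of the generated executor. It assumes the artifact's `path` is `None` when that call is reached, which is what the `TypeError` suggests.

```python
import os
import tempfile
from typing import Optional


class FakeArtifact:
    """Hypothetical stand-in for the artifact object; not a KFP class."""

    def __init__(self, path: Optional[str]) -> None:
        self.path = path


def make_output_dir(artifact: FakeArtifact) -> str:
    """Create the artifact directory, failing early with a clear message."""
    if artifact.path is None:
        # Assumption: a missing path means no output location was assigned.
        raise ValueError("artifact.path is None; no output location was assigned")
    os.makedirs(artifact.path, exist_ok=True)
    return artifact.path


# Same failure mode as in the traceback: os.makedirs rejects None.
try:
    os.makedirs(None, exist_ok=True)
except TypeError as err:
    reproduced_error = str(err)

# With a real path the directory is created as expected.
with tempfile.TemporaryDirectory() as tmp:
    created = make_output_dir(FakeArtifact(os.path.join(tmp, "model")))
    created_exists = os.path.isdir(created)
```

The explicit `None` check is one possible way to surface the problem earlier with a clearer message; it does not explain why the path was never populated in the first place.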

Expected result

The artifact should be created without error, so that the pipeline can run successfully.

Materials and Reference

I am working on a sample case in response to the discussion taking place in #5453. The code at the moment is:

from kfp import components
from kfp.dsl.io_types import OutputArtifact, Model
from kfp.dsl import pipeline, PipelineExecutionMode
from kfp.compiler import Compiler

def train_model(model: OutputArtifact(Model)) -> str:
  from sklearn.datasets import make_classification
  from sklearn.ensemble import RandomForestClassifier
  from joblib import dump
  import os

  fake_features, fake_labels = make_classification(n_features = 5)

  classifier = RandomForestClassifier(n_estimators=3)
  classifier.fit(fake_features, fake_labels)

  dump(classifier, os.path.join(model.path, 'model.joblib'))

  return model.uri

train_model_op = components.create_component_from_func_v2(train_model, packages_to_install=['sklearn'])
kf_serving_op = components.load_component_from_file('./components/kfserving_v2.yaml')

@pipeline(
  name='output_artifact_sample',
  description='Demonstrate using OutputArtifact URI information in next component',
  pipeline_root='minio://sample_output_artifiact_root'
)
def simple_kf_serving_pipeline():
    train_step = train_model_op()

    kf_serving_op(
            action = 'apply',
            model_name = 'output_artifact_sample',
            model_uri = train_step.outputs['Output'],
            framework = 'sklearn',
            service_account= 'sa',
            namespace='default')

def main():
    Compiler(mode = PipelineExecutionMode.V2_COMPATIBLE).compile(simple_kf_serving_pipeline, 'sample_artifact_pipeline.yaml')

if __name__ == "__main__":
    main()


juangon commented 3 years ago

I've tested a similar approach using 1.6.0.rc0:

import kfp
import kfp.components as comp
import kfp.v2.dsl as dsl
from kfp.v2 import compiler
from kfp.v2.dsl import (
    component,
    Input,
    Output,
    Artifact,
    Dataset,
    Metrics,
    Model
)

from typing import NamedTuple
#import kfp.components as comp
#from kfp import compiler
#import kfp.dsl as dsl
#from kfp.components import OutputPath
#from kfp.components import InputPath

@component(
    packages_to_install=['sklearn'],
    output_component_file='download_data_component_sdk_v2.yaml'
)
def download_data(output_data: Output[Dataset]):

    import json

    import argparse
    from pathlib import Path

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split

    # Load and split the dataset
    x, y = load_breast_cancer(return_X_y=True)
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

    # Creates `data` structure to save and 
    # share train and test datasets.
    data = {'x_train' : x_train.tolist(),
            'y_train' : y_train.tolist(),
            'x_test' : x_test.tolist(),
            'y_test' : y_test.tolist()}

    # Creates a json object based on `data`
    data_json = json.dumps(data)

    # Saves the json object into a file
    with open(output_data.path, 'w') as out_file:
        json.dump(data_json, out_file)

@component(
    packages_to_install=['sklearn'],
    output_component_file='train_component_sdk_v2.yaml'
)
def train(trainData: Input[Dataset], modelData: Output[Model], mlpipeline_metrics: Output[Metrics])-> str:
    import json
    from typing import NamedTuple
    from collections import namedtuple
    from sklearn.metrics import accuracy_score
    from sklearn.tree import DecisionTreeClassifier
    from joblib import dump
    import os

    # Open and read the dataset file
    with open(trainData.path) as data_file:
        data = json.load(data_file)

    data = json.loads(data)

    x_train = data['x_train']
    y_train = data['y_train']
    x_test = data['x_test']
    y_test = data['y_test']

    # Initialize and train the model
    model = DecisionTreeClassifier(max_depth=3)
    model.fit(x_train, y_train)

    # Get predictions
    y_pred = model.predict(x_test)

    # Get accuracy
    accuracy = accuracy_score(y_test, y_pred)

    # Save output into file
    #with open(args.accuracy, 'w') as accuracy_file:
    #    accuracy_file.write(str(accuracy))

    # Export one sample metric:
    metrics = {
      'metrics': [{
          'name': 'accuracy',
          'numberValue':  float(accuracy),
          'format': "PERCENTAGE"
        }]}    

    with open(mlpipeline_metrics.path, 'w') as f:
        json.dump(metrics, f)

    dump(model, os.path.join(modelData.path, 'model.joblib'))

    return modelData.uri

kfserving_op = comp.load_component_from_url('https://raw.githubusercontent.com/kubeflow/pipelines/master/components/kubeflow/kfserving/component.yaml')

def kubeflow_deploy_op():
    return dsl.ContainerOp(
        name = 'deploy',
        image = KUBEFLOW_DEPLOYER_IMAGE,
        arguments = [
            '--model-export-path', model_path,
            '--server-name', model_name,
        ]
    )

@dsl.pipeline(
   name='iris-deploy-pipeline-3',
   description='Pipeline and deploy for IRIS.'
)
def sklearn_pipeline(my_num: int = 1000, 
    my_name: str = 'some text', 
    my_url: str = 'http://example.com'):
    download_task = download_data() # The download_data_op factory function returns
                            # a dsl.ContainerOp class instance. 

    train_task = train(download_task.output)    

    kfserving_op(action='apply', model_name='sklearn-example',model_uri=train_task.outputs['Output'],framework='sklearn')

# Specify argument values for your pipeline run.
arguments = {'a': '7', 'b': '8'}

# Compile the pipeline into a YAML package.
compiler.Compiler().compile(
    pipeline_func=sklearn_pipeline,
    package_path='sklearn_pipeline_sdk_v2.yaml')

Unfortunately it throws a weird error:

TypeError: Passing value "True" with type "String" (as "Parameter") to component input "Enable Istio Sidecar" with type "Bool" (as "Artifact") is incompatible. Please fix the type of the component input.

wilbry commented 3 years ago

The packaged KFServing component isn't currently compatible with V2 because, as far as I know, the Bool type is no longer supported. The issue I reported occurs once that is fixed.

Bobgy commented 3 years ago

/assign

Bobgy commented 3 years ago

Hi @wilbry, thank you for your efforts porting the kfserving component to use v2 semantics! Sorry it took a while until I had time to investigate.

Your initial post uses OutputArtifact, but it doesn't exist. Please refer to our updated documentation: https://www.kubeflow.org/docs/components/pipelines/sdk/v2/python-function-components/

Bobgy commented 3 years ago

OutputArtifact(Model) should be changed to Output[Model]

@wilbry can you share your kfserving_v2.yaml component? I fixed your error, but cannot proceed, because I'm not sure what changes you made to kfserving_v2.yaml.

Bobgy commented 3 years ago

Quite a few fixes to v2 compatible mode have been released in KFP SDK 1.7.0, and KFP backend 1.7.0-rc.3 is out. It would be super helpful if you could try the latest releases.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 4 months ago

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.