Closed wilbry closed 4 months ago
I've tested a similar approach using 1.6.0.rc0 here:
import kfp
import kfp.components as comp
import kfp.v2.dsl as dsl
from kfp.v2 import compiler
from kfp.v2.dsl import (
component,
Input,
Output,
Artifact,
Dataset,
Metrics,
Model
)
from typing import NamedTuple
#import kfp.components as comp
#from kfp import compiler
#import kfp.dsl as dsl
#from kfp.components import OutputPath
#from kfp.components import InputPath
@component(
    packages_to_install=['sklearn'],
    output_component_file='download_data_component_sdk_v2.yaml'
)
def download_data(output_data: Output[Dataset]):
    import json
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split

    # Get and split the dataset
    x, y = load_breast_cancer(return_X_y=True)
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

    # Create a `data` structure to save and
    # share the train and test datasets.
    data = {'x_train': x_train.tolist(),
            'y_train': y_train.tolist(),
            'x_test': x_test.tolist(),
            'y_test': y_test.tolist()}

    # Serialize `data` to JSON and write it to the output artifact.
    # Note: json.dumps followed by json.dump encodes the payload twice,
    # which is why the train step below decodes it twice.
    data_json = json.dumps(data)
    with open(output_data.path, 'w') as out_file:
        json.dump(data_json, out_file)
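Incidentally, the payload here is JSON-encoded twice: `json.dumps` turns `data` into a string, and `json.dump` then serializes that string again, which is why the train step has to call both `json.load` and `json.loads`. A single encoding pass would be simpler; a self-contained sketch of the difference (plain Python, no KFP required):

```python
import json

data = {'x_train': [[1.0, 2.0]], 'y_train': [0]}

# Current approach: encode twice, so reading back needs two decode steps.
doubly_encoded = json.dumps(json.dumps(data))
assert json.loads(json.loads(doubly_encoded)) == data

# Simpler alternative: encode once, decode once.
singly_encoded = json.dumps(data)
assert json.loads(singly_encoded) == data
```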
@component(
    packages_to_install=['sklearn'],
    output_component_file='train_component_sdk_v2.yaml'
)
def train(trainData: Input[Dataset], modelData: Output[Model],
          mlpipeline_metrics: Output[Metrics]) -> str:
    import json
    import os
    from sklearn.metrics import accuracy_score
    from sklearn.tree import DecisionTreeClassifier
    from joblib import dump

    # Open and read the dataset artifact; it is decoded twice because the
    # download step double-encodes the JSON payload.
    with open(trainData.path) as data_file:
        data = json.load(data_file)
    data = json.loads(data)

    x_train = data['x_train']
    y_train = data['y_train']
    x_test = data['x_test']
    y_test = data['y_test']

    # Initialize and train the model
    model = DecisionTreeClassifier(max_depth=3)
    model.fit(x_train, y_train)

    # Get predictions and accuracy
    y_pred = model.predict(x_test)
    accuracy = accuracy_score(y_test, y_pred)

    # Export a sample metric
    metrics = {
        'metrics': [{
            'name': 'accuracy',
            'numberValue': float(accuracy),
            'format': 'PERCENTAGE'
        }]}
    with open(mlpipeline_metrics.path, 'w') as f:
        json.dump(metrics, f)

    # Persist the model under the output artifact's path
    os.makedirs(modelData.path, exist_ok=True)
    dump(model, os.path.join(modelData.path, 'model.joblib'))
    return modelData.uri
kfserving_op = comp.load_component_from_url('https://raw.githubusercontent.com/kubeflow/pipelines/master/components/kubeflow/kfserving/component.yaml')

# Note: this op is not used in the pipeline below; it is v1-style
# (dsl.ContainerOp) and references globals (KUBEFLOW_DEPLOYER_IMAGE,
# model_path, model_name) that are not defined in this snippet.
def kubeflow_deploy_op():
    return dsl.ContainerOp(
        name='deploy',
        image=KUBEFLOW_DEPLOYER_IMAGE,
        arguments=[
            '--model-export-path', model_path,
            '--server-name', model_name,
        ]
    )
@dsl.pipeline(
    name='iris-deploy-pipeline-3',
    description='Pipeline and deploy for IRIS.'
)
def sklearn_pipeline(my_num: int = 1000,
                     my_name: str = 'some text',
                     my_url: str = 'http://example.com'):
    download_task = download_data()
    train_task = train(download_task.output)
    # The string returned by train() is exposed as the output named 'Output'.
    kfserving_op(action='apply',
                 model_name='sklearn-example',
                 model_uri=train_task.outputs['Output'],
                 framework='sklearn')

compiler.Compiler().compile(
    pipeline_func=sklearn_pipeline,
    package_path='sklearn_pipeline_sdk_v2.yaml')
Unfortunately it throws a weird error:
TypeError: Passing value "True" with type "String" (as "Parameter") to component input "Enable Istio Sidecar" with type "Bool" (as "Artifact") is incompatible. Please fix the type of the component input.
The packaged KFServing component isn't currently compatible with v2, because the Bool type is no longer supported, as far as I know. The error from my original report only shows up once that type issue is worked around.
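For reference, the input the error points at is declared with a custom `Bool` type in the packaged component spec, which v2 compatible mode treats as an artifact rather than a parameter. A paraphrased excerpt (not the verbatim file), with one hypothetical workaround of declaring the flag as a `String` and parsing it inside the component's program:

```yaml
# Paraphrased sketch of the relevant input in the KFServing component.yaml.
# Assumption: switching the custom "Bool" type to String (and parsing the
# value in the component code) sidesteps the v2 type check.
inputs:
  - {name: Enable Istio Sidecar, type: String, default: 'true'}
```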
/assign
Hi @wilbry, thank you for your efforts porting the kfserving component to use v2 semantics! Sorry it took a while until I had time to investigate.
Your initial post uses OutputArtifact, but it doesn't exist. Please refer to our updated documentation: https://www.kubeflow.org/docs/components/pipelines/sdk/v2/python-function-components/
OutputArtifact(Model) should be changed to Output[Model]
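As a concrete sketch of that change against the train component's signature (the `InputArtifact` half is my assumption by analogy; the thread only mentions `OutputArtifact`):

```diff
-def train(trainData: InputArtifact(Dataset), modelData: OutputArtifact(Model)):
+def train(trainData: Input[Dataset], modelData: Output[Model]):
```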
@wilbry can you share your kfserving_v2.yaml component? I fixed your error, but cannot proceed, because I'm not sure what changes you made to kfserving_v2.yaml.
Quite a few fixes for v2 compatible mode have been released in KFP SDK 1.7.0, and KFP backend 1.7.0-rc.3 is out. It would be super helpful if you could try the latest releases.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
Environment
Steps to reproduce
Create a lightweight component using OutputArtifact, compile it with the v1 compiler in v2-compatible mode, and run it through the UI. The following error appears, I think while the output artifact data is being created.
Expected result
The artifact should be created without error, so that the pipeline can run successfully.
Materials and Reference
I am working on a sample case in response to the discussion taking place in #5453. The code at the moment is shown above.
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.