tensorflow / tfx

TFX is an end-to-end platform for deploying production ML pipelines
https://tensorflow.org/tfx
Apache License 2.0

Kubeflow pipeline fails with ValueError: ...KubeflowMetadataConfig in TFX 1.0.0 #4126

Closed nroberts1 closed 2 years ago

nroberts1 commented 3 years ago

Using:

metadata_config = kubeflow_dag_runner.get_default_kubeflow_metadata_config()
runner_config = kubeflow_dag_runner.KubeflowDagRunnerConfig(kubeflow_metadata_config=metadata_config, tfx_image=BASE_IMAGE)

when I run the pipeline in TFX 0.30.0 the pipeline runs fine, but once I update to 1.0.0 the pipeline fails with: ValueError: metadata_connection_config is expected to be in type ml_metadata.ConnectionConfig, but got type type.googleapis.com/tfx.orchestration.kubeflow.proto.KubeflowMetadataConfig

Stacktrace from pod's log:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.7/site-packages/tfx/orchestration/kubeflow/container_entrypoint.py", line 470, in <module>
    main()
  File "/usr/local/lib/python3.7/site-packages/tfx/orchestration/kubeflow/container_entrypoint.py", line 428, in main
    deployment_config = runner_utils.extract_local_deployment_config(tfx_ir)
  File "/usr/local/lib/python3.7/site-packages/tfx/orchestration/local/runner_utils.py", line 39, in extract_local_deployment_config
    return _to_local_deployment(result)
  File "/usr/local/lib/python3.7/site-packages/tfx/orchestration/local/runner_utils.py", line 99, in _to_local_deployment
    input_config.metadata_connection_config.type_url))
ValueError: metadata_connection_config is expected to be in type ml_metadata.ConnectionConfig, but got type type.googleapis.com/tfx.orchestration.kubeflow.proto.KubeflowMetadataConfig

Kubeflow is running on GCP

0.30.0: [screenshot: successful pipeline run, 2021-08-06 08-08-29]

1.0.0: [screenshot: failed pipeline run, 2021-08-06 08-18-53]

The BeamDagRunner evidently uses a metadata_connection_config of type ml_metadata.ConnectionConfig, since the same pipeline runs fine with it in both 0.30.0 and 1.0.0.
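For intuition, here is a toy, stdlib-only reconstruction of the check that raises this error. This is not TFX source code; the class and function names are illustrative. The idea is that the IR carries the metadata config as a packed protobuf Any, and the local-deployment path only accepts a packed ml_metadata.ConnectionConfig, so an Any holding a KubeflowMetadataConfig fails the type_url comparison:

```python
# Toy stand-in for a protobuf Any message; only type_url matters here.
class PackedAny:
    def __init__(self, type_url):
        self.type_url = type_url

_EXPECTED = "type.googleapis.com/ml_metadata.ConnectionConfig"

def to_local_deployment(metadata_connection_config):
    # Mirrors the shape of the failing check in the traceback above
    # (illustrative only, not the real runner_utils implementation).
    if metadata_connection_config.type_url != _EXPECTED:
        raise ValueError(
            "metadata_connection_config is expected to be in type "
            "ml_metadata.ConnectionConfig, but got type "
            f"{metadata_connection_config.type_url}")
    return "ok"
```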

1025KB commented 3 years ago

In your dsl.pipeline, is metadata_connection_config still set?

Here is an example we have

SMesser commented 3 years ago

I'm facing the same error using a variant of the bigquery_ml example. In my variant, the import statements and most code in kubeflow_dag_runner.py and the pipeline-creation function are the same as in the example. There are differences in the ExampleGen component, which reads an existing BQ table rather than the usual Taxi upload. I've also changed the file structure a bit, based on structures used in the 0.26.0 documentation. I also ran "pip freeze > requirements.txt" from within the tensorflow/tfx:1.0.0 image and made sure my environment matched its versions of TFX, tensorboard, requests, and kfp-pipeline-spec.

(I'm trying to verify what other packages have substantial differences, but those seemed like the main ones.)
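For what it's worth, that version-matching step can be scripted. Here is an illustrative stdlib-only helper (not part of TFX) for diffing a local `pip freeze` output against one captured inside the image:

```python
def diff_freeze(local_txt, image_txt):
    """Return {package: (local_version_or_None, image_version)} mismatches."""
    def parse(txt):
        return dict(
            line.split("==", 1) for line in txt.splitlines()
            if "==" in line and not line.startswith("#"))
    local, image = parse(local_txt), parse(image_txt)
    return {pkg: (local.get(pkg), ver)
            for pkg, ver in image.items()
            if local.get(pkg) != ver}
```

Running it over the two requirements files shows exactly which pins differ.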

Perhaps #4179 is relevant here too.

nroberts1 commented 3 years ago

Thanks to @1025KB's linked example I was able to get it working. I believe the change was from using:

tfx.orchestration.experimental...

rather than:

tfx.orchestration.kubeflow...

So I now have:


metadata_config = tfx.orchestration.experimental.get_default_kubeflow_metadata_config()

runner_config = tfx.orchestration.experimental.KubeflowDagRunnerConfig(
    kubeflow_metadata_config=metadata_config,
    tfx_image=BASE_IMAGE)

I'm not convinced this means there isn't still an underlying problem, as I'd imagine the non-experimental version should work too, but this did get it working okay. Hope that helps you @SMesser

SMesser commented 3 years ago

Thanks @nroberts1 , but I still have the same errors. The first variant I tried was:

from tfx import v1 as tfx

# Then reference the following
tfx.orchestration.experimental.get_default_kubeflow_metadata_config
tfx.orchestration.experimental.KubeflowDagRunner
tfx.orchestration.experimental.KubeflowDagRunnerConfig

The above and the first of the following both still gave the "metadata_connection_config is expected to be in type ..." error. Other things I tried:

from tfx.v1.orchestration.experimental import get_default_kubeflow_metadata_config, KubeflowDagRunner, KubeflowDagRunnerConfig
# Source code also mentions a couple "V2" classes, and the setup of them had to be tweaked in a couple obvious ways,
# but I eventually got the error "Cannot find KubeflowDagRunner.run() in kubeflow_dag_runner.py()
# so I assume these are not yet part of the public API
from tfx.v1.orchestration.experimental import get_default_kubeflow_metadata_config, KubeflowV2DagRunner, KubeflowV2DagRunnerConfig
# The following are all import errors
from tfx.orchestration.experimental import get_default_kubeflow_metadata_config
from tfx.orchestration.experimental.kubeflow import get_detault_kubeflow_metadata_config
from tfx.v1.orchestration.experimental.kubeflow import get_default_kubeflow_metadata_config, KubeflowDagRunner, KubeflowDagRunnerConfig
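A small importlib probe (stdlib-only, illustrative) can speed up this kind of trial and error by reporting which dotted "module.attribute" paths actually resolve in the current environment:

```python
import importlib

def probe(paths):
    """Map each dotted "module.attribute" path to True/False resolvability."""
    results = {}
    for path in paths:
        module, _, attr = path.rpartition(".")
        try:
            obj = importlib.import_module(module)
            results[path] = hasattr(obj, attr)
        except ImportError:  # also covers ModuleNotFoundError
            results[path] = False
    return results
```

Running it over the candidate paths above would show at a glance which imports can even be attempted before trying a full pipeline build.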
SMesser commented 3 years ago

I also tried replacing tfx.orchestration.pipeline.Pipeline with tfx.v1.dsl.Pipeline as per @1025KB's suggestion. No joy.

ConverJens commented 3 years ago

I have used this approach since TFX 0.24 and it still works in TFX 1.2.0:

from tfx.orchestration.kubeflow import kubeflow_dag_runner
...
metadata_config = kubeflow_dag_runner.get_default_kubeflow_metadata_config()
# Default values when running KFP in KubeFlow
metadata_config.grpc_config.grpc_service_host.value = 'metadata-grpc-service.kubeflow'
metadata_config.grpc_config.grpc_service_port.value = '8080'

runner_config = kubeflow_dag_runner.KubeflowDagRunnerConfig(
    kubeflow_metadata_config=metadata_config,
    tfx_image=<IMAGE>,
    pipeline_operator_funcs=(...)
)

kubeflow_dag_runner.KubeflowDagRunner(config=runner_config).run(p.create_pipeline())
SMesser commented 3 years ago

@ConverJens How are you running the pipeline? I've been using a Bash script to set a bunch of environment variables and combine the "tfx pipeline create" and "tfx run create" commands from the TFX CLI. I've tried a bunch of alternate ways of setting up the pipeline itself, and can in some cases exchange the metadata-config-is-wrong-type error for a missing-required-arguments error, but both fail at container launch no matter what other changes I've made to the code. I've tried your customization of metadata config, alternate import statements for the Pipeline, TrainArgs, EvalArgs, SplitConfig, KubeflowDagRunner, KubeflowDagRunnerConfig and other classes / functions, and yet the only working example I've found was running on another VM and bypassing TFX CLI. I've made attempts in two different GCP environments. I've tried a couple of TFX's examples but none of those have worked for me either, despite the pipeline working fine under TFX 0.26.0.

My invocation is

tfx pipeline create --pipeline_path kubeflow_dag_runner.py --endpoint deadbeef0123456-hash-east1.pipelines.googleusercontent.com --build_image --engine kubeflow && tfx run create --pipeline_name test_run_666 --engine kubeflow --endpoint deadbeef0123456-hash-east1.pipelines.googleusercontent.com
ConverJens commented 3 years ago

@SMesser That's probably the reason: I don't use the TFX CLI, I never liked it. Instead, I have automated the compile and upload to KubeFlow in our CI/CD flow along with building a base image to use for the pipeline. Once the pipeline is available in KF, I usually trigger it remotely by calling the KFP API via REST.
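As a reference point, here is a hedged sketch of the kind of REST trigger described above. The host, pipeline id, and run name are placeholders, the /apis/v1beta1/runs endpoint is an assumption about the KFP deployment, and auth headers are omitted; the function only builds the request so it can be inspected before sending:

```python
import json
import urllib.request

def build_run_request(host, pipeline_id, run_name):
    """Build (but do not send) a POST request to start a KFP run."""
    payload = {
        "name": run_name,
        "pipeline_spec": {"pipeline_id": pipeline_id},
    }
    return urllib.request.Request(
        url=f"{host}/apis/v1beta1/runs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would be a `urllib.request.urlopen(req)` call once the endpoint and auth are confirmed for your cluster.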

Other than that, I have no idea as to why it doesn't work for you. Can you post the error message you get? It seems as you are running in GCP, right? Maybe your metadata grpc service has another path/port?

SMesser commented 3 years ago

When I do a slightly-modified version of the bigquery_ml example, I get this error. Note this is the entire log from the Kubeflow UI. This appears at each component if I've got multiple independent initial components:

2021-08-24 20:56:19.571778: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.7/site-packages/tfx/orchestration/kubeflow/container_entrypoint.py", line 470, in <module>
    main()
  File "/opt/conda/lib/python3.7/site-packages/tfx/orchestration/kubeflow/container_entrypoint.py", line 428, in main
    deployment_config = runner_utils.extract_local_deployment_config(tfx_ir)
  File "/opt/conda/lib/python3.7/site-packages/tfx/orchestration/local/runner_utils.py", line 39, in extract_local_deployment_config
    return _to_local_deployment(result)
  File "/opt/conda/lib/python3.7/site-packages/tfx/orchestration/local/runner_utils.py", line 99, in _to_local_deployment
    input_config.metadata_connection_config.type_url))
ValueError: metadata_connection_config is expected to be in type ml_metadata.ConnectionConfig, but got type type.googleapis.com/tfx.orchestration.kubeflow.proto.KubeflowMetadataConfig

That's the error message which got me involved in this issue. I've gotten other error messages under different conditions, but am only now trying alternate ways of running the pipeline. (Nothing explicitly says it's the CLI itself which is failing, so I just assumed my code was bad...) The following are less relevant to the specific error message, but maybe they're relevant to debugging the CLI or identifying oddities in my setup.

If I go for a direct implementation of the bigquery_ml example, instead of making changes to reference our DB and file structure for things like the preprocessing_fn(), I get this instead:


WARNING:absl:metadata_connection_config is not provided by IR.
INFO:absl:tensorflow_ranking is not available: No module named 'tensorflow_ranking'
INFO:absl:tensorflow_text is not available: No module named 'tensorflow_text'
INFO:apache_beam.typehints.native_type_compatibility:Using Any for unsupported type: typing.MutableMapping[str, typing.Any]
[the line above repeats 72 times]
INFO:absl:tensorflow_text is not available: No module named 'tensorflow_text'
INFO:root:Component BigQueryExampleGen is running.

INFO:absl:Running launcher for node_info {
  type {
    name: "tfx.extensions.google_cloud_big_query.example_gen.component.BigQueryExampleGen"
  }
  id: "BigQueryExampleGen"
}
contexts {
  contexts {
    type {
      name: "pipeline"
    }
    name {
      field_value {
        string_value: "chicago_taxi_pipeline_kubeflow_gcp"
      }
    }
  }
  contexts {
    type {
      name: "pipeline_run"
    }
    name {
      field_value {
        string_value: "chicago-taxi-pipeline-kubeflow-gcp-p4pfw"
      }
    }
  }
  contexts {
    type {
      name: "node"
    }
    name {
      field_value {
        string_value: "chicago_taxi_pipeline_kubeflow_gcp.BigQueryExampleGen"
      }
    }
  }
}
outputs {
  outputs {
    key: "examples"
    value {
      artifact_spec {
        type {
          name: "Examples"
          properties {
            key: "span"
            value: INT
          }
          properties {
            key: "split_names"
            value: STRING
          }
          properties {
            key: "version"
            value: INT
          }
        }
      }
    }
  }
}
parameters {
  parameters {
    key: "input_config"
    value {
      field_value {
        string_value: "{\n  \"splits\": [\n    {\n      \"name\": \"single_split\",\n      \"pattern\": \"\\n         SELECT\\n           IFNULL(pickup_community_area, 0) as pickup_community_area,\\n           fare,\\n           EXTRACT(MONTH FROM trip_start_timestamp) AS trip_start_month,\\n           EXTRACT(HOUR FROM trip_start_timestamp) AS trip_start_hour,\\n           EXTRACT(DAYOFWEEK FROM trip_start_timestamp) AS trip_start_day,\\n           UNIX_SECONDS(trip_start_timestamp) AS trip_start_timestamp,\\n           IFNULL(pickup_latitude, 0) as pickup_latitude,\\n           IFNULL(pickup_longitude, 0) as pickup_longitude,\\n           IFNULL(dropoff_latitude, 0) as dropoff_latitude,\\n           IFNULL(dropoff_longitude, 0) as dropoff_longitude,\\n           trip_miles,\\n           IFNULL(pickup_census_tract, 0) as pickup_census_tract,\\n           IFNULL(dropoff_census_tract, 0) as dropoff_census_tract,\\n           payment_type,\\n           IFNULL(company, \'NA\') as company,\\n           IFNULL(trip_seconds, 0) as trip_seconds,\\n           IFNULL(dropoff_community_area, 0) as dropoff_community_area,\\n           tips\\n         FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`\\n         WHERE (ABS(FARM_FINGERPRINT(unique_key)) / 0x7FFFFFFFFFFFFFFF)\\n           < 0.001\"\n    }\n  ]\n}"
      }
    }
  }
  parameters {
    key: "output_config"
    value {
      field_value {
        string_value: "{\n  \"split_config\": {\n    \"splits\": [\n      {\n        \"hash_buckets\": 2,\n        \"name\": \"train\"\n      },\n      {\n        \"hash_buckets\": 1,\n        \"name\": \"eval\"\n      }\n    ]\n  }\n}"
      }
    }
  }
  parameters {
    key: "output_data_format"
    value {
      field_value {
        int_value: 6
      }
    }
  }
}
downstream_nodes: "Evaluator"
downstream_nodes: "ModelValidator"
downstream_nodes: "StatisticsGen"
downstream_nodes: "Transform"
execution_options {
  caching_options {
  }
}

INFO:absl:MetadataStore with gRPC connection initialized

WARNING:absl:Conflicting properties comparing with existing metadata type with the same type name. Existing type: id: 2
name: "pipeline"
properties {
  key: "pipeline_name"
  value: STRING
}
, New type: name: "pipeline"

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py", line 199, in _call_method
    response.CopyFrom(grpc_method(request, timeout=self._grpc_timeout_sec))
  File "/opt/conda/lib/python3.7/site-packages/grpc/_channel.py", line 923, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/opt/conda/lib/python3.7/site-packages/grpc/_channel.py", line 826, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.ALREADY_EXISTS
    details = "Type already exists with different properties."
    debug_error_string = "{"created":"@1629927950.896293052","description":"Error received from peer ipv4:10.83.251.115:8080","file":"src/core/lib/surface/call.cc","file_line":1062,"grpc_message":"Type already exists with different properties.","grpc_status":6}"
>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/tfx/orchestration/portable/mlmd/common_utils.py", line 76, in register_type_if_not_exist
    metadata_type, can_add_fields=True, can_omit_fields=True)
  File "/opt/conda/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py", line 506, in put_context_type
    self._call('PutContextType', request, response)
  File "/opt/conda/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py", line 174, in _call
    return self._call_method(method_name, request, response)
  File "/opt/conda/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py", line 204, in _call_method
    raise _make_exception(e.details(), e.code().value[0])  # pytype: disable=attribute-error
ml_metadata.errors.AlreadyExistsError: Type already exists with different properties.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.7/site-packages/tfx/orchestration/kubeflow/container_entrypoint.py", line 470, in <module>
    main()
  File "/opt/conda/lib/python3.7/site-packages/tfx/orchestration/kubeflow/container_entrypoint.py", line 462, in main
    execution_info = component_launcher.launch()
  File "/opt/conda/lib/python3.7/site-packages/tfx/orchestration/portable/launcher.py", line 465, in launch
    execution_preparation_result = self._prepare_execution()
  File "/opt/conda/lib/python3.7/site-packages/tfx/orchestration/portable/launcher.py", line 251, in _prepare_execution
    metadata_handler=m, node_contexts=self._pipeline_node.contexts)
  File "/opt/conda/lib/python3.7/site-packages/tfx/orchestration/portable/mlmd/context_lib.py", line 153, in prepare_contexts
    for context_spec in node_contexts.contexts
  File "/opt/conda/lib/python3.7/site-packages/tfx/orchestration/portable/mlmd/context_lib.py", line 153, in <listcomp>
    for context_spec in node_contexts.contexts
  File "/opt/conda/lib/python3.7/site-packages/tfx/orchestration/portable/mlmd/context_lib.py", line 91, in _register_context_if_not_exist
    metadata_handler=metadata_handler, context_spec=context_spec)
  File "/opt/conda/lib/python3.7/site-packages/tfx/orchestration/portable/mlmd/context_lib.py", line 47, in _generate_context_proto
    metadata_handler, context_spec.type)
  File "/opt/conda/lib/python3.7/site-packages/tfx/orchestration/portable/mlmd/common_utils.py", line 89, in register_type_if_not_exist
    raise RuntimeError(warning_str)
RuntimeError: Conflicting properties comparing with existing metadata type with the same type name. Existing type: id: 2
name: "pipeline"
properties {
  key: "pipeline_name"
  value: STRING
}
, New type: name: "pipeline"

If I run the code we had working in TFX 0.26, I get the following:

2021-08-26 17:17:47.330922: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
usage: container_entrypoint.py [-h] --pipeline_name PIPELINE_NAME
                               --pipeline_root PIPELINE_ROOT
                               --kubeflow_metadata_config
                               KUBEFLOW_METADATA_CONFIG --beam_pipeline_args
                               BEAM_PIPELINE_ARGS --additional_pipeline_args
                               ADDITIONAL_PIPELINE_ARGS
                               --component_launcher_class_path
                               COMPONENT_LAUNCHER_CLASS_PATH [--enable_cache]
                               --serialized_component SERIALIZED_COMPONENT
                               --component_config COMPONENT_CONFIG
container_entrypoint.py: error: the following arguments are required: --pipeline_name, --beam_pipeline_args, --additional_pipeline_args, --component_launcher_class_path, --component_config

The last was an outgrowth of #2180, and my post announcing victory there seems to have been premature: I got one working run, then it failed a few days later with no changes to the code or input parameters. So at this point I'm not even sure if I was doing a proper test of TFX 1.0, if I ran a 0.26 version of the code by mistake, or if some versioning changed in what we have deployed for GCP / Kubeflow / ...

These errors have persisted through a lot of changes, though I can't yet account for why I get different errors in different conditions. Other contextual bits I'm having trouble controlling for or avoiding: I'm running from within a GCP AI Platform notebook. I'm using Poetry to control Python dependencies.

ConverJens commented 3 years ago

@SMesser I have never experienced the first error myself but from the example pipeline posted by @1025KB , I can see a difference compared to my code. The posted example uses:

tfx.dsl.Pipeline(...

while I use:

...
pipeline.Pipeline(...

to create the pipeline object that is later compiled. Apart from that, I have no idea.

The second issue on the other hand has to do with conflicting entries in MLMD which can occur for several reasons:

  1. you have renamed a runtime parameter or input to your component after it has already been executed once
  2. you have run your pipeline using TFX < v1.0.0 (I think tfx==1.0.0rc1 was also ok) and then (at least) once with TFX >= 1.0.0.

I recently did 2. and ended up with that error message. It appears there was a breaking change in the MLMD schema version with the 1.0.0 release, causing rollback to prior versions to fail. I cleaned my MLMD DB, re-ran the pre-1.0.0 version, and it worked again.
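A toy model of that failure mode (not MLMD itself, and deliberately stricter than MLMD, which tolerates added/omitted fields via can_add_fields/can_omit_fields) shows why re-registering a type whose properties changed raises the conflict:

```python
# In-memory stand-in for the MLMD type registry (illustrative only).
_registry = {}

def register_type_if_not_exist(name, properties):
    """Register a type; raise if the same name exists with other properties."""
    existing = _registry.get(name)
    if existing is None:
        _registry[name] = dict(properties)
        return "registered"
    if existing != properties:
        raise RuntimeError(
            "Conflicting properties comparing with existing metadata type "
            f"with the same type name: {name}")
    return "exists"
```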

If you are running from within a notebook, I would first try the components using the interactive runner, which just runs them in the notebook. The default is a SQLite DB, but you can pass an MLMD config and access the real thing. This would let you run small BigQuery data sets until the components work as expected, and then move on to the TFX CLI.

This is how I have ruled out bad code on my own side when the issue was really in KFP or TFX's Kubeflow-specific code. You might also want to try the local or beam runner.

SMesser commented 3 years ago

I think I've tried all those things except running from something other than the TFX CLI... working on that now.

grofte commented 3 years ago

The Chicago Taxi example runs fine in the Colab notebook with TFX 1.2.0, and its code also appears to be updated, whereas the examples here still use code like from tfx.proto import pusher_pb2. So you might be able to reverse-engineer a working example from the interactive notebook.

zhitaoli commented 2 years ago

@1025KB @rcrowe-google Can you help to coordinate relevant imports?

rcrowe-google commented 2 years ago

Given that the last comment is over a year old I'm closing this bug.

google-ml-butler[bot] commented 2 years ago

Are you satisfied with the resolution of your issue?