kubeflow / metadata

Repository for assets related to Metadata.

No type found for query: kubeflow.org/alpha/execution #235

Closed montenegrodr closed 2 years ago

montenegrodr commented 3 years ago

/kind bug

What steps did you take and what happened:

I created a step in my Kubeflow pipeline solely to persist metadata, but the step fails when it tries to create an Execution:

tensorflow.python.framework.errors_impl.NotFoundError: No type found for query: kubeflow.org/alpha/execution

Full stack trace:

2020-08-18 11:12:36.533692: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-08-18 11:12:36.533919: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
ERROR:absl:mlmd client NotFoundError: No type found for query: kubeflow.org/alpha/execution
ERROR:absl:mlmd client NotFoundError: No type found for query: kubeflow.org/alpha/execution
ERROR:absl:mlmd client NotFoundError: No type found for query: kubeflow.org/alpha/execution
ERROR:absl:mlmd client NotFoundError: No type found for query: kubeflow.org/alpha/execution
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py", line 157, in _call_method
    response.CopyFrom(grpc_method(request))
  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 826, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 729, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.NOT_FOUND
    details = "No type found for query: kubeflow.org/alpha/execution"
    debug_error_string = "{"created":"@1597749167.874007111","description":"Error received from peer ipv4:10.11.249.115:8080","file":"src/core/lib/surface/call.cc","file_line":1061,"grpc_message":"No type found for query: kubeflow.org/alpha/execution","grpc_status":5}"
>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/app/run.py", line 45, in <module>
    work()
  File "/app/run.py", line 28, in work
    description="execution example",
  File "/usr/local/lib/python3.7/site-packages/kubeflow/metadata/metadata.py", line 284, in __init__
    self._type_id = _retry(lambda: self.workspace.store.get_execution_type(
  File "/usr/local/lib/python3.7/site-packages/retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/usr/local/lib/python3.7/site-packages/retrying.py", line 212, in call
    raise attempt.get()
  File "/usr/local/lib/python3.7/site-packages/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/usr/local/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/usr/local/lib/python3.7/site-packages/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/usr/local/lib/python3.7/site-packages/kubeflow/metadata/metadata.py", line 756, in _retry
    return f()
  File "/usr/local/lib/python3.7/site-packages/kubeflow/metadata/metadata.py", line 285, in <lambda>
    Execution.EXECUTION_TYPE_NAME).id)
  File "/usr/local/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py", line 630, in get_execution_type
    self._call('GetExecutionType', request, response)
  File "/usr/local/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py", line 131, in _call
    return self._call_method(method_name, request, response)
  File "/usr/local/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py", line 162, in _call_method
    raise _make_exception(e.details(), e.code().value[0])
tensorflow.python.framework.errors_impl.NotFoundError: No type found for query: kubeflow.org/alpha/execution
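
The `debug_error_string` in the trace above carries a JSON payload, and decoding it can help separate a connectivity problem from a server-side lookup failure. The following is a minimal sketch, not part of the original report: the helper name `summarize_grpc_error` and the small status table are illustrative assumptions, while the payload itself is copied from the trace.

```python
import json

# Inner JSON payload of debug_error_string, copied from the trace above.
DEBUG_ERROR_STRING = (
    '{"created":"@1597749167.874007111",'
    '"description":"Error received from peer ipv4:10.11.249.115:8080",'
    '"file":"src/core/lib/surface/call.cc","file_line":1061,'
    '"grpc_message":"No type found for query: kubeflow.org/alpha/execution",'
    '"grpc_status":5}'
)

# Subset of the standard numeric gRPC status codes (illustrative, not exhaustive).
GRPC_STATUS_NAMES = {0: "OK", 5: "NOT_FOUND", 14: "UNAVAILABLE"}


def summarize_grpc_error(debug_error_string):
    """Hypothetical helper: pull the status, message, and peer hint out of the payload."""
    payload = json.loads(debug_error_string)
    code = payload["grpc_status"]
    return {
        "status": GRPC_STATUS_NAMES.get(code, "code %d" % code),
        "message": payload["grpc_message"],
        # "Error received from peer" means the server answered the call.
        "peer_responded": payload["description"].startswith("Error received from peer"),
    }


summary = summarize_grpc_error(DEBUG_ERROR_STRING)
print(summary["status"], "-", summary["message"])
```

Read this way, the service at port 8080 was reachable and answered with NOT_FOUND, which suggests the requested execution type was missing on the server rather than the connection failing.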

Pipeline Step:

from uuid import uuid4
from kubeflow.metadata import metadata
from datetime import datetime

METADATA_STORE_HOST = "metadata-grpc-service.kubeflow"
METADATA_STORE_PORT = 8080

def work():
    ws1 = metadata.Workspace(
        # Connect to metadata service in namespace kubeflow in k8s cluster.
        store=metadata.Store(grpc_host=METADATA_STORE_HOST, grpc_port=METADATA_STORE_PORT),
        name="workspace_1",
        description="a workspace for testing",
        labels={"n1": "v1"})

    r = metadata.Run(
        workspace=ws1,
        name="run-" + datetime.utcnow().isoformat("T"),
        description="a run in ws_1",
    )

    exec = metadata.Execution(
        name="execution" + datetime.utcnow().isoformat("T"),
        workspace=ws1,
        run=r,
        description="execution example",
    )
    print("An execution was created with id %s" % exec.id)

    data_set_version = "data_set_version_" + str(uuid4())
    data_set = exec.log_input(
        metadata.DataSet(
            description="an example data",
            name="mytable-dump",
            owner="owner@my-company.org",
            uri="file://path/to/dataset",
            version=data_set_version,
            query="SELECT * FROM mytable"))
    print("Data set id is {0.id} with version '{0.version}'".format(data_set))

if __name__ == "__main__":
    work()

What did you expect to happen:

Create an execution and store the metadata.

Anything else you would like to add:

Running in a GCP managed service.

Environment:

kubeflow-metadata==0.3.1
Version: 1.0.0

OrrAvrech commented 3 years ago

Any updates? I got the same error with similar code after accessing the metadata-grpc-deployment through an external IP. Kubeflow was deployed using GCP's UI.

souvik-deepsource commented 3 years ago

@jlewi Any update on this? This breaks the demo notebook.

harshit-deepsource commented 3 years ago

@jlewi Same with me.

@montenegrodr did you find any workaround?

montenegrodr commented 3 years ago

> @jlewi Same with me.
>
> @montenegrodr did you find any workaround?

No.

souvik-deepsource commented 3 years ago

@montenegrodr It's due to this issue.

harshit-deepsource commented 3 years ago

@montenegrodr

Try using Kubeflow v1.1.0 and it should work just fine. It worked for me.

anupash147 commented 3 years ago

I am using Kubeflow v1.2.0 and still seeing the issue.

umka1332 commented 3 years ago

AFAIK Kubeflow metadata is deleted from Kubeflow v1.2.0 in favor of ML metadata since the first one is abandoned and the second one is supported by KFP. Please see:

https://github.com/kubeflow/metadata/pull/242#issuecomment-716343533
https://github.com/kubeflow/metadata/issues/250
https://github.com/kubeflow/metadata/issues/225

Please correct me if Kubeflow metadata should work with Kubeflow v1.2.0.

neuromage commented 3 years ago

> AFAIK Kubeflow metadata is deleted from Kubeflow v1.2.0 in favor of ML metadata since the first one is abandoned and the second one is supported by KFP.

That is correct, @umka1332. The SDK in this repo is unmaintained, so we decided to move to the MLMD gRPC service for now.
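
For readers moving to the MLMD gRPC service, the trace earlier in this thread fails inside `get_execution_type`, which looks up a type by name and raises when that name was never registered. One way to read this is that a client needs a register-if-missing step before creating executions. The sketch below illustrates that pattern only. `InMemoryStore`, its `NotFoundError`, and `ensure_execution_type` are hypothetical stand-ins, not the MLMD client; the method names mirror MLMD's `get_execution_type` and `put_execution_type`, but the real calls work with `ExecutionType` protos rather than plain strings and integer ids.

```python
class NotFoundError(Exception):
    """Stand-in for the NotFoundError seen in the trace (not the real class)."""


class InMemoryStore:
    """Hypothetical stand-in for a metadata store, used only for illustration."""

    def __init__(self):
        self._type_ids = {}

    def get_execution_type(self, type_name):
        # A lookup fails when the type was never registered.
        if type_name not in self._type_ids:
            raise NotFoundError("No type found for query: %s" % type_name)
        return self._type_ids[type_name]

    def put_execution_type(self, type_name):
        # Registering an existing name returns its current id unchanged.
        return self._type_ids.setdefault(type_name, len(self._type_ids) + 1)


def ensure_execution_type(store, type_name):
    """Return the type id, registering the type first if the lookup fails."""
    try:
        return store.get_execution_type(type_name)
    except NotFoundError:
        return store.put_execution_type(type_name)


TYPE_NAME = "kubeflow.org/alpha/execution"
store = InMemoryStore()

try:
    store.get_execution_type(TYPE_NAME)
except NotFoundError as err:
    print("lookup before registration failed:", err)

type_id = ensure_execution_type(store, TYPE_NAME)
print("type id after registration:", type_id)
```

Whether registering the type from the client is appropriate for a given deployment is an assumption here; the thread itself only establishes that the lookup fails and that the SDK in this repo is no longer maintained.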