kubeflow / metadata

Repository for assets related to Metadata.
Apache License 2.0

reproducible initial metadata server connection error #114

Open amygdala opened 5 years ago

amygdala commented 5 years ago

/kind bug

KF 0.6.1, GKE, using IAP

Running the example notebook, I've seen the following error each time.

Re-running the cell fixes things, so maybe a retry is needed.

With this code:

exec = metadata.Execution(
    name="execution" + datetime.utcnow().isoformat("T"),
    workspace=ws1,
    run=r,
    description="execution example",
)
print("An execution is created with id %s" % exec.id)

I initially get this error (again, a rerun of the cell lets it go through):

ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

Here's the full trace: https://gist.github.com/amygdala/19670fcf32c1369c03d4125e86db822b

zhenghuiwang commented 5 years ago

I think https://github.com/kubeflow/examples/pull/621 should fix it.

The likely cause is that the Istio sidecar hasn't started yet.

https://github.com/kubeflow/examples/pull/621/files#diff-19caa32109d22abdba8778f600e00f72R342

amygdala commented 5 years ago

I also saw this error when the cluster had been sitting idle for a few days. This seems like a very common error. Would it make sense to have the client itself catch the error and retry, rather than requiring error handling in the user code?

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 603, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 387, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 383, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.6/http/client.py", line 1331, in getresponse
    response.begin()
  File "/usr/lib/python3.6/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.6/http/client.py", line 266, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
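
For illustration, the client-side retry suggested above could look roughly like the sketch below. It is not code from the metadata SDK: `call_with_retry`, its parameters, and the `flaky_create` stand-in are hypothetical names, and the backoff values are arbitrary.

```python
import time
from http.client import RemoteDisconnected


def call_with_retry(fn, retry_on=(ConnectionError,), attempts=3,
                    base_delay=1.0, sleep=time.sleep):
    """Call fn(); if it raises one of retry_on, wait and try again.

    Hypothetical helper for illustration, not part of the metadata SDK.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))


# Stand-in for the flaky first request: fails once, then succeeds.
calls = {"count": 0}


def flaky_create():
    calls["count"] += 1
    if calls["count"] == 1:
        raise RemoteDisconnected("Remote end closed connection without response")
    return "execution-id"


# RemoteDisconnected is a ConnectionError subclass, so the default retry_on catches it.
result = call_with_retry(flaky_create, base_delay=0.01)
print(result)
```

In the notebook, the `metadata.Execution(...)` call would be wrapped in a function or lambda and passed as `fn`. The error in this report is raised as a urllib3 `ProtocolError`, which is not a `ConnectionError` subclass, so the caller would likely need to pass `retry_on=(urllib3.exceptions.ProtocolError,)`, assuming that is the exception type the client surfaces.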

jtfogarty commented 4 years ago

/area engprod
/priority p1