telepresenceio / telepresence

Local development against a remote Kubernetes or OpenShift cluster
https://www.telepresence.io

Allow using own docker registry for datawire/telepresence-k8s:x.xxx docker image (due to Docker rate limit) #1485

Closed · attila123 closed 1 month ago

attila123 commented 3 years ago

What were you trying to do?

I had not used Telepresence for a while and tried to use it again today. However, its docker image (datawire/telepresence-k8s:0.108) could not be downloaded, I think because of the rate limit Docker introduced on November 20, 2020: https://www.docker.com/increase-rate-limits (which I think mainly affects big companies).

This may be related to my environment: one Telepresence session did succeed somehow (maybe I caught the small window of opportunity in which its docker image could be downloaded), but on the next attempt I got this problem again. That makes me wonder whether it tries to download the image every time, or whether the image was simply evicted from the docker cache on our test k8s server (quite possible, I have seen that happen in our environment). Note: I don't have docker access to our test k8s environment, so I cannot, for example, pull the image from our own docker registry and tag it with the required name. (I had to request extra permissions just to be able to use Telepresence with our test k8s environment.)
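For users who do have docker access, mirroring the image into a private registry is straightforward; a minimal sketch, where our.own.docker.registry/whatever/path is only a placeholder for your own registry:

# Sketch: mirror the upstream image into your own registry (registry path is a placeholder)
docker pull datawire/telepresence-k8s:0.108
docker tag datawire/telepresence-k8s:0.108 our.own.docker.registry/whatever/path/datawire/telepresence-k8s:0.108
docker push our.own.docker.registry/whatever/path/datawire/telepresence-k8s:0.108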

What did you expect to happen?

I would like a command line option to specify the full image path, e.g. our.own.docker.registry/whatever/path/datawire/telepresence-k8s:0.108. If such an option already exists (I did not find one), it should be "advertised" to the user when an error like this occurs.

What happened instead?

(please tell us - the traceback is automatically included, see below. use https://gist.github.com to pass along full telepresence.log)

Workaround for users

# One-time setup
# Create the deployment
TELEPRESENCE_USE_DEPLOYMENT=1 telepresence --verbose --new-deployment telepresence-use-this-deployment
# From another terminal, dump its definition
kubectl get deploy telepresence-use-this-deployment -o yaml > telepresence-deployment.yaml
# Edit the docker image in this yaml file according to where you host the telepresence
# docker image (see the sed sketch after this block)
# Exit telepresence; it will clean up the deployment
# Recreate the deployment:
kubectl apply -f telepresence-deployment.yaml

# From now on use:
TELEPRESENCE_USE_DEPLOYMENT=1 telepresence --verbose --deployment telepresence-use-this-deployment
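For the "edit the docker image" step, a one-liner like the following can do it; this is only a sketch, assuming you mirrored the image under the placeholder registry path used above:

# Sketch: point the dumped deployment at your own registry (the registry path is a placeholder)
sed -i 's|datawire/telepresence-k8s:0.108|our.own.docker.registry/whatever/path/datawire/telepresence-k8s:0.108|' telepresence-deployment.yaml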

Automatically included information

Command line: ['/usr/sbin/telepresence', '--verbose', '--namespace', 'daas']
Version: 0.108
Python version: 3.9.0 (default, Oct 7 2020, 23:09:01) [GCC 10.2.0]
kubectl version: Client Version: v1.19.4 // Server Version: v1.14.5
oc version: (error: [Errno 2] No such file or directory: 'oc')
OS: Linux hp-g1 5.9.11-arch2-1 #1 SMP PREEMPT Sat, 28 Nov 2020 02:07:22 +0000 x86_64 GNU/Linux

Traceback (most recent call last):
  File "/usr/sbin/telepresence/telepresence/cli.py", line 135, in crash_reporting
    yield
  File "/usr/sbin/telepresence/telepresence/main.py", line 65, in main
    remote_info = start_proxy(runner)
  File "/usr/sbin/telepresence/telepresence/proxy/operation.py", line 135, in act
    wait_for_pod(runner, self.remote_info)
  File "/usr/sbin/telepresence/telepresence/proxy/remote.py", line 140, in wait_for_pod
    raise RuntimeError(
RuntimeError: Pod isn't starting or can't be found: {'conditions': [{'lastProbeTime': None, 'lastTransitionTime': '2020-12-04T07:55:56Z', 'status': 'True', 'type': 'Initialized'}, {'lastProbeTime': None, 'lastTransitionTime': '2020-12-04T07:55:56Z', 'message': 'containers with unready status: [telepresence]', 'reason': 'ContainersNotReady', 'status': 'False', 'type': 'Ready'}, {'lastProbeTime': None, 'lastTransitionTime': '2020-12-04T07:55:56Z', 'message': 'containers with unready status: [telepresence]', 'reason': 'ContainersNotReady', 'status': 'False', 'type': 'ContainersReady'}, {'lastProbeTime': None, 'lastTransitionTime': '2020-12-04T07:56:33Z', 'status': 'True', 'type': 'PodScheduled'}], 'containerStatuses': [{'image': 'datawire/telepresence-k8s:0.108', 'imageID': '', 'lastState': {}, 'name': 'telepresence', 'ready': False, 'restartCount': 0, 'state': {'waiting': {'message': 'Back-off pulling image "datawire/telepresence-k8s:0.108"', 'reason': 'ImagePullBackOff'}}}], 'hostIP': '10.20.48.33', 'phase': 'Pending', 'podIP': '10.2.3.126', 'qosClass': 'Burstable', 'startTime': '2020-12-04T07:55:56Z'}
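The ImagePullBackOff reported above can also be confirmed from the cluster side with standard kubectl commands; a quick check, where <telepresence-pod> stands for whatever pod name Telepresence created in the daas namespace:

# Sketch: confirm the pull failure directly (<telepresence-pod> is a placeholder pod name)
kubectl -n daas get pods
# The Events section of the describe output shows the "Back-off pulling image" message
kubectl -n daas describe pod <telepresence-pod>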

Logs:

ence-k8s:0.108",
 325.0 200 |                 "imageID": "",
 325.0 200 |                 "lastState": {},
 325.0 200 |                 "name": "telepresence",
 325.0 200 |                 "ready": false,
 325.0 200 |                 "restartCount": 0,
 325.0 200 |                 "state": {
 325.0 200 |                     "waiting": {
 325.0 200 |                         "message": "Back-off pulling image \"datawire/telepresence-k8s:0.108\"",
 325.0 200 |                         "reason": "ImagePullBackOff"
 325.0 200 |                     }
 325.0 200 |                 }
 325.0 200 |             }
 325.0 200 |         ],
 325.0 200 |         "hostIP": "HIDDEN by me",
 325.0 200 |         "phase": "Pending",
 325.0 200 |         "podIP": "HIDDEN by me",
 325.0 200 |         "qosClass": "Burstable",
 325.0 200 |         "startTime": "2020-12-04T07:55:56Z"
 325.0 200 |     }
 325.0 200 | }
 325.1 TEL | [200] captured in 0.43 secs.
 325.1 TEL | END SPAN remote.py:109(wait_for_pod)  180.6s

ark3 commented 3 years ago

The quick workaround is to set the TELEPRESENCE_REGISTRY environment variable, which defaults to datawire. In your example above, you'd set it to our.own.docker.registry/whatever/path/datawire.
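For example (the registry path below is just the placeholder from the issue; your mirror must host the telepresence-k8s image under that prefix):

# Point telepresence at a mirrored copy of the datawire images
export TELEPRESENCE_REGISTRY=our.own.docker.registry/whatever/path/datawire
telepresence --verbose --namespace daas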

Let's keep this issue around until we have a better error message.

GinoPane commented 3 years ago

Any updates on this? The workaround works fine, though. Should we consider the workaround the "official" way of handling this?

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment, or this will be closed in 7 days.

github-actions[bot] commented 1 month ago

This issue was closed because it has been stalled for 7 days with no activity.