Closed: billytrend-cohere closed this issue 4 days ago
Telepresence requires the `NET_ADMIN` capability and access to the `/dev/net/tun` device of the host that runs the codespace container, or it will fail to configure its virtual network device. Have you been able to configure that?
Hi @thallgren, thanks for your reply! I have taken a look and am still hitting the same issue. Are there any logs that might reveal the underlying problem?
I have modified the codespace's devcontainer config with the following:

```json
"build": { "dockerfile": "../Dockerfile" },
"runArgs": [
  "--privileged",
  "--cap-add=NET_ADMIN",
  "--device=/dev/net/tun"
],
```
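For reference, a complete minimal `.devcontainer/devcontainer.json` wrapping those entries would look something like this (the `name` field is just illustrative, not taken from my actual repo; devcontainer.json accepts JSONC-style comments):

```jsonc
{
  "name": "telepresence-codespace",
  // Build the container from the repo's Dockerfile one level up
  "build": { "dockerfile": "../Dockerfile" },
  // Grant the capabilities telepresence needs for its TUN device
  "runArgs": [
    "--privileged",
    "--cap-add=NET_ADMIN",
    "--device=/dev/net/tun"
  ]
}
```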
I also tried `sudo setcap cap_net_admin+ep /usr/local/bin/telepresence`.
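To double-check that the file capability actually stuck, `getcap` (from the same libcap package as `capsh`) can be run against the binary; the expected output looks roughly like:

```
$ getcap /usr/local/bin/telepresence
/usr/local/bin/telepresence cap_net_admin=ep
```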
`capsh --print` appears to show `cap_net_admin`:

```
WARNING: libcap needs an update (cap=40 should have a name).
Current: =
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,38,39,40
Ambient set =
Securebits: 00/0x0/1'b0
secure-noroot: no (unlocked)
secure-no-suid-fixup: no (unlocked)
secure-keep-caps: no (unlocked)
secure-no-ambient-raise: no (unlocked)
uid=1000(codespace) euid=1000(codespace)
gid=1000(codespace)
groups=106(ssh),107(docker),989(pipx),990(python),991(oryx),992(golang),993(sdkman),994(rvm),995(php),996(conda),997(nvs),998(nvm),999(hugo),1000(codespace)
Guessed mode: UNCERTAIN (0)
```
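Side note: the bounding set above includes `cap_net_admin`, but the `Current:` line shows an empty effective set for uid 1000, so the shell itself holds no capabilities. Here is a small standalone Go sketch (not telepresence code) that dumps the same capability masks `capsh` summarizes; `CAP_NET_ADMIN` is capability bit 12:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Print the capability bitmasks of the current process, the same data
// that capsh --print summarizes. CAP_NET_ADMIN is bit 12, so CapEff
// must have bit 12 set for the process to hold it effectively.
func main() {
	status, err := os.ReadFile("/proc/self/status")
	if err != nil {
		panic(err)
	}
	for _, line := range strings.Split(string(status), "\n") {
		if strings.HasPrefix(line, "Cap") {
			fmt.Println(line) // CapInh, CapPrm, CapEff, CapBnd, CapAmb
		}
	}
}
```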
You'll find the telepresence logs under `~/.cache/telepresence/logs`. The ones of interest here are `connector.log` and `daemon.log`. They will become even more interesting if you turn on debugging by adding the following to `~/.config/telepresence/config.yml`, then doing `telepresence quit -s` and retrying the `telepresence connect`:
```yaml
logLevels:
  userDaemon: debug
  rootDaemon: debug
```
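For completeness, the full sequence after editing the config would be:

```
telepresence quit -s
telepresence connect
```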
Also, if possible, please try version 2.20.2. It contains several bugfixes that might affect the connect behavior.
With those settings, I do not see a `daemon.log`. `cli.log` is empty, and `connector.log` shows the following:

```
2024-11-13 22:44:13.6774 debug connector/server-grpc/conn=2/Quit-2 : called
2024-11-13 22:44:13.6775 debug connector/session : goroutine "/connector/session" exited
2024-11-13 22:44:13.6775 debug connector/server-grpc/conn=2/Quit-2 : returned
2024-11-13 22:44:13.6775 info connector:shutdown_logger : shutting down (gracefully)...
2024-11-13 22:44:13.6775 debug connector/background-metriton : goroutine "/connector/background-metriton" exited
2024-11-13 22:44:13.6775 debug connector/service : goroutine "/connector/service" exited
2024-11-13 22:44:13.6779 debug connector/server-grpc : gRPC server ended
2024-11-13 22:44:13.6780 debug connector/server-grpc : goroutine "/connector/server-grpc" exited
2024-11-13 22:44:13.6945 debug connector/config-reload : goroutine "/connector/config-reload" exited
2024-11-13 22:44:19.1720 info Starting socket listener for /tmp/telepresence-connector.socket
2024-11-13 22:44:19.1721 debug Listener opened on /tmp/telepresence-connector.socket
2024-11-13 22:44:19.1722 info ---
2024-11-13 22:44:19.1722 info Telepresence Connector v2.19.1 (api v3) starting...
2024-11-13 22:44:19.1722 info PID is 8233
2024-11-13 22:44:19.1722 info
2024-11-13 22:44:19.1728 info connector/server-grpc : gRPC server started
2024-11-13 22:44:19.2051 debug connector/server-grpc/conn=1/Connect-1 : called
2024-11-13 22:44:19.2065 debug connector/session : using namespace "default"
2024-11-13 22:44:19.2066 info connector/session : -- Starting new session
2024-11-13 22:44:19.2067 info connector/session : Connecting to k8s cluster...
2024-11-13 22:44:19.4055 info connector/session : Server version v1.30.5-gke.1014003
2024-11-13 22:44:19.4055 info connector/session : Context: gke_cohere-staging_us-central1_staging
2024-11-13 22:44:19.4055 info connector/session : Server: https://34.134.248.136
2024-11-13 22:44:21.1411 info connector/session : Will look for traffic manager in namespace ambassador
2024-11-13 22:44:21.1411 info connector/session : Connected to context gke_cohere-staging_us-central1_staging, namespace default (https://34.134.248.136)
2024-11-13 22:44:21.2272 info connector/session : Connecting to traffic manager...
2024-11-13 22:44:21.2273 debug connector/session : checking that traffic-manager exists
2024-11-13 22:44:21.2854 debug connector/session : creating port-forward
2024-11-13 22:44:21.4406 debug connector/session : k8sPortForwardDialer.dial(ctx, Pod./traffic-manager-68548bff4b-4976v.ambassador, 8081)
2024-11-13 22:44:21.4407 debug connector/session : k8sPortForwardDialer.spdyDial(ctx, Pod./traffic-manager-68548bff4b-4976v.ambassador)
2024-11-13 22:44:21.9049 info connector/session : Connected to Traffic Manager v2.19.6
2024-11-13 22:44:21.9601 debug connector/session : traffic-manager port-forward established, client was already known to the traffic-manager as "codespace@codespaces-a36db9"
2024-11-13 22:44:22.0170 debug connector/session : Applying client configuration from cluster
2024-11-13 22:44:22.0170 debug connector/session : cluster:
2024-11-13 22:44:22.0170 debug connector/session : mappedNamespaces:
2024-11-13 22:44:22.0170 debug connector/session : - ambassador
2024-11-13 22:44:22.0170 debug connector/session : - blobheart
2024-11-13 22:44:22.0171 debug connector/session : - bh-finetuning
2024-11-13 22:44:22.0171 debug connector/session : - bh-private-models
2024-11-13 22:44:22.0171 debug connector/session : - bh-private-models-evaluation
2024-11-13 22:44:22.0172 info connector/session : Configuration reloaded
2024-11-13 22:44:22.8895 debug connector/server-grpc/conn=1/Connect-1 : returned
2024-11-13 22:44:22.8895 info connector/session:shutdown_logger : shutting down (gracefully)...
2024-11-13 22:44:22.8896 debug connector/session/info-kicker-gke_cohere-staging_us-central1_staging-default-cn : Deleting daemon info gke_cohere-staging_us-central1_staging-default-cn.json because context was cancelled
2024-11-13 22:44:22.8896 debug connector/session/info-watcher-gke_cohere-staging_us-central1_staging-default-cn : goroutine "/connector/session/info-watcher-gke_cohere-staging_us-central1_staging-default-cn" exited
2024-11-13 22:44:22.8899 debug connector/session/info-kicker-gke_cohere-staging_us-central1_staging-default-cn : goroutine "/connector/session/info-kicker-gke_cohere-staging_us-central1_staging-default-cn" exited
```
Is it possible that the root daemon is never run? It seems odd that we don't see a log for it. Also, it looks like it may not get run in "docker mode": https://github.com/telepresenceio/telepresence/blob/fce335576845968028964a807c559f1682279f49/pkg/client/cli/connect/daemon.go#L60
Also, `telepresence connect` only logs `Launching Telepresence User Daemon`. Should it also be logging that it is launching the root daemon? Can I explicitly run the root daemon?
Thanks so much for your support with this!
Actually, I'm possibly misunderstanding docker mode; that seems to be activated by a CLI flag. It is still weird, though, that we never see this log line during connect.
Sorry, I should have mentioned that in this configuration the root daemon will run embedded in the connector process, so there will be no `daemon.log`.
I just tried running 2.20.2, and unfortunately I still see `rot daemon is not running`.
I built a version with that check removed, and now I see `Launching Telepresence Root Daemon`:

```
Telepresence Daemons quitting...done
Launching Telepresence User Daemon
Launching Telepresence Root Daemon
telepresence connect: error: connector.Connect: subnet 10.0.0.0/17 overlaps with existing route "10.0.0.0/16 via 10.0.0.70 dev eth0". Please see https://www.getambassador.io/docs/telepresence/latest/reference/vpn for more information
```
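For what it's worth, the overlap itself checks out: 10.0.0.0/17 is wholly contained in 10.0.0.0/16, which covers the codespace's own eth0 network. A standalone Go sketch (not telepresence code) confirming it with `net/netip`:

```go
package main

import (
	"fmt"
	"net/netip"
)

func main() {
	// Subnet the traffic manager wants to route into the cluster.
	cluster := netip.MustParsePrefix("10.0.0.0/17")
	// Existing route on the codespace's eth0 interface.
	local := netip.MustParsePrefix("10.0.0.0/16")

	// Prefix.Overlaps reports whether the two ranges share addresses.
	fmt.Println(cluster.Overlaps(local)) // prints: true
}
```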
This feels like progress? Or have I just broken it?
Hmm, I added `--allow-conflicting-subnets 10.0.0.0/16` to the command and it all seems to be up and healthy, but I'm doing some more testing.
This is interesting. Looks like you found the culprit!
As I wrote earlier, the root daemon is not supposed to run in this configuration (i.e. when the user daemon is running in a container); instead it will be embedded into the user daemon. However, that only happens when the user daemon is running as root, which doesn't seem to be the case here, and as a result this code doesn't execute. The user daemon, not receiving the `--embed-network` flag, then assumes that the root daemon is already running.

So, instead of just removing the `!daemon.GetUserClient(rc).Containerized()` check, you should amend it with `&& os.Getuid() == 0`, e.g.

```go
if err == nil && required && !(os.Getuid() == 0 && daemon.GetUserClient(rc).Containerized())
```
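To illustrate why the amendment matters, here is a standalone sketch comparing when each version of the check skips launching the root daemon (not telepresence source; the booleans stand in for `os.Getuid() == 0` and `Containerized()`):

```go
package main

import "fmt"

// "skip" means the CLI does not launch a separate root daemon, on the
// assumption that the network will be embedded in the user daemon.
func main() {
	for _, root := range []bool{true, false} {
		for _, inContainer := range []bool{true, false} {
			oldSkip := inContainer         // old check: skip whenever containerized
			newSkip := root && inContainer // amended: skip only when also root
			fmt.Printf("root=%-5t containerized=%-5t oldSkip=%-5t newSkip=%t\n",
				root, inContainer, oldSkip, newSkip)
		}
	}
}
```

In the codespaces case (containerized but not root), the old check skips the launch while nothing embeds the network either, which is exactly the `rot daemon is not running` symptom.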
Thanks for confirming, I've opened a PR!
@billytrend-cohere I've released a 2.20.3-rc.1 version of Telepresence with the codespaces fixes included. Can you please give it a try and report back?
Released in version 2.20.3
@thallgren sorry to reopen. I'm seeing behaviour where my URLs are available in the codespace terminal but not in the k3d cluster that I'm running. I do see this warning:
```
You are using the OSS client v2.20.3-51-gfce335576-YvDDgqkcZvPrPIYWs18S2A to connect to an enterprise traffic manager v2.19.6. Please consider installing an
enterprise client from getambassador.io, or use "telepresence helm install" to install an OSS traffic-manager
```
I wonder if this is causing the problem. Am I able to use the non-OSS CLI with the fix we have merged?
That fix will probably be included when Ambassador Labs makes their next release, but you should be able to use the OSS client until then, if you can live with the warning.
Not sure what you mean by "URLs are available in the codespace terminal", but unless it's the exact same problem that you reported earlier, I'd suggest you create a new ticket.
**Describe the bug**

I'm trying to run telepresence in a codespace, but I see an error. The `connector.log` contains no clues.

**To Reproduce**

When I run `telepresence connect`, I get:

```
telepresence connect: error: connector.Connect: rot daemon is not running
```

("rot" should be "root".)

**Expected behavior**

When I am running locally, this connects my cluster to the context.

**Versions (please complete the following information):**

- Output of `telepresence version`
- Operating system of workstation running `telepresence` commands: codespace using the `mcr.microsoft.com/devcontainers/universal:2` container