Closed: huifanglu2018 closed this issue 3 years ago
Can you share the content of the /etc/teleport.yaml files for both nodes?
teleport:
  data_dir: /var/lib/teleport
  auth_token: f7adb7ccdf04037bcd2b52ec6010fd6f0caec94ba190b765
  auth_servers:
    - 10.10.10.136:3025
  connection_limits:
    max_connections: 1000
    max_users: 250
  log:
    output: stderr
    severity: DEBUG
auth_service:
  enabled: true
  # session_recording: "proxy"
  proxy_checks_host_keys: no
  cluster_name: "teleportcluster"
  listen_addr: 0.0.0.0:3025
  tokens:
    - proxy,node:f7adb7ccdf04037bcd2b52ec6010fd6f0caec94ba190b765
  authentication:
    type: local
    second_factor: off
ssh_service:
  enabled: true
  labels:
    env: staging
proxy_service:
  enabled: true
  listen_addr: 0.0.0.0:3023
  web_listen_addr: 0.0.0.0:3080
  tunnel_listen_addr: 0.0.0.0:3024
  public_addr: 10.10.10.136:3080
teleport:
  data_dir: /var/lib/teleport
  auth_token: f7adb7ccdf04037bcd2b52ec6010fd6f0caec94ba190b765
  auth_servers:
    - 10.10.10.136:3080
  connection_limits:
    max_connections: 1000
    max_users: 250
  log:
    output: stderr
    severity: DEBUG
auth_service:
  enabled: false
ssh_service:
  enabled: true
  labels:
    env: staging
proxy_service:
  enabled: false
@webvictim Thanks for your help.
@huifanglu2018 I think the reason for the issue is that the token you've set is valid for both the proxy and node roles, but your node is only trying to join with the node role. With Teleport, tokens must be used with their full set of roles.
Change
- proxy,node:f7adb7ccdf04037bcd2b52ec6010fd6f0caec94ba190b765
to
- node:f7adb7ccdf04037bcd2b52ec6010fd6f0caec94ba190b765
then restart your Teleport auth server and try joining the node again - it should work.
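For reference, the resulting tokens stanza in the auth server's config would look like this (a sketch based on the configs above; only the tokens list changes):

```yaml
auth_service:
  enabled: true
  tokens:
    # node-only token: matches the single role the node joins with
    - node:f7adb7ccdf04037bcd2b52ec6010fd6f0caec94ba190b765
```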
/home/ubuntu/teleport# /usr/local/bin/teleport start --roles=node --config=/home/ubuntu/teleport/teleport.yaml --auth-server=https://10.254.9.135:3080 --pid-file=/run/teleport.pid --insecure
DEBU [SQLITE] Connected to: file:/var/lib/teleport/proc/sqlite.db?_busy_timeout=10000&_sync=OFF, poll stream period: 1s lite/lite.go:173
DEBU [SQLITE] Synchronous: 0, busy timeout: 10000 lite/lite.go:218
DEBU [KEYGEN] SSH cert authority started with no keys pre-compute. native/native.go:107
DEBU [PROC] Adding service to supervisor. service:register.node service/supervisor.go:181
DEBU [PROC] Adding service to supervisor. service:ssh.node service/supervisor.go:181
DEBU [PROC] Adding service to supervisor. service:ssh.shutdown service/supervisor.go:181
DEBU [PROC] Adding service to supervisor. service:common.rotate service/supervisor.go:181
DEBU [PROC:1] Service has started. service:register.node service/supervisor.go:242
DEBU [PROC:1] Service has started. service:ssh.shutdown service/supervisor.go:242
DEBU [PROC:1] No signal pipe to import, must be first Teleport process. service/service.go:761
DEBU [PROC:1] Service has started. service:ssh.node service/supervisor.go:242
DEBU [PROC:1] Service has started. service:common.rotate service/supervisor.go:242
DEBU [PROC:1] Connected state: never updated. service/connect.go:99
INFO [PROC] Connecting to the cluster worker7 with TLS client certificate. service/connect.go:127
DEBU [PROC] Attempting to connect to Auth Server directly. service/connect.go:793
DEBU [PROC] Attempting to connect to Auth Server through tunnel. service/connect.go:801
DEBU [CLIENT] HTTPS client init(proxyAddr=10.254.9.135:3080, insecure=true) client/weblogin.go:307
WARNING: You are using insecure connection to SSH proxy https://10.254.9.135:3080
DEBU [PROC] Discovered address for reverse tunnel server: 10.254.9.135:3024. service/connect.go:881
DEBU [HTTP:PROX] No valid environment variables found. proxy/proxy.go:222
DEBU [HTTP:PROX] No proxy set in environment, returning direct dialer. proxy/proxy.go:137
ERRO [PROC:1] Node failed to establish connection to cluster: ssh: handshake failed: no matching keys found. time/sleep.go:148
@webvictim Thanks so much for your reply, but sadly there are still many issues. I get the same error after changing it to node:f7adb7ccdf04037bcd2b52ec6010fd6f0caec94ba190b765. I also found that if I change the auth server to 10.254.9.135:3025, it shows the log below. Node log:
/home/ubuntu/teleport# /usr/local/bin/teleport start --roles=node --config=/home/ubuntu/teleport/teleport.yaml --auth-server=https://10.254.9.135:3025 --pid-file=/run/teleport.pid --insecure --insecure-no-tls
DEBU [SQLITE] Connected to: file:/var/lib/teleport/proc/sqlite.db?_busy_timeout=10000&_sync=OFF, poll stream period: 1s lite/lite.go:173
DEBU [SQLITE] Synchronous: 0, busy timeout: 10000 lite/lite.go:218
DEBU [KEYGEN] SSH cert authority started with no keys pre-compute. native/native.go:107
DEBU [PROC] Adding service to supervisor. service:register.node service/supervisor.go:181
DEBU [PROC] Adding service to supervisor. service:ssh.node service/supervisor.go:181
DEBU [PROC] Adding service to supervisor. service:ssh.shutdown service/supervisor.go:181
DEBU [PROC] Adding service to supervisor. service:common.rotate service/supervisor.go:181
DEBU [PROC:1] Service has started. service:ssh.shutdown service/supervisor.go:242
DEBU [PROC:1] Service has started. service:ssh.node service/supervisor.go:242
DEBU [PROC:1] Service has started. service:common.rotate service/supervisor.go:242
DEBU [PROC:1] No signal pipe to import, must be first Teleport process. service/service.go:761
DEBU [PROC:1] Service has started. service:register.node service/supervisor.go:242
DEBU [PROC:1] Connected state: never updated. service/connect.go:99
INFO [PROC] Connecting to the cluster worker7 with TLS client certificate. service/connect.go:127
DEBU [PROC] Attempting to connect to Auth Server directly. service/connect.go:793
DEBU [PROC] Attempting to connect to Auth Server through tunnel. service/connect.go:801
DEBU [CLIENT] HTTPS client init(proxyAddr=10.254.9.135:3025, insecure=true) client/weblogin.go:307
WARNING: You are using insecure connection to SSH proxy https://10.254.9.135:3025
ERRO [PROC:1] "Node failed to establish connection to cluster: 404 page not found\n." time/sleep.go:148
auth log:
ERRO [AUTH:1] "Failed to retrieve client pool. Client cluster worker7, target cluster teleportcluster, error: \nERROR REPORT:\nOriginal Error: *trace.NotFoundError key \"/authorities/host/worker7\" is not found\nStack Trace:\n\t/go/src/github.com/gravitational/teleport/lib/backend/memory/memory.go:186 github.com/gravitational/teleport/lib/backend/memory.(*Memory).Get\n\t/go/src/github.com/gravitational/teleport/lib/backend/report.go:159 github.com/gravitational/teleport/lib/backend.(*Reporter).Get\n\t/go/src/github.com/gravitational/teleport/lib/backend/wrap.go:89 github.com/gravitational/teleport/lib/backend.(*Wrapper).Get\n\t/go/src/github.com/gravitational/teleport/lib/services/local/trust.go:207 github.com/gravitational/teleport/lib/services/local.(*CA).GetCertAuthority\n\t/go/src/github.com/gravitational/teleport/lib/cache/cache.go:892 github.com/gravitational/teleport/lib/cache.(*Cache).GetCertAuthority\n\t/go/src/github.com/gravitational/teleport/lib/auth/middleware.go:546 github.com/gravitational/teleport/lib/auth.ClientCertPool\n\t/go/src/github.com/gravitational/teleport/lib/auth/middleware.go:253 github.com/gravitational/teleport/lib/auth.(*TLSServer).GetConfigForClient\n\t/opt/go/src/crypto/tls/handshake_server.go:141 crypto/tls.(*Conn).readClientHello\n\t/opt/go/src/crypto/tls/handshake_server.go:40 crypto/tls.(*Conn).serverHandshake\n\t/opt/go/src/crypto/tls/conn.go:1362 crypto/tls.(*Conn).Handshake\n\t/go/src/github.com/gravitational/teleport/lib/multiplexer/tls.go:141 github.com/gravitational/teleport/lib/multiplexer.(*TLSListener).detectAndForward\n\t/opt/go/src/runtime/asm_amd64.s:1375 runtime.goexit\nUser Message: key \"/authorities/host/worker7\" is not found\n." auth/middleware.go:261
WARN [MXTLS:1] Handshake failed. error:remote error: tls: bad certificate multiplexer/tls.go:143
The guide doesn't seem to work for me; I've tried fixing many possible mistakes, but I still cannot add the second node to the cluster... https://goteleport.com/teleport/docs/quickstart/#add-a-node-to-the-cluster Is there an example configuration for me?
@huifanglu2018 You may have some old credentials cached for some reason. Look at ps -ef | grep teleport and make sure there are no other Teleport processes running on the node you're trying to add, remove /var/lib/teleport completely, and then run the teleport start command again.
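The clean-rejoin procedure above can be sketched as follows (addresses and paths taken from earlier in the thread; note that removing /var/lib/teleport deliberately destroys the node's cached identity):

```shell
# Sketch: clear cached state on the joining node, then rejoin.
ps -ef | grep '[t]eleport'        # confirm no other Teleport process is running
sudo rm -rf /var/lib/teleport     # drop cached credentials and host keys
sudo /usr/local/bin/teleport start --roles=node \
  --config=/home/ubuntu/teleport/teleport.yaml \
  --auth-server=10.254.9.135:3025 --pid-file=/run/teleport.pid
```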
@webvictim Thank you so much! It works after removing /var/lib/teleport.
@webvictim
Jul 12 09:18:28 ip-10-25-1-55 teleport[7630]: 2023-07-12T09:18:28Z ERRO [PROC:1] Node failed to establish connection to cluster: Failed to connect to Proxy Server through tunnel: connection error: desc = "transport: Error while dialing: failed to dial: ssh: handshake failed: read tcp 10.25.1.55:56066->10.25.0.212:3024: i/o timeout". pid:7630.1 service/connect.go:123
These are error logs from a node trying to join the cluster. I tried removing /var/lib/teleport and restarting, but it did not help. What could be a possible solution for this? I'm trying to set Teleport up in an HA environment with load balancers.
@suchisur It looks like you don't have TLS routing enabled in your cluster, so agents are trying to join over the traditional reverse tunnel port (3024).
You should:
- add version: v3 to the top of /etc/teleport.yaml if it's not there already
- add proxy_listener_mode: multiplex under auth_service in /etc/teleport.yaml
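Applied to /etc/teleport.yaml on the auth server, the two changes would look roughly like this (a sketch; all other settings stay as they are):

```yaml
version: v3
teleport:
  # ...existing settings unchanged...
auth_service:
  enabled: true
  # enable TLS routing so agents join over the single web port
  proxy_listener_mode: multiplex
```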
Hi @webvictim These are error logs from a node trying to join the cluster. Please help me
[root@jumpserver03 ~]# systemctl status teleport -l
● teleport.service - Teleport Service
Loaded: loaded (/usr/lib/systemd/system/teleport.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2024-03-11 11:11:50 +07; 4min 55s ago
Main PID: 3357 (teleport)
CGroup: /system.slice/teleport.service
└─3357 /usr/local/bin/teleport start --config /etc/teleport.yaml --pid-file=/run/teleport.pid
Mar 11 11:15:28 jumpserver03.vascloud.vnpt.vn teleport[3357]: 2024-03-11T11:15:28+07:00 INFO [AUTH] Attempting registration via proxy server. auth/register.go:288
Mar 11 11:15:28 jumpserver03.vascloud.vnpt.vn teleport[3357]: 2024-03-11T11:15:28+07:00 ERRO [PROC:1] Node failed to establish connection to cluster: Post "https://longhd98.click:443/v1/webapi/host/credentials": tls: failed to verify certificate: x509: certificate signed by unknown authority. pid:3357.1 service/connect.go:91
Mar 11 11:15:48 jumpserver03.vascloud.vnpt.vn teleport[3357]: 2024-03-11T11:15:48+07:00 INFO [PROC:1] Joining the cluster with a secure token. pid:3357.1 service/connect.go:417
Mar 11 11:15:48 jumpserver03.vascloud.vnpt.vn teleport[3357]: 2024-03-11T11:15:48+07:00 INFO [AUTH] Attempting registration via proxy server. auth/register.go:288
Mar 11 11:15:48 jumpserver03.vascloud.vnpt.vn teleport[3357]: 2024-03-11T11:15:48+07:00 ERRO [PROC:1] Instance failed to establish connection to cluster: Post "https://longhd98.click:443/v1/webapi/host/credentials": tls: failed to verify certificate: x509: certificate signed by unknown authority. pid:3357.1 service/connect.go:91
Mar 11 11:15:50 jumpserver03.vascloud.vnpt.vn teleport[3357]: 2024-03-11T11:15:50+07:00 WARN [UPLOAD:1] The Instance connector is still not available, process-wide services such as session uploading will not function pid:3357.1 service/service.go:2863
Mar 11 11:16:17 jumpserver03.vascloud.vnpt.vn teleport[3357]: 2024-03-11T11:16:17+07:00 INFO [PROC:1] Joining the cluster with a secure token. pid:3357.1 service/connect.go:417
Mar 11 11:16:17 jumpserver03.vascloud.vnpt.vn teleport[3357]: 2024-03-11T11:16:17+07:00 INFO [AUTH] Attempting registration via proxy server. auth/register.go:288
Mar 11 11:16:17 jumpserver03.vascloud.vnpt.vn teleport[3357]: 2024-03-11T11:16:17+07:00 ERRO [PROC:1] Node failed to establish connection to cluster: Post "https://longhd98.click:443/v1/webapi/host/credentials": tls: failed to verify certificate: x509: certificate signed by unknown authority. pid:3357.1 service/connect.go:91
Mar 11 11:16:20 jumpserver03.vascloud.vnpt.vn teleport[3357]: 2024-03-11T11:16:20+07:00 WARN [UPLOAD:1] The Instance connector is still not available, process-wide services such as session uploading will not function pid:3357.1 service/service.go:2863
It seems like the certificate being presented by your proxy server is not trusted. If it's not from a trusted CA, you probably want to add this to a unit file override as root:
cat <<EOF> /etc/systemd/system/teleport.service.d/override.conf
[Service]
ExecStart=
ExecStart=/usr/local/bin/teleport start --config /etc/teleport.yaml --pid-file=/run/teleport.pid --insecure
EOF
systemctl daemon-reload
systemctl restart teleport
Note that this isn't secure and shouldn't be used in production, just for testing. The correct way to fix this is to get a TLS certificate from a trusted CA on your Teleport proxy.
That's right, I'm using a self-signed certificate. Do I have to add the above configuration to the Teleport server, the remote node, or both?
This configuration would be on any remote node/agent joining the Teleport cluster.
The more secure alternative is to install the public key of the issuing CA or self-signed certificate onto each of these joining servers: https://ubuntu.com/server/docs/security-trust-store
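On Debian/Ubuntu, that trust-store approach would look roughly like this; the certificate filename here is a placeholder for your proxy's PEM-encoded certificate:

```shell
# Sketch: trust a self-signed proxy certificate system-wide (Debian/Ubuntu).
sudo cp teleport-proxy.crt /usr/local/share/ca-certificates/teleport-proxy.crt
sudo update-ca-certificates      # rebuilds the system CA bundle
sudo systemctl restart teleport  # the agent can now join without --insecure
```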
While I run a standalone cluster it works fine, but I often get errors while adding another node to the cluster. One node runs as auth,node,proxy:
/usr/local/bin/teleport start --config=/etc/teleport.yaml --pid-file=/run/teleport.pid --insecure
I add another node as below:
/usr/local/bin/teleport start --roles=node --config=/etc/teleport.yaml --pid-file=/run/teleport.pid --insecure
It always shows:
DEBU [HTTP:PROX] No valid environment variables found. proxy/proxy.go:222
DEBU [HTTP:PROX] No proxy set in environment, returning direct dialer. proxy/proxy.go:137
ERRO [PROC:1] Node failed to establish connection to cluster: ssh: handshake failed: no matching keys found. time/sleep.go:148
Can anybody help me get the node into the cluster?
Run this command on Proxmox: nano /etc/hostname (and likewise /etc/hosts). A file will open containing the name pve; change it to pve2, pve3, or pve4 as you prefer, because it conflicts with the master pve server.
@webvictim
I know that this is an old topic, but I have an issue with only a single instance and I'm out of options for self-debugging. The whole cluster is on version 15.4.4.
This is the log of teleport node:
2024-06-18T11:56:32Z INFO [PROC:1] Generating new host UUID pid:1310.1 host_uuid:fd675f54-6dff-4d57-92d8-39e1e02f2233 service/service.go:6216
2024-06-18T11:56:33Z INFO [PROC:1] Service is creating new listener. pid:1310.1 type:diag address:127.0.0.1:3000 service/signals.go:249
2024-06-18T11:56:33Z INFO [DIAG:1] Starting diagnostic service. pid:1310.1 listen_address:127.0.0.1:3000 service/service.go:3364
2024-06-18T11:56:33Z INFO [PROC:1] Service is creating new listener. pid:1310.1 type:debug address:/var/lib/teleport/debug.sock service/signals.go:249
2024-06-18T11:56:33Z INFO [PROC:1] Joining the cluster with a secure token. pid:1310.1 service/connect.go:464
2024-06-18T11:56:33Z INFO [PROC:1] Joining the cluster with a secure token. pid:1310.1 service/connect.go:464
2024-06-18T11:56:33Z INFO Attempting registration via proxy server. join/join.go:253
2024-06-18T11:56:33Z INFO Attempting registration via proxy server. join/join.go:253
2024-06-18T11:56:33Z INFO Successfully registered via proxy server. join/join.go:260
2024-06-18T11:56:33Z INFO [PROC:1] Successfully obtained credentials to connect to the cluster. pid:1310.1 identity:App service/connect.go:524
2024-06-18T11:57:03Z WARN [UPLOAD:1] The Instance connector is still not available, process-wide services such as session uploading will not function pid:1310.1 service/service.go:3024
2024-06-18T11:57:33Z WARN [UPLOAD:1] The Instance connector is still not available, process-wide services such as session uploading will not function pid:1310.1 service/service.go:3024
2024-06-18T11:58:03Z WARN [UPLOAD:1] The Instance connector is still not available, process-wide services such as session uploading will not function pid:1310.1 service/service.go:3024
And that log goes on forever. I have replaced this VM, and I have also checked for configuration and token mistakes. I have replaced the token with a new one, and I have also removed the directory /var/lib/teleport and then started the service again.
Other instances with the same token are working fine. The main difference is in the app configuration.
I'm providing the node configuration:
version: v3
teleport:
  nodename: sdffsdgfsfdgdf
  join_params:
    token_name: !!!correct_token!!!
    method: token
  cache:
    enabled: yes
    max_backoff: 5m
  proxy_server: X.X.X.X:443
  data_dir: /var/lib/teleport
  log:
    output: /var/log/teleport/teleport.log
    severity: INFO
    format:
      output: text
  ca_pin: ""
  diag_addr: "127.0.0.1:3000"
proxy_service:
  enabled: False
auth_service:
  enabled: False
app_service:
  enabled: true
  apps:
    - name: internal-grafana
      public_addr: grafana.XXXXX.xyz
      uri: http://10.109.0.3:8081
      insecure_skip_verify: true
      labels:
        app: grafana
        env: internal
      rewrite:
        headers:
          - 'Origin: https://grafana.XXXXX.xyz'
          - 'Host: grafana.XXXXX.xyz'
        jwt_claims: roles
        redirect:
          - localhost
          - 10.109.0.3
db_service:
  enabled: false
ssh_service:
  enabled: false
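Since the config exposes diag_addr on 127.0.0.1:3000, the stuck join can also be inspected locally via Teleport's diagnostic endpoints (run on the node itself):

```shell
curl -s http://127.0.0.1:3000/healthz   # overall process health
curl -s http://127.0.0.1:3000/readyz    # readiness; stays not-ready while the join is stuck
```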
Systemd service:
[Unit]
Description=Teleport Service
After=network.target
[Service]
Type=simple
Restart=on-failure
EnvironmentFile=-/etc/default/teleport
ExecStart=/usr/local/bin/teleport start --insecure --config /etc/teleport.yaml --pid-file=/run/teleport.pid
ExecReload=/bin/kill -HUP $MAINPID
PIDFile=/run/teleport.pid
LimitNOFILE=524288
[Install]
WantedBy=multi-user.target
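After changing the unit file, the standard systemd reload cycle applies:

```shell
sudo systemctl daemon-reload
sudo systemctl restart teleport
journalctl -u teleport -f   # follow the join attempt in the logs
```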
Can you help me with that? Do you have any suggestions on where to look for the issue with this VM? Maybe there is an issue in the auth service, or some strange record in the Postgres database which we are using for auth? I have also noticed that on the other nodes there is a directory called /var/log/teleport/log/upload which is not created on the failed node after restart.
@wszychta Can you share the listing of tctl get token/!!!correct_token!!!? You can redact the token itself, I just want to see the extra information associated with it.
@webvictim I can provide it, but I have found a working solution. Instead of using the IP address with --insecure, we have configured the agents to go via the public Teleport endpoint.
The funny thing is that when I switched back (to the IP address and --insecure) it was still working fine. If I face the same issue again, I will provide what you asked for.
That sounds like a bug to me, I'll see if I can reproduce it.
If your goal is to join agents via a private address rather than the public address, you could try one of the workarounds detailed in this comment: https://github.com/gravitational/teleport/issues/27885#issue-1758924661
Using --insecure is never a good idea!
@webvictim I was able to reproduce this issue. As suggested, I ran tctl get token/!!!correct_token!!! with the result below:
tctl get token/!!!correct_token!!!
ERROR: provisioning token(************************token!!!) not found
I'm also passing part of the result of the command tctl tokens ls:
❯ tctl tokens ls
Token Type Labels Expiry Time (UTC)
-------------------------------- ----------- ------ -----------------------------------
!!!correct_token!!! Node,App,Db 01 Jan 70 00:00 UTC (-477630h56m4s)
This token is correct and created in the auth server config file. Again, changing the public IP address of the load balancer to the public hostname solved the issue with connecting the node to the proxy server. As a reminder, we are using Teleport version 15.4.4.