Closed Kampe closed 3 years ago
I noticed the services didn't have a cluster IP when listing the services out, so I deleted and recreated them - now they do. I also disabled the TCP service for the time being and still have stuck deliveries just using http proxy and sending the requests locally via a port-forward over the skuppered service to no avail:
2020-12-10 07:37:14.639832 +0000 HTTP_ADAPTOR (info) Accepting HTTP/1.x connection on 0.0.0.0:1026
2020-12-10 07:37:14.640171 +0000 ROUTER_CORE (info) [C28] Connection Opened: dir=in host=127.0.0.1:38250 vhost= encrypted=no auth=no user= container_id=HTTP/1.x Adaptor props=
2020-12-10 07:37:14.640413 +0000 ROUTER_CORE (info) [C28][L66] Link attached: dir=out source={(dyn)<none> expire:link} target={<none> expire:link}
2020-12-10 07:37:14.640468 +0000 ROUTER_CORE (info) [C28][L67] Link attached: dir=in source={<none> expire:link} target={cloud-api expire:link}
2020-12-10 07:37:50.844384 +0000 ROUTER_CORE (info) [C28][L67] Stuck delivery: At least one delivery on this link has been undelivered/unsettled for more than 10 seconds
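A minimal sketch for pulling the stuck-delivery warnings out of a router log, assuming the line layout shown in the excerpt above. The function name `find_stuck_links` is illustrative and not part of any Skupper or router tooling.

```python
import re

# Matches the router's "Stuck delivery" line as it appears in the excerpt above.
STUCK_RE = re.compile(
    r"^(?P<ts>\S+ \S+) \S+ ROUTER_CORE \(info\) "
    r"\[C(?P<conn>\d+)\]\[L(?P<link>\d+)\] Stuck delivery"
)


def find_stuck_links(log_lines):
    """Return (timestamp, connection id, link id) for each stuck-delivery line."""
    hits = []
    for line in log_lines:
        m = STUCK_RE.match(line)
        if m:
            hits.append((m.group("ts"), int(m.group("conn")), int(m.group("link"))))
    return hits
```

The connection and link ids it returns can then be matched against the `conn id` and `id` columns of the `qdstat -l` output.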
However the router still hasn't crashed yet. Here's what its links look like currently:
Router Links
type dir conn id id peer class addr phs cap pri undel unsett deliv presett psdrop acc rej rel mod delay rate stuck cred blkd
==================================================================================================================================================================================================================
endpoint in 1 2 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint out 1 3 mobile cloud-api 0 250 0 0 7 7 0 0 0 0 0 0 0 0 7 250 -
endpoint in 2 4 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint out 2 5 mobile cloud-api 0 250 0 0 7 7 0 0 0 0 0 0 0 0 7 250 -
endpoint out 3 6 mobile mc/$skupper-service-sync 0 250 0 0 0 272 0 0 272 0 0 0 0 0 0 8 -
endpoint out 4 7 mobile 92f5bd9b-f921-4408-aa22-4ccd3f5f2c6b/skupper-site-query 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 10 -
endpoint in 3 8 mobile mc/$skupper-service-sync 0 250 0 0 0 136 0 0 136 0 0 0 0 0 0 250 -
endpoint in 4 9 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint out 5 10 local temp.3zy7TuBZKHic_Hb 250 0 0 0 5 5 0 0 0 0 0 0 0 0 10 -
endpoint in 5 11 mobile $management 0 250 0 0 0 5 0 0 5 0 0 0 0 0 0 250 -
endpoint in 5 12 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint in 6 13 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
edge-downlink out 6 14 edge test-edge-skupper-router-7f45bdfb7c-92pww 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint out 6 15 mobile _$qd.edge_addr_tracking 0 250 0 0 0 2 2 0 0 0 0 0 2 0 0 32 -
endpoint out 6 16 mobile d5f5e229-5b7e-4553-97d3-24591e1f9555/skupper-site-query 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint out 6 17 mobile mc/$skupper-service-sync 0 250 0 0 0 136 0 0 136 0 0 0 0 0 0 250 -
endpoint in 6 18 mobile mc/$skupper-service-sync 0 250 0 0 0 136 0 0 136 0 0 0 0 0 0 250 -
endpoint in 6 19 mobile cloud-api 0 250 0 0 3 3 0 0 0 0 0 0 0 0 3 250 -
endpoint in 6 20 mobile $management 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint out 6 21 local temp.aBG0HwycWi9015q 250 0 0 0 0 0 0 0 0 0 0 0 0 0 100 -
endpoint in 6 22 mobile _$qd.addr_lookup 0 250 0 0 0 1 1 0 0 0 0 0 0 0 0 32 -
endpoint out 6 23 local temp.D2t5AWqAc9Ww9lK 250 0 0 0 1 1 0 0 0 0 0 0 0 0 250 -
endpoint out 7 24 local temp.eudw80wKeS5p_cd 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint out 8 25 local temp.Iofgp72C4VN_Sqk 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint in 8 26 mobile cloud-api 0 250 0 0 1 1 0 0 0 0 0 0 0 0 1 250 -
endpoint in 7 27 mobile cloud-api 0 250 0 0 1 1 0 0 0 0 0 0 0 0 1 250 -
endpoint out 28 66 local temp.hdDlLM2ylwQW4Ut 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint in 28 67 mobile cloud-api 0 250 0 0 1 1 0 0 0 0 0 0 0 0 1 250 -
endpoint in 29 68 mobile $management 0 250 0 0 0 2 0 0 2 0 0 0 0 0 0 250 -
endpoint out 29 69 local temp.JuL682dfZ_gHiQU 250 0 0 0 1 1 0 0 0 0 0 0 0 0 1 -
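To total the `stuck` column for one address across a pasted listing like the one above, something along these lines may help. It is a sketch that assumes the rows are whitespace-separated as pasted here, with `stuck`, `cred` and `blkd` as the last three fields. The function name `stuck_by_address` is made up for illustration.

```python
def stuck_by_address(rows, address):
    """Sum the 'stuck' column for link rows that carry the given address.

    Fields are read from the right because blank cells (peer, class, addr,
    phs) collapse when the table is pasted as plain text, so left-hand
    positions shift from row to row. The last three fields stay fixed:
    stuck, cred, blkd.
    """
    total = 0
    for row in rows:
        fields = row.split()
        # A data row has at least 4 leading fields plus 16 numeric columns.
        if len(fields) < 20 or address not in fields:
            continue
        total += int(fields[-3])
    return total
```

Passing the pasted rows and `"cloud-api"` gives the total number of stuck deliveries on that address at the time of the snapshot.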
So funny enough I left the connection open in a browser and about 10 minutes later - it was able to get through and get me a 200
Strange behavior for sure as this was just testing internal cluster service communication via port-forward
@Kampe sorry Nick! It appears I didn't push the 0.4 tag for the latest image, but have done so now (also tagged as 0.4.0). To get the correct image now you will need to edit the skupper-router deployment and set the imagePullPolicy to Always (or else change the image to 0.4.0 explicitly). (Better management of versioning is coming in 0.5.)
I believe the malloc issue should be fixed in the latest image. That was one of the issues holding up the release.
Apologies again for the oversight on my part with the missing tag.
Should have mentioned there are also some HTTP fixes in the correct image. However if after updating you still see issues then we will need to debug further.
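One way to apply the imagePullPolicy change described above without hand-editing the deployment is a `kubectl patch`. The sketch below only builds and prints the command; it does not run it. The container name `router` is an assumption taken from the `k logs ... router` command used later in this thread, so check it against your own deployment first.

```python
import json
import shlex


def image_pull_policy_patch(container_name, policy="Always"):
    """Build a strategic-merge patch body that sets imagePullPolicy on one container."""
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {"name": container_name, "imagePullPolicy": policy}
                    ]
                }
            }
        }
    }


def kubectl_patch_command(deployment, container_name):
    """Return a shell-quoted kubectl patch command (printed, not executed)."""
    patch = json.dumps(image_pull_policy_patch(container_name))
    return "kubectl patch deployment {} -p {}".format(
        shlex.quote(deployment), shlex.quote(patch)
    )


# "router" as the container name is an assumption, see the note above.
print(kubectl_patch_command("skupper-router", "router"))
```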
Fortunately I'm not seeing the malloc() issue anymore!
Unfortunately still seeing issues with HTTP in particular.
Something interesting I noticed after recreating the network: when I don't have any edge sites connected to the VAN, the service responds properly while I'm port-forwarded to it. As soon as my edge site connects to the VAN, the HTTP service in question quits responding, even to the port-forwarded internal traffic; the connection just hangs open.
I've tried deleting the service in question and recreating it, to no avail. Here's what it looks like from a YAML standpoint:
apiVersion: v1
kind: Service
metadata:
  name: cloud-api
  labels:
    app: cloud-api
  annotations:
    skupper.io/proxy: http
spec:
  selector:
    app: cloud-api
  ports:
  - name: http
    port: 5443
Connections
id host container role proto dir security authentication tenant last dlv uptime
=========================================================================================================================================================================================================
3 egress-dispatch TcpAdaptor normal tcp out no-security no-auth 000:00:00:02 000:00:58:46
4 egress-dispatch TcpAdaptor normal tcp out no-security no-auth 000:00:00:02 000:00:58:46
5 egress-dispatch TcpAdaptor normal tcp out no-security no-auth 000:00:00:02 000:00:58:46
2231 10.196.6.8:5443 HTTP/1.x Adaptor normal http/1.x out no-security no-auth 000:00:04:14 000:00:24:59
2232 10.196.0.10:5443 HTTP/1.x Adaptor normal http/1.x out no-security no-auth 000:00:01:05 000:00:24:59
2283 127.0.0.1:53938 HTTP/1.x Adaptor normal http/1.x in no-security no-auth 000:00:24:33 000:00:24:33
2284 127.0.0.1:53942 HTTP/1.x Adaptor normal http/1.x in no-security no-auth 000:00:24:33 000:00:24:33
2320 127.0.0.1:58950 7YqUtplzk67aT2nw1PYtTqVYILh7OXwrJB2EZ8g8NKceIjLUl6cIzg normal amqp in TLSv1.3(TLS_AES_128_GCM_SHA256) CN=skupper-messaging(x.509) - 000:00:24:15
2321 127.0.0.1:58948 S8Fy7Rw5l-OhYMP-zt8ax5jbrLDbmzyy5eS81GMp2qcLKpzkdvR3fg normal amqp in TLSv1.3(TLS_AES_128_GCM_SHA256) CN=skupper-messaging(x.509) 000:00:00:02 000:00:24:15
3963 127.0.0.1:57610 test-edge-skupper-router-7c55bf4d5c-84v9p edge amqp in TLSv1.3(TLS_AES_256_GCM_SHA384) CN=skupper(x.509) 000:00:00:02 000:00:09:58
4923 10.196.0.15:7422 TcpAdaptor normal tcp out no-security no-auth 000:00:02:02 000:00:02:02
5174 127.0.0.1:52834 86c7a9c2-750e-4b46-ba8c-9b7ff8e832df normal amqp in no-security no-auth 000:00:00:00 000:00:00:00
Links
Router Links
type dir conn id id peer class addr phs cap pri undel unsett deliv presett psdrop acc rej rel mod delay rate stuck cred blkd
====================================================================================================================================================================================================================
endpoint out 3 2 mobile nats-cloud-gateway 0 250 0 0 0 53 0 0 0 0 0 0 0 0 0 10 -
endpoint out 4 3 mobile nats-cloud-gateway 0 250 0 0 0 54 0 0 0 0 0 0 0 0 0 10 -
endpoint out 5 4 mobile nats-cloud-gateway 0 250 0 0 0 55 0 0 0 0 0 0 0 0 0 10 -
endpoint in 2231 2482 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint out 2231 2483 mobile cloud-api 0 250 0 0 36 36 0 0 0 0 0 0 0 0 13 250 -
endpoint in 2232 2484 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint out 2232 2485 mobile cloud-api 0 250 0 0 36 36 0 0 0 0 0 0 0 0 12 250 -
endpoint out 2283 2570 local temp.IbYU67ZQcygtbUr 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint in 2283 2571 mobile cloud-api 0 250 0 0 1 1 0 0 0 0 0 0 0 0 1 250 -
endpoint out 2284 2572 local temp.JlYHx6n65dLHhWj 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint in 2284 2573 mobile cloud-api 0 250 0 0 1 1 0 0 0 0 0 0 0 0 1 250 -
endpoint out 2320 2629 mobile 19961139-4074-4e97-ab9b-73ca9f2c9864/skupper-site-query 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 10 -
endpoint in 2320 2630 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint out 2321 2631 mobile mc/$skupper-service-sync 0 250 0 0 0 404 0 0 404 0 0 0 0 0 0 6 -
endpoint in 2321 2632 mobile mc/$skupper-service-sync 0 250 0 0 0 210 0 0 210 0 0 0 21 0 0 250 -
endpoint in 3963 5340 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
edge-downlink out 3963 5341 edge test-edge-skupper-router-7c55bf4d5c-84v9p 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint out 3963 5342 mobile _$qd.edge_addr_tracking 0 250 0 0 0 177 177 0 0 0 0 0 177 0 0 32 -
endpoint in 3963 5344 mobile $management 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint out 3963 5345 local temp.9IuFdpUqb4fNeH7 250 0 0 0 0 0 0 0 0 0 0 0 0 0 100 -
endpoint in 3963 5346 mobile _$qd.addr_lookup 0 250 0 0 0 548 548 0 0 0 0 0 0 3 0 32 -
endpoint out 3963 5347 local temp.xQFDBhOwa72C+VO 250 0 0 0 548 548 0 0 0 0 0 0 3 0 250 -
endpoint in 3963 5348 250 0 0 0 156 0 0 0 0 0 156 3 0 0 250 -
endpoint in 3963 5349 250 0 0 0 157 0 0 0 0 0 157 1 0 0 250 -
endpoint out 3963 5354 250 0 1 0 155 155 0 0 0 0 0 4 0 0 251 -
endpoint out 3963 5355 250 0 0 0 158 158 0 0 0 0 0 5 0 0 250 -
endpoint in 3963 5356 250 0 0 1 156 0 0 0 0 0 155 0 0 0 250 -
endpoint out 3963 5359 250 0 0 0 156 156 0 0 0 0 0 7 0 0 250 -
endpoint out 3963 5360 mobile d30303d8-ed4f-412e-8125-48fac68d0ab7/skupper-site-query 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint out 3963 5361 mobile mc/$skupper-service-sync 0 250 0 0 0 69 0 0 69 0 0 0 1 0 0 250 -
endpoint in 3963 5362 mobile mc/$skupper-service-sync 0 250 0 0 0 68 0 0 68 0 0 0 0 0 0 250 -
endpoint in 3963 5440 mobile cloud-api 0 250 0 0 60 60 0 0 0 0 0 0 0 0 13 250 -
endpoint in 3963 6471 mobile nats-cloud-gateway 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint out 4643 6490 mobile nats-cloud-gateway 0 250 0 1 0 0 0 0 0 0 0 0 0 0 0 10 -
endpoint in 4643 6491 edge test-edge-skupper-router-7c55bf4d5c-84v9p 250 0 0 1 1 0 0 0 0 0 0 0 0 0 10 -
endpoint in 4644 6492 mobile $management 0 250 0 0 0 2 0 0 2 0 0 0 0 0 0 250 -
endpoint out 4644 6493 local temp.5tmOUDSRRI29r0z 250 0 0 0 1 1 0 0 0 0 0 0 0 0 1 -
Addresses
Router Addresses
class addr phs distrib pri local remote in out thru fallback
===================================================================================================================================
local $_management_internal closest - 0 0 0 0 0 0
mobile $management 0 closest - 0 0 132 0 0 0
local $management closest - 0 0 0 0 0 0
mobile 19961139-4074-4e97-ab9b-73ca9f2c9864/skupper-site-query 0 balanced - 1 0 0 0 0 0
mobile _$qd.addr_lookup 0 balanced - 0 0 0 0 0 0
mobile _$qd.edge_addr_tracking 0 balanced - 0 0 0 0 0 0
mobile cloud-api 0 balanced - 2 0 94 94 0 0
mobile d30303d8-ed4f-412e-8125-48fac68d0ab7/skupper-site-query 0 balanced - 1 0 0 0 0 0
edge test-edge-skupper-router-7c55bf4d5c-84v9p balanced - 1 0 543 543 0 0
mobile mc/$skupper-service-sync 0 multicast - 2 0 424 633 0 0
mobile nats-cloud-gateway 0 balanced - 3 0 2,756 2,756 0 0
local qdhello flood - 0 0 0 0 0 0
local qdrouter flood - 0 0 0 0 0 0
topo qdrouter flood - 0 0 0 0 0 0
local qdrouter.ma multicast - 0 0 0 0 0 0
topo qdrouter.ma multicast - 0 0 0 0 0 0
local temp.4LbU9Z0_z5CQvvf balanced - 0 0 0 0 0 0
local temp.9IuFdpUqb4fNeH7 balanced - 1 0 0 0 0 0
local temp.AammmvrVTI78Xdt balanced - 1 0 0 1 0 0
local temp.BIXm9lasoj2JUg8 balanced - 0 0 0 0 0 0
local temp.CAtVYMHFf4Z8Xo_ balanced - 0 0 0 0 0 0
local temp.FUkod1sg2OXAFtG balanced - 0 0 0 0 0 0
local temp.G5CEQj5NeyIIEV_ balanced - 0 0 0 0 0 0
local temp.GIYpOEsDG9kSpNC balanced - 0 0 0 0 0 0
local temp.IbYU67ZQcygtbUr balanced - 1 0 0 0 0 0
local temp.JlYHx6n65dLHhWj balanced - 1 0 0 0 0 0
local temp.SjQUXNzoXcHf7Fq balanced - 0 0 0 0 0 0
local temp.aXhzQcHLL1SmVGV balanced - 0 0 0 0 0 0
local temp.bPVzrsTSGjyYm6z balanced - 0 0 0 0 0 0
local temp.gBexlrtbR+POb+G balanced - 0 0 0 0 0 0
local temp.gxZkIvOJKD3v1EP balanced - 0 0 0 0 0 0
local temp.ka6D_Xp2vFoac4a balanced - 0 0 0 0 0 0
local temp.lfuKUoQX9loL_4D balanced - 0 0 0 0 0 0
local temp.mahQE9KWxcY5YV+ balanced - 0 0 0 0 0 0
local temp.o75z9S4JkkJ9Xt2 balanced - 0 0 0 0 0 0
local temp.oWvhkIalixWH0mA balanced - 0 0 0 0 0 0
local temp.xQFDBhOwa72C+VO balanced - 1 0 0 621 0 0
local temp.zIiOm3NKc4laTXT balanced - 0 0 0 0 0 0
local temp.zz+l6Xe5HwFSv_M balanced - 0 0 0 0 0 0
Did start seeing these in the router logs:
2020-12-11 03:18:27.200647 +0000 SERVER (info) [C8700] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54126
2020-12-11 03:18:27.286162 +0000 SERVER (info) [C8700] Connection from 127.0.0.1:54126 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error
2020-12-11 03:18:30.234138 +0000 SERVER (info) [C8701] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54144
2020-12-11 03:18:30.251122 +0000 SERVER (info) [C8701] Connection from 127.0.0.1:54144 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error
2020-12-11 03:18:30.610810 +0000 SERVER (info) [C8702] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54148
2020-12-11 03:18:30.671114 +0000 SERVER (info) [C8702] Connection from 127.0.0.1:54148 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error
2020-12-11 03:18:32.380915 +0000 SERVER (info) [C8703] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54158
2020-12-11 03:18:32.466844 +0000 SERVER (info) [C8703] Connection from 127.0.0.1:54158 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error
2020-12-11 03:18:35.305376 +0000 SERVER (info) [C8704] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54178
2020-12-11 03:18:35.321356 +0000 SERVER (info) [C8704] Connection from 127.0.0.1:54178 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error
2020-12-11 03:18:35.753820 +0000 SERVER (info) [C8705] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54180
2020-12-11 03:18:35.814649 +0000 SERVER (info) [C8705] Connection from 127.0.0.1:54180 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error
2020-12-11 03:18:37.567204 +0000 SERVER (info) [C8706] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54192
2020-12-11 03:18:37.665489 +0000 SERVER (info) [C8706] Connection from 127.0.0.1:54192 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error
2020-12-11 03:18:40.366870 +0000 SERVER (info) [C8707] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54214
2020-12-11 03:18:40.386527 +0000 SERVER (info) [C8707] Connection from 127.0.0.1:54214 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error
2020-12-11 03:18:40.923192 +0000 SERVER (info) [C8708] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54220
2020-12-11 03:18:40.987561 +0000 SERVER (info) [C8708] Connection from 127.0.0.1:54220 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error
2020-12-11 03:18:42.770643 +0000 SERVER (info) [C8709] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54226
2020-12-11 03:18:42.861809 +0000 SERVER (info) [C8709] Connection from 127.0.0.1:54226 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error
2020-12-11 03:18:45.458111 +0000 SERVER (info) [C8710] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54246
2020-12-11 03:18:45.473047 +0000 SERVER (info) [C8710] Connection from 127.0.0.1:54246 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error
2020-12-11 03:18:46.074349 +0000 SERVER (info) [C8711] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54248
2020-12-11 03:18:46.134335 +0000 SERVER (info) [C8711] Connection from 127.0.0.1:54248 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error
2020-12-11 03:18:47.964428 +0000 SERVER (info) [C8712] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54264
2020-12-11 03:18:48.048625 +0000 SERVER (info) [C8712] Connection from 127.0.0.1:54264 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error
2020-12-11 03:18:50.535263 +0000 SERVER (info) [C8713] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54286
2020-12-11 03:18:50.604639 +0000 SERVER (info) [C8713] Connection from 127.0.0.1:54286 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error
2020-12-11 03:18:51.216390 +0000 SERVER (info) [C8714] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54294
2020-12-11 03:18:51.278484 +0000 SERVER (info) [C8714] Connection from 127.0.0.1:54294 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error
2020-12-11 03:18:53.156642 +0000 SERVER (info) [C8715] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54306
The errors in the log indicate connection failures from an edge site. (The 127.0.0.1 is I think due to the loadbalancer in use at your central site not giving the real ip of the client). How many edge sites do you have connecting? Do you see errors on the edge site(s)? (Also just to rule it out, you are not using an older version of skupper on the edge sites are you?)
There is one successfully established edge in the qdstat output with uptime of nearly 10 minutes, whereas the logged failures are within a second of connection being accepted. I suspect there is at least one other edge failing to connect? However I think that is likely to be a separate issue to the HTTP requests hanging.
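The observation that each failure follows its accept within a second can be checked mechanically. The sketch below assumes the SERVER log line layout shown above and pairs the "Accepted connection" and "failed" lines by connection id. The function name `accept_to_failure_seconds` is illustrative only.

```python
import re
from datetime import datetime

# Matches the two SERVER (info) line kinds seen in the excerpt above.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) \S+ SERVER \(info\) "
    r"\[C(?P<conn>\d+)\] (?P<event>Accepted connection|Connection from)"
)


def accept_to_failure_seconds(log_lines):
    """Map connection id -> seconds between accept and failure."""
    accepted = {}
    deltas = {}
    for line in log_lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
        conn = int(m.group("conn"))
        if m.group("event") == "Accepted connection":
            accepted[conn] = ts
        elif "failed:" in line and conn in accepted:
            deltas[conn] = (ts - accepted[conn]).total_seconds()
    return deltas
```

Connections that were accepted but never logged a failure (such as the last line of the excerpt) simply do not appear in the result.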
If you grep the router log for HTTP, what do you see? I suspect we may need to turn up the logging to debug further.
Interesting. At the time last evening I had one edge site connected. I've introduced another site to the VAN this morning, as well as another HTTP-proxied service, skupper-test, which has since been deleted.
Here are the logs with a grep for HTTP over them:
$ k logs skupper-router-7fdd579697-hz2r8 router | grep HTTP
2020-12-11 19:27:42.464310 +0000 HTTP_ADAPTOR (info) Configured HTTP_ADAPTOR listener on 0.0.0.0:1027
2020-12-11 19:27:42.464745 +0000 HTTP_ADAPTOR (notice) Listening for HTTP/1.x client requests on 0.0.0.0:1027
2020-12-11 19:27:45.854016 +0000 ROUTER_CORE (info) [C69657] Connection Opened: dir=out host=10.196.1.11:80 vhost= encrypted=no auth=no user= container_id=HTTP/1.x Adaptor props=
2020-12-11 19:28:28.385000 +0000 HTTP_ADAPTOR (info) [C2284] Disconnected
2020-12-11 19:28:28.457982 +0000 HTTP_ADAPTOR (info) [C2283] Disconnected
2020-12-11 19:28:28.769514 +0000 HTTP_ADAPTOR (info) Accepting HTTP/1.x connection on 0.0.0.0:1026
2020-12-11 19:28:28.771434 +0000 ROUTER_CORE (info) [C69791] Connection Opened: dir=in host=127.0.0.1:36270 vhost= encrypted=no auth=no user= container_id=HTTP/1.x Adaptor props=
2020-12-11 19:28:28.796645 +0000 HTTP_ADAPTOR (info) Accepting HTTP/1.x connection on 0.0.0.0:1026
2020-12-11 19:28:28.808700 +0000 ROUTER_CORE (info) [C69792] Connection Opened: dir=in host=127.0.0.1:36276 vhost= encrypted=no auth=no user= container_id=HTTP/1.x Adaptor props=
2020-12-11 19:29:52.223165 +0000 HTTP_ADAPTOR (info) Deleted HttpConnector for skupper-test, 10.196.1.11:80
2020-12-11 19:29:52.223693 +0000 HTTP_ADAPTOR (error) [C69657] Connection closing: Connection closed by management
2020-12-11 19:29:52.231279 +0000 HTTP_ADAPTOR (info) Deleted HttpListener for skupper-test, 0.0.0.0:1027
2020-12-11 19:30:36.077019 +0000 HTTP_ADAPTOR (info) [C69792] Disconnected
2020-12-11 19:30:36.078962 +0000 HTTP_ADAPTOR (info) [C69791] Disconnected
2020-12-11 19:30:36.126013 +0000 HTTP_ADAPTOR (warning) [C2231][L2482] response message not received, outcome=0x26
2020-12-11 19:30:36.133704 +0000 HTTP_ADAPTOR (info) Accepting HTTP/1.x connection on 0.0.0.0:1026
2020-12-11 19:30:36.135452 +0000 ROUTER_CORE (info) [C70204] Connection Opened: dir=in host=127.0.0.1:37910 vhost= encrypted=no auth=no user= container_id=HTTP/1.x Adaptor props=
2020-12-11 19:30:36.142094 +0000 HTTP_ADAPTOR (warning) [C2231][L2482] response message not received, outcome=0x26
2020-12-11 19:30:36.155332 +0000 HTTP_ADAPTOR (warning) [C2231][L2482] response message not received, outcome=0x26
2020-12-11 19:30:36.164885 +0000 HTTP_ADAPTOR (warning) [C2231][L2482] response message not received, outcome=0x26
2020-12-11 19:30:36.190547 +0000 HTTP_ADAPTOR (warning) [C2231][L2482] response message not received, outcome=0x26
2020-12-11 19:30:36.203107 +0000 HTTP_ADAPTOR (warning) [C2231][L2482] response message not received, outcome=0x26
2020-12-11 19:30:36.217697 +0000 HTTP_ADAPTOR (warning) [C2231][L2482] response message not received, outcome=0x26
2020-12-11 19:30:36.243975 +0000 HTTP_ADAPTOR (warning) [C2231][L2482] response message not received, outcome=0x26
2020-12-11 19:30:36.266860 +0000 HTTP_ADAPTOR (warning) [C2231][L2482] response message not received, outcome=0x26
2020-12-11 19:30:36.276897 +0000 HTTP_ADAPTOR (info) Accepting HTTP/1.x connection on 0.0.0.0:1026
2020-12-11 19:30:36.277186 +0000 ROUTER_CORE (info) [C70205] Connection Opened: dir=in host=127.0.0.1:37916 vhost= encrypted=no auth=no user= container_id=HTTP/1.x Adaptor props=
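To see which connection and link the "response message not received" warnings pile up on, the grep output can be tallied. This is a sketch that assumes the HTTP_ADAPTOR warning format shown above; `count_missing_responses` is a made-up helper name.

```python
import re
from collections import Counter

# Matches the HTTP_ADAPTOR warning as it appears in the grep output above.
WARN_RE = re.compile(
    r"HTTP_ADAPTOR \(warning\) \[C(\d+)\]\[L(\d+)\] response message not received"
)


def count_missing_responses(log_lines):
    """Count 'response message not received' warnings per (connection, link)."""
    counts = Counter()
    for line in log_lines:
        m = WARN_RE.search(line)
        if m:
            counts[(int(m.group(1)), int(m.group(2)))] += 1
    return counts
```

In the excerpt above every warning carries the same pair, connection 2231 and link 2482, which is one of the two outgoing connections to port 5443 of the cloud-api pods.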
Here's how the links look currently:
Router Links
type dir conn id id peer class addr phs cap pri undel unsett deliv presett psdrop acc rej rel mod delay rate stuck cred blkd
=====================================================================================================================================================================================================================
endpoint out 3 2 mobile nats-cloud-gateway 0 250 0 0 0 770 0 0 0 0 0 0 0 0 1 10 -
endpoint out 4 3 mobile nats-cloud-gateway 0 250 0 0 0 840 0 0 0 0 0 0 0 0 0 10 -
endpoint out 5 4 mobile nats-cloud-gateway 0 250 0 0 0 842 0 0 0 0 0 0 0 0 1 10 -
endpoint in 2231 2482 250 0 0 0 40 0 0 0 0 40 0 0 0 0 250 -
endpoint out 2231 2483 mobile cloud-api 0 250 0 0 17 57 0 0 40 0 0 0 40 0 17 250 -
endpoint in 2232 2484 250 0 0 0 5 0 0 0 0 5 0 0 0 0 250 -
endpoint out 2232 2485 mobile cloud-api 0 250 0 0 48 53 0 0 5 0 0 0 5 0 48 250 -
endpoint out 70204 81737 local temp._Ixy9iQca5SlymS 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint in 70204 81738 mobile cloud-api 0 250 0 0 1 1 0 0 0 0 0 0 0 0 1 250 -
endpoint out 70205 81739 local temp.z70QEFMNMtaprsq 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint in 70205 81740 mobile cloud-api 0 250 0 0 1 1 0 0 0 0 0 0 0 0 1 250 -
endpoint out 70208 81741 mobile mc/$skupper-service-sync 0 250 0 0 0 228 0 0 228 0 0 0 0 0 0 7 -
endpoint in 70208 81742 mobile mc/$skupper-service-sync 0 250 0 0 0 79 0 0 79 0 0 0 31 0 0 250 -
endpoint out 70207 81743 mobile 19961139-4074-4e97-ab9b-73ca9f2c9864/skupper-site-query 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 10 -
endpoint in 70207 81744 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint in 70210 81745 250 0 0 0 6 6 0 0 0 0 0 0 0 0 250 -
edge-downlink out 70210 81746 edge test-edge-skupper-router-5c95c57c7f-zl65c 250 0 0 0 6 0 0 6 0 0 0 0 0 0 250 -
endpoint out 70210 81747 mobile _$qd.edge_addr_tracking 0 250 0 0 0 255 255 0 0 0 0 0 255 0 0 32 -
endpoint out 70210 81748 mobile mc/$skupper-service-sync 0 250 0 0 0 149 0 0 149 0 0 0 0 0 0 250 -
endpoint in 70210 81749 mobile mc/$skupper-service-sync 0 250 0 0 0 79 0 0 79 0 0 0 35 0 0 250 -
endpoint out 70210 81750 mobile ad87c630-e3a4-4fd9-81dc-ce62c1574cbb/skupper-site-query 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint in 70210 81752 mobile $management 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint out 70210 81753 local temp.GmkM3QfhfYbSyFV 250 0 0 0 0 0 0 0 0 0 0 0 0 0 100 -
endpoint in 70210 81754 mobile _$qd.addr_lookup 0 250 0 0 0 757 757 0 0 0 0 0 0 1 0 32 -
endpoint out 70210 81755 local temp.Jjb6uAAZevNnQkF 250 0 0 0 757 757 0 0 0 0 0 0 1 0 250 -
endpoint in 70210 81756 250 0 0 0 253 0 0 0 0 0 253 0 0 0 250 -
endpoint in 70210 81757 250 0 0 0 253 0 0 0 0 0 253 0 0 0 250 -
endpoint in 70210 81758 250 0 0 0 253 0 0 0 0 0 253 0 0 0 250 -
endpoint out 70210 81764 250 0 0 0 253 253 0 0 0 0 0 0 0 0 250 -
endpoint out 70210 81765 250 0 0 0 253 253 0 0 0 0 0 0 0 0 250 -
endpoint out 70210 81767 250 0 0 0 253 253 0 0 0 0 0 0 0 0 250 -
endpoint in 70210 82521 mobile cloud-api 0 250 0 0 1 1 0 0 0 0 0 0 0 0 1 250 -
endpoint in 71598 84620 250 0 0 0 3 3 1 0 0 0 0 0 0 0 250 -
edge-downlink out 71598 84621 edge test-edge-skupper-router-7c55bf4d5c-84v9p 250 0 0 0 3 0 0 3 0 0 0 0 0 0 250 -
endpoint out 71598 84622 mobile _$qd.edge_addr_tracking 0 250 0 0 0 35 35 0 0 0 0 0 35 0 0 32 -
endpoint out 71598 84623 mobile d30303d8-ed4f-412e-8125-48fac68d0ab7/skupper-site-query 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint in 71598 84624 mobile $management 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint out 71598 84625 local temp.JD8VHyVe7uvPdGA 250 0 0 0 0 0 0 0 0 0 0 0 0 0 100 -
endpoint in 71598 84626 mobile _$qd.addr_lookup 0 250 0 0 0 93 93 0 0 0 0 0 0 1 0 32 -
endpoint out 71598 84627 local temp.gcDNckLv7G9ephD 250 0 0 0 93 93 0 0 0 0 0 0 1 0 250 -
endpoint out 71598 84629 mobile mc/$skupper-service-sync 0 250 0 0 0 27 0 0 27 0 0 0 14 0 0 250 -
endpoint in 71598 84630 mobile mc/$skupper-service-sync 0 250 0 0 0 12 0 0 12 0 0 0 0 0 0 250 -
endpoint in 71598 84631 250 0 0 0 31 0 0 0 0 0 31 0 0 0 250 -
endpoint in 71598 84632 250 0 0 1 32 0 0 0 0 0 31 1 0 0 250 -
endpoint out 71598 84637 250 0 0 0 32 32 0 0 0 0 0 2 0 0 250 -
endpoint out 71598 84638 250 0 0 0 32 32 0 0 0 0 0 1 0 0 250 -
endpoint in 71598 84639 250 0 0 0 31 0 0 0 0 0 31 1 0 0 250 -
endpoint out 71598 84656 250 0 1 0 29 29 0 0 0 0 0 1 0 0 251 -
endpoint in 71598 85094 mobile nats-cloud-gateway 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint out 71823 85099 mobile nats-cloud-gateway 0 250 0 1 0 0 0 0 0 0 0 0 0 0 0 10 -
endpoint in 71823 85100 edge test-edge-skupper-router-7c55bf4d5c-84v9p 250 0 0 1 1 0 0 0 0 0 0 0 0 0 10 -
endpoint in 71824 85101 mobile $management 0 250 0 0 0 2 0 0 2 0 0 0 0 0 0 250 -
endpoint out 71824 85102 local temp.e521DKyEFUtrxG8 250 0 0 0 1 1 0 0 0 0 0 0 0 0 1 -
Also, I spoke too soon and the TCP connections don't currently seem to be making it through to the proxied service either.
Here they are in their current state:
Name: nats-cloud-gateway
Namespace: default
Labels: app=nats-cloud
app.kubernetes.io/instance=test-cloud-platform
location=cloud
Annotations: internal.skupper.io/originalAssignedPort: 1028
internal.skupper.io/originalSelector: app=nats-cloud,location=cloud
internal.skupper.io/originalTargetPort: 7422
skupper.io/port: 7422
skupper.io/proxy: tcp
Selector: application=skupper-router,skupper.io/component=router
Type: ClusterIP
IP: None
Port: leaf 7422/TCP
TargetPort: 1028/TCP
Endpoints: 10.196.3.2:1028
Session Affinity: None
Events: <none>
---
Name: cloud-api
Namespace: default
Labels: app=cloud-api
app.kubernetes.io/instance=test-cloud
Annotations: internal.skupper.io/originalAssignedPort: 1026
internal.skupper.io/originalSelector: app=cloud-api
internal.skupper.io/originalTargetPort: 5443
skupper.io/proxy: http
Selector: application=skupper-router,skupper.io/component=router
Type: ClusterIP
IP: 10.200.14.223
Port: http 5443/TCP
TargetPort: 1026/TCP
Endpoints: 10.196.3.2:1026
Session Affinity: None
Events: <none>
I noticed the http proxied service has an IP, while the TCP service is headless in the cloud. On the edge we see:
Name: cloud-api
Namespace: default
Labels: <none>
Annotations: internal.skupper.io/controlled: true
Selector: application=skupper-router,skupper.io/component=router
Type: ClusterIP
IP: 10.43.198.49
Port: cloud-api 5443/TCP
TargetPort: 1026/TCP
Endpoints: 10.42.0.45:1026
Session Affinity: None
Events: <none>
---
Name: nats-cloud-gateway
Namespace: default
Labels: <none>
Annotations: internal.skupper.io/controlled: true
Selector: application=skupper-router,skupper.io/component=router
Type: ClusterIP
IP: 10.43.164.74
Port: nats-cloud-gateway 7422/TCP
TargetPort: 1029/TCP
Endpoints: 10.42.0.45:1029
Session Affinity: None
Events: <none>
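The headless-versus-ClusterIP difference noted above can be spotted automatically in `kubectl describe service` output. The sketch below assumes the `Name:` and `IP:` field labels shown in the pasted output; the helper name `headless_services` is illustrative.

```python
def headless_services(describe_text):
    """Return names of services whose 'IP:' field is 'None' (headless)."""
    headless = []
    name = None
    for raw in describe_text.splitlines():
        line = raw.strip()
        if line.startswith("Name:"):
            # Remember the service this block of fields belongs to.
            name = line.split(":", 1)[1].strip()
        elif line.startswith("IP:") and name is not None:
            if line.split(":", 1)[1].strip() == "None":
                headless.append(name)
    return headless
```

Run against the cloud-side output it would flag only nats-cloud-gateway; on the edge side both services have a cluster IP, so nothing is flagged.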
So, I believe we've found the culprit.
Istio-proxy, when enabled on the router, will cause this issue.
You can, however, keep the Istio sidecars on your skuppered services, just not on the router 👍
Wondering now how I can best put some custom annotations on the router pod? Can that be a configuration value as well? :)
Would need to have a think about how best to do that, but yes, it seems like something we could add. Perhaps certain annotations on the skupper site could be copied to the router. E.g. there could be an annotation skupper.io/router-annotations that took a list of keys of other annotations to copy if present? That would allow the service-controller to also be annotated if needed.
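A rough sketch of the opt-in idea floated above, to make the proposal concrete. It is not Skupper's implementation. It assumes the `skupper.io/router-annotations` value is a comma-separated list of keys, which the comment does not specify.

```python
def select_router_annotations(site_annotations,
                              list_key="skupper.io/router-annotations"):
    """Copy only the annotations whose keys are named in the list annotation.

    Assumption: the list annotation holds comma-separated keys. Keys that
    are listed but absent from the site are skipped ("copy if present").
    """
    wanted = [k.strip() for k in site_annotations.get(list_key, "").split(",")]
    return {k: site_annotations[k] for k in wanted if k and k in site_annotations}
```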
Yeah that would be an interesting solution, potentially whatever annotations are on the site-controller would be useful to propagate to the router given a flag? In a way, "skuppering" the annotations to the router the controller deploys.
Well solved on the istio-proxy issue! Thanks for the details also. We will try and figure out how we could make things a bit more obvious.
credit to @kungfuchicken as skupper is part of his demo in a few minutes!
@kungfuchicken++
Perhaps certain annotations on the skupper site could be copied to the router. E.g. there could be an annotation skupper.io/router-annotations that took a list of keys of other annotations to copy if present?
Or perhaps reversing that would offer a simpler solution. I.e. all annotations on the skupper-site configmap would be copied to both router and service-controller, but there would be a special annotation, e.g. skupper.io/ignore-router-annotations, which would take a list of keys that should be ignored and not copied. Likewise for the service-controller. That way in the simple case all you need to do is add annotations to the skupper-site configmap that initialises the site. Would that work for you? Would it be ok if the annotations by default were applied to all the skupper created deployments?
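The reversed, copy-everything-except variant might look roughly like this. It is a sketch of the proposal, not of Skupper's code. The comma-separated list format is an assumption, and the service-controller key name is hypothetical since the comment only says "likewise".

```python
ROUTER_IGNORE_KEY = "skupper.io/ignore-router-annotations"
# Hypothetical name: the thread does not spell out the service-controller key.
CONTROLLER_IGNORE_KEY = "skupper.io/ignore-service-controller-annotations"


def annotations_for(site_annotations, ignore_key):
    """Copy every site annotation except the ones listed under ignore_key.

    The two control annotations themselves are never copied.
    """
    ignored = {
        k.strip()
        for k in site_annotations.get(ignore_key, "").split(",")
        if k.strip()
    }
    control_keys = {ROUTER_IGNORE_KEY, CONTROLLER_IGNORE_KEY}
    return {
        k: v
        for k, v in site_annotations.items()
        if k not in ignored and k not in control_keys
    }
```

With this shape, the simple case needs no control annotation at all: whatever is on the skupper-site configmap lands on both deployments.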
That would work perfectly well, and yes having them propagate to all services skupper controller creates seems reasonable as well
demo went well.
@Kampe @kungfuchicken Ted asked whether it would make sense to have the router deployment include whatever annotation causes istio to ignore it (sidecar.istio.io/inject=false?) by default. Wdyt?
That's certainly perfect for our use case, but I could definitely see other use cases where other annotations may need to be set.
I agree that having a good way to add annotations is probably needed. We should also just default to having the Istio annotation present because nothing good is ever going to result from having sidecars injected into the Skupper router pod.
-Ted
Yes it would be in addition to the more general mechanism (but the mechanism might then need to be able to turn off defaults)
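Combining a built-in default with a way to switch it off could be sketched as follows. This is purely illustrative: the `disabled_defaults` parameter is a hypothetical opt-out, and the default set here holds only the Istio annotation discussed in this thread.

```python
# Default discussed above: keep Istio from injecting a sidecar into the router pod.
DEFAULT_ROUTER_ANNOTATIONS = {"sidecar.istio.io/inject": "false"}


def router_annotations(user_annotations, disabled_defaults=()):
    """Merge defaults with user annotations; user values win on conflict.

    disabled_defaults is a hypothetical opt-out: keys listed there are
    dropped from the defaults before merging.
    """
    merged = {
        k: v
        for k, v in DEFAULT_ROUTER_ANNOTATIONS.items()
        if k not in disabled_defaults
    }
    merged.update(user_annotations)
    return merged
```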
From what we've discovered it's only needed on the router in particular, as it's doing interesting things with service ports, and I believe envoy-proxy by default will block what it's not aware of. The service-controller and site-controller should work as intended even within the Istio mesh. Will be glad to test!
Hello!
Just updated my clusters to utilize 0.4.0 of the site-controller as well as the new service-controller:0.4.0
Ran into some very interesting issues attempting to utilize my services, currently testing the HTTP endpoint manually, while I also have a test service running testing the TCP proxy. Here's the logs from the router before it crashed.
Here's how we currently configure skupper:
cloud hub
edge
we have two services exposed:
Before the (cloud hub) router crashed I hopped on the pod and ran qdstat -l and noticed there were many links piling up for the http transfer. Here's an example of them. Be sure to let me know if there's any other information you'd be interested to see.