skupperproject / skupper

Skupper is an implementation of a Virtual Application Network, enabling rich hybrid cloud communication.
http://skupper.io
Apache License 2.0

Skupper Router v0.4.0 Hard Crash malloc(): unsorted double linked list corrupted #341

Closed: Kampe closed this issue 3 years ago

Kampe commented 3 years ago

Hello!

I just updated my clusters to use 0.4.0 of the site-controller as well as the new service-controller:0.4.0.

I ran into some very interesting issues when trying to use my services. I'm currently testing the HTTP endpoint manually, and I also have a test service running that exercises the TCP proxy. Here are the logs from the router before it crashed.

2020-12-10 07:00:03.850112 +0000 ROUTER_CORE (info) [C77] Connection Closed                                                                                                                                                                                                                                                                                                            
2020-12-10 07:00:04.753583 +0000 ROUTER_CORE (info) [C5][L181] Link detached: del=0 presett=0 psdrop=0 acc=0 rej=0 rel=0 mod=0 delay1=0 delay10=0 blocked=no                                                                                                                                                                                                                           
2020-12-10 07:00:05.144506 +0000 HTTP_ADAPTOR (info) [C1] Connection closed                                                                                                                                                                                                                                                                                                            
2020-12-10 07:00:05.144563 +0000 HTTP_ADAPTOR (info) [C1] Server not responding - disconnecting...                                                                                                                                                                                                                                                                                     
2020-12-10 07:00:05.802507 +0000 ROUTER_CORE (info) [C5][L186] Link attached: dir=in source={<none> expire:sess} target={nats-cloud-gateway expire:link}                                                                                                                                                                                                                               
2020-12-10 07:00:05.859764 +0000 ROUTER_CORE (info) [C79] Connection Opened: dir=out host=10.196.3.155:1024 vhost= encrypted=no auth=no user= container_id=TcpAdaptor props=                                                                                                                                                                                                           
2020-12-10 07:00:05.859868 +0000 TCP_ADAPTOR (info) [C79] Connecting to: 10.196.3.155:1024                                                                                                                                                                                                                                                                                             
2020-12-10 07:00:05.860130 +0000 HTTP_ADAPTOR (info) Accepting HTTP/1.x connection on 0.0.0.0:1024                                                                                                                                                                                                                                                                                     
2020-12-10 07:00:05.860182 +0000 TCP_ADAPTOR (info) [C79] Connected                                                                                                                                                                                                                                                                                                                    
2020-12-10 07:00:05.860363 +0000 ROUTER_CORE (info) [C80] Connection Opened: dir=in host=10.196.3.155:52766 vhost= encrypted=no auth=no user= container_id=HTTP/1.x Adaptor props=                                                                                                                                                                                                     
2020-12-10 07:00:05.860457 +0000 ROUTER_CORE (info) [C79][L187] Link attached: dir=out source={nats-cloud-gateway expire:link} target={<none> expire:link}                                                                                                                                                                                                                             
2020-12-10 07:00:05.860482 +0000 ROUTER_CORE (info) [C80][L188] Link attached: dir=out source={(dyn)<none> expire:link} target={<none> expire:link}                                                                                                                                                                                                                                    
2020-12-10 07:00:05.860498 +0000 ROUTER_CORE (info) [C80][L189] Link attached: dir=in source={<none> expire:link} target={cloud-api expire:link}                                                                                                                                                                                                                                       
2020-12-10 07:00:05.860556 +0000 TCP_ADAPTOR (info) [C79] Disconnected                                                                                                                                                                                                                                                                                                                 
2020-12-10 07:00:05.860623 +0000 ROUTER_CORE (info) [C79][L190] Link attached: dir=in source={<none> expire:link} target={amqp:/_edge/test-edge-skupper-router-7f45bdfb7c-92pww/temp.nQRyzDZbD_AkDXv expire:link}                                                                                                                                                                     
2020-12-10 07:00:05.860635 +0000 ROUTER_CORE (info) [C79][L190] Link lost: del=0 presett=0 psdrop=0 acc=0 rej=0 rel=0 mod=0 delay1=0 delay10=0 blocked=no                                                                                                                                                                                                                              
2020-12-10 07:00:05.860930 +0000 ROUTER_CORE (info) [C79][L187] Link lost: del=0 presett=0 psdrop=0 acc=0 rej=0 rel=0 mod=0 delay1=1 delay10=0 blocked=no                                                                                                                                                                                                                              
2020-12-10 07:00:05.860950 +0000 ROUTER_CORE (info) [C79] Connection Closed                                                                                                                                                                                                                                                                                                            
2020-12-10 07:00:06.763495 +0000 ROUTER_CORE (info) [C5][L186] Link detached: del=0 presett=0 psdrop=0 acc=0 rej=0 rel=0 mod=0 delay1=0 delay10=0 blocked=no                                                                                                                                                                                                                           
2020-12-10 07:00:07.645104 +0000 HTTP_ADAPTOR (info) [C1] Connection closed                                                                                                                                                                                                                                                                                                            
2020-12-10 07:00:07.645592 +0000 HTTP_ADAPTOR (info) [C1] Server not responding - disconnecting...                                                                                                                                                                                                                                                                                     
2020-12-10 07:00:07.801700 +0000 ROUTER_CORE (info) [C5][L191] Link attached: dir=in source={<none> expire:sess} target={nats-cloud-gateway expire:link}                                                                                                                                                                                                                               
2020-12-10 07:00:07.862693 +0000 ROUTER_CORE (info) [C81] Connection Opened: dir=out host=10.196.3.155:1024 vhost= encrypted=no auth=no user= container_id=TcpAdaptor props=                                                                                                                                                                                                           
2020-12-10 07:00:07.862758 +0000 TCP_ADAPTOR (info) [C81] Connecting to: 10.196.3.155:1024                                                                                                                                                                                                                                                                                             
2020-12-10 07:00:07.863047 +0000 HTTP_ADAPTOR (info) Accepting HTTP/1.x connection on 0.0.0.0:1024                                                                                                                                                                                                                                                                                     
2020-12-10 07:00:07.863094 +0000 TCP_ADAPTOR (info) [C81] Connected                                                                                                                                                                                                                                                                                                                    
2020-12-10 07:00:07.863190 +0000 ROUTER_CORE (info) [C82] Connection Opened: dir=in host=10.196.3.155:52780 vhost= encrypted=no auth=no user= container_id=HTTP/1.x Adaptor props=                                                                                                                                                                                                     
2020-12-10 07:00:07.863322 +0000 ROUTER_CORE (info) [C81][L192] Link attached: dir=out source={nats-cloud-gateway expire:link} target={<none> expire:link}                                                                                                                                                                                                                             
2020-12-10 07:00:07.863349 +0000 ROUTER_CORE (info) [C82][L193] Link attached: dir=out source={(dyn)<none> expire:link} target={<none> expire:link}                                                                                                                                                                                                                                    
2020-12-10 07:00:07.863366 +0000 ROUTER_CORE (info) [C82][L194] Link attached: dir=in source={<none> expire:link} target={cloud-api expire:link}                                                                                                                                                                                                                                       
2020-12-10 07:00:07.863408 +0000 TCP_ADAPTOR (info) [C81] Disconnected                                                                                                                                                                                                                                                                                                                 
2020-12-10 07:00:07.863525 +0000 ROUTER_CORE (info) [C81][L195] Link attached: dir=in source={<none> expire:link} target={amqp:/_edge/test-edge-skupper-router-7f45bdfb7c-92pww/temp.Bxht0NGTkcDBtc_ expire:link}                                                                                                                                                                     
2020-12-10 07:00:07.863539 +0000 ROUTER_CORE (info) [C81][L195] Link lost: del=0 presett=0 psdrop=0 acc=0 rej=0 rel=0 mod=0 delay1=0 delay10=0 blocked=no                                                                                                                                                                                                                              
2020-12-10 07:00:07.863555 +0000 ROUTER_CORE (info) [C81][L192] Link lost: del=0 presett=0 psdrop=0 acc=0 rej=0 rel=0 mod=0 delay1=1 delay10=0 blocked=no                                                                                                                                                                                                                              
2020-12-10 07:00:07.863567 +0000 ROUTER_CORE (info) [C81] Connection Closed                                                                                                                                                                                                                                                                                                            
2020-12-10 07:00:08.768780 +0000 ROUTER_CORE (info) [C5][L191] Link detached: del=0 presett=0 psdrop=0 acc=0 rej=0 rel=0 mod=0 delay1=0 delay10=0 blocked=no                                                                                                                                                                                                                           
2020-12-10 07:00:09.801661 +0000 ROUTER_CORE (info) [C5][L196] Link attached: dir=in source={<none> expire:sess} target={nats-cloud-gateway expire:link}                                                                                                                                                                                                                               
2020-12-10 07:00:09.858965 +0000 ROUTER_CORE (info) [C83] Connection Opened: dir=out host=10.196.3.155:1024 vhost= encrypted=no auth=no user= container_id=TcpAdaptor props=                                                                                                                                                                                                           
2020-12-10 07:00:09.859253 +0000 TCP_ADAPTOR (info) [C83] Connecting to: 10.196.3.155:1024                                                                                                                                                                                                                                                                                             
2020-12-10 07:00:09.859836 +0000 ROUTER_CORE (info) [C83][L197] Link attached: dir=out source={nats-cloud-gateway expire:link} target={<none> expire:link}                                                                                                                                                                                                                             
2020-12-10 07:00:09.860398 +0000 ROUTER_CORE (info) [C84] Connection Opened: dir=out host=10.196.3.155:1024 vhost= encrypted=no auth=no user= container_id=TcpAdaptor props=                                                                                                                                                                                                           
2020-12-10 07:00:09.860772 +0000 TCP_ADAPTOR (info) [C84] Connecting to: 10.196.3.155:1024                                                                                                                                                                                                                                                                                             
2020-12-10 07:00:09.861488 +0000 HTTP_ADAPTOR (info) Accepting HTTP/1.x connection on 0.0.0.0:1024                                                                                                                                                                                                                                                                                     
2020-12-10 07:00:09.861670 +0000 HTTP_ADAPTOR (info) Accepting HTTP/1.x connection on 0.0.0.0:1024                                                                                                                                                                                                                                                                                     
2020-12-10 07:00:09.861841 +0000 TCP_ADAPTOR (info) [C83] Connected                                                                                                                                                                                                                                                                                                                    
2020-12-10 07:00:09.862037 +0000 TCP_ADAPTOR (info) [C83] Disconnected                                                                                                                                                                                                                                                                                                                 
2020-12-10 07:00:09.862225 +0000 TCP_ADAPTOR (info) [C84] Connected                                                                                                                                                                                                                                                                                                                    
2020-12-10 07:00:09.862330 +0000 ROUTER_CORE (info) [C85] Connection Opened: dir=in host=10.196.3.155:52794 vhost= encrypted=no auth=no user= container_id=HTTP/1.x Adaptor props=                                                                                                                                                                                                     
2020-12-10 07:00:09.862491 +0000 ROUTER_CORE (info) [C86] Connection Opened: dir=in host=10.196.3.155:52796 vhost= encrypted=no auth=no user= container_id=HTTP/1.x Adaptor props=                                                                                                                                                                                                     
2020-12-10 07:00:10.147989 +0000 HTTP_ADAPTOR (info) [C1] Connection closed                                                                                                                                                                                                                                                                                                            
2020-12-10 07:00:10.148607 +0000 HTTP_ADAPTOR (info) [C1] Server not responding - disconnecting...                                                                                                                                                                                                                                                                                     
malloc(): unsorted double linked list corrupted
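
The log above repeats the same short cycle about every two seconds until the abort. As a rough aid for reading longer captures, here is a minimal Python sketch that tallies events per module. The line pattern is an assumption inferred only from the excerpt above, and `summarize_router_log` is a hypothetical helper, not part of Skupper or qdrouterd.

```python
import re
from collections import Counter

# Assumed line shape, inferred from the excerpt above:
#   <date> <time> <tz> <MODULE> (<level>) [C<id>][L<id>] <message>
# The [C..] and [L..] tags are optional.
LINE_RE = re.compile(
    r"^(?P<ts>\S+ \S+ \S+) (?P<module>[A-Z_]+) \((?P<level>\w+)\) "
    r"(?:\[C(?P<conn>\d+)\](?:\[L\d+\])? )?(?P<msg>.*\S)\s*$"
)


def summarize_router_log(text):
    """Count (module, event) pairs; the event is the message up to the first ': '."""
    counts = Counter()
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # lines without a timestamp, such as the final malloc() abort
        event = m.group("msg").split(": ", 1)[0]
        counts[(m.group("module"), event)] += 1
    return counts


# A few lines copied from the log above.
sample = """\
2020-12-10 07:00:05.144506 +0000 HTTP_ADAPTOR (info) [C1] Connection closed
2020-12-10 07:00:05.144563 +0000 HTTP_ADAPTOR (info) [C1] Server not responding - disconnecting...
2020-12-10 07:00:05.860182 +0000 TCP_ADAPTOR (info) [C79] Connected
2020-12-10 07:00:05.860556 +0000 TCP_ADAPTOR (info) [C79] Disconnected
malloc(): unsorted double linked list corrupted
"""

for (module, event), n in sorted(summarize_router_log(sample).items()):
    print(f"{module:13} {event}: {n}")
```

Run against the full log, this kind of tally makes it easier to see how often "Server not responding" and the TCP connect/disconnect pairs occur before the crash.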

Here's how we currently configure Skupper:

cloud hub

apiVersion: v1
kind: ConfigMap
metadata:
  name: skupper-site
data:
  cluster-local: "false"
  console: "true"
  console-authentication: internal
  console-password: "barney"
  console-user: "rubble"
  edge: "false"
  name: test-cloud
  router-console: "true"
  service-controller: "true"
  service-sync: "true"

edge

apiVersion: v1
kind: ConfigMap
metadata:
  name: skupper-site
data:
  cluster-local: "false"
  console: "true"
  console-authentication: internal
  console-password: "barney"
  console-user: "rubble"
  edge: "true"
  name: test-edge
  router-console: "true"
  service-controller: "true"
  service-sync: "true"
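
For quick reference, the two ConfigMaps above differ in only two keys. A small Python sketch that makes this explicit; the values are copied from the YAML above (console credentials left out), and `diff_site_config` is a hypothetical helper used only for illustration.

```python
# "data" sections of the two skupper-site ConfigMaps above (credentials omitted).
hub = {
    "cluster-local": "false",
    "console": "true",
    "console-authentication": "internal",
    "edge": "false",
    "name": "test-cloud",
    "router-console": "true",
    "service-controller": "true",
    "service-sync": "true",
}
edge = {
    "cluster-local": "false",
    "console": "true",
    "console-authentication": "internal",
    "edge": "true",
    "name": "test-edge",
    "router-console": "true",
    "service-controller": "true",
    "service-sync": "true",
}


def diff_site_config(a, b):
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    keys = sorted(a.keys() | b.keys())
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


print(diff_site_config(hub, edge))
# {'edge': ('false', 'true'), 'name': ('test-cloud', 'test-edge')}
```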

We have two services exposed:

Services exposed through Skupper:
    cloud-api (http port 5443)
    nats-cloud-gateway (tcp port 7422)

Before the (cloud hub) router crashed, I hopped onto the pod, ran qdstat -l, and noticed many links piling up for the HTTP transfer. Here's an example:

Router Links
  type           dir  conn id  id   peer  class   addr                                                     phs  cap  pri  undel  unsett  deliv  presett  psdrop  acc  rej  rel  mod  delay  rate  stuck  cred  blkd
  =======================================================================================================================================================================================================================
  endpoint       out  2        2          mobile  nats-cloud-gateway                                       0    250  0    1      0       8      0        0       0    0    0    0    0      0     0      10    -
  endpoint       out  3        3          mobile  92f5bd9b-f921-4408-aa22-4ccd3f5f2c6b/skupper-site-query  0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      10    -
  endpoint       in   3        4                                                                                250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       out  4        5          mobile  mc/$skupper-service-sync                                 0    250  0    0      0       5      0        0       5    0    0    0    0      0     0      10    -
  endpoint       in   4        6          mobile  mc/$skupper-service-sync                                 0    250  0    0      0       3      0        0       3    0    0    0    0      0     0      250   -
  endpoint       in   9        15                                                                               250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  edge-downlink  out  9        16         edge    test-edge-skupper-router-7f45bdfb7c-92pww                     250  0    0      0       1      1        0       0    0    0    0    0      0     0      250   -
  endpoint       out  9        17         mobile  _$qd.edge_addr_tracking                                  0    250  0    0      0       6      6        0       0    0    0    0    6      0     0      32    -
  endpoint       out  9        18         mobile  d5f5e229-5b7e-4553-97d3-24591e1f9555/skupper-site-query  0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       out  9        19         mobile  mc/$skupper-service-sync                                 0    250  0    0      0       2      0        0       2    0    0    0    0      0     0      250   -
  endpoint       in   9        20         mobile  mc/$skupper-service-sync                                 0    250  0    0      0       2      0        0       2    0    0    0    0      0     0      250   -
  endpoint       in   9        21         mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:10
  endpoint       in   9        22         mobile  $management                                              0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       out  9        23         local   temp.TRsrm472AwVpo_a                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      100   -
  endpoint       in   9        24         mobile  _$qd.addr_lookup                                         0    250  0    0      0       15     15       0       0    0    0    0    0      1     0      32    -
  endpoint       out  9        25         local   temp.8e4pdFWcbq2rs05                                          250  0    0      0       15     15       0       0    0    0    0    0      1     0      250   -
  endpoint       in   9        29                                                                               250  0    0      1       5      0        0       0    0    4    0    0      0     0      250   -
  endpoint       in   9        30                                                                               250  0    0      1       5      0        0       0    0    4    0    0      0     0      250   -
  endpoint       in   9        31                                                                               250  0    0      1       5      0        0       0    0    4    0    0      0     0      250   -
  endpoint       out  12       34         local   temp.OTbglwINSsqSAAG                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   12       35         mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:09
  endpoint       out  9        36                                                                               250  0    1      0       5      5        0       0    0    0    0    0      0     0      251   -
  endpoint       out  15       40         local   temp.dfzQtbbcY8XWFx5                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   15       41         mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:08
  endpoint       out  16       42         local   temp.sNLLCkrGX97NSYa                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   16       43         mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:08
  endpoint       out  21       51         local   temp.4BJYXn5+JRV8m2i                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   21       52         mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:07
  endpoint       out  9        56                                                                               250  0    1      0       3      3        0       0    0    0    0    0      0     0      251   -
  endpoint       out  22       57         local   temp.4iYq0pWc7SU+PE2                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   22       58         mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:07
  endpoint       out  24       61         local   temp.X5LW6rXYd9F4IEm                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   24       62         mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:06
  endpoint       out  27       67         local   temp.pCKON1EaK8AGKhg                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   27       68         mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:05
  endpoint       out  29       71         local   temp.MFPWripcNzwrGNE                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   29       72         mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:05
  endpoint       out  32       78         local   temp.LFc8+CYtSXCpAIg                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   32       79         mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:04
  endpoint       out  38       89         local   temp._xi8ae4bvy7JK9h                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   38       90         mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:03
  endpoint       out  9        93                                                                               250  0    0      0       1      1        0       0    0    0    0    0      0     0      250   -
  endpoint       out  39       94         local   temp.GajzcZTJkGLXDke                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   39       95         mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:03
  endpoint       out  40       96         local   temp.fpRRu0WvFy1JQo9                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   40       97         mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:03
  endpoint       in   9        102        mobile  nats-cloud-gateway                                       0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       out  43       103        mobile  nats-cloud-gateway                                       0    250  0    1      0       0      0        0       0    0    0    0    0      0     0      10    -
  endpoint       out  44       104        mobile  nats-cloud-gateway                                       0    250  0    1      0       0      0        0       0    0    0    0    0      0     0      10    -
  endpoint       in   43       105        edge    test-edge-skupper-router-7f45bdfb7c-92pww                     250  0    0      1       1      0        0       0    0    0    0    0      0     0      10    -
  endpoint       in   44       106        edge    test-edge-skupper-router-7f45bdfb7c-92pww                     250  0    0      1       1      0        0       0    0    0    0    0      0     0      10    -
  endpoint       out  46       107        local   temp.LeygFKNf2tvM2Ua                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   46       108        mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:01
  endpoint       out  45       109        local   temp.2P3RBIU4kGgdhjJ                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   45       110        mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:01
  endpoint       in   47       111        mobile  $management                                              0    250  0    0      0       2      0        0       2    0    0    0    0      0     0      250   -
  endpoint       out  47       112        local   temp.4cohluBlyWklnIs                                          250  0    0      0       1      1        0       0    0    0    0    0      0     0      1     -
client version                 0.4.0
transport version              quay.io/skupper/qdrouterd:0.4 (sha256:037ec89c755a)
controller version             quay.io/skupper/service-controller:0.4.0 (sha256:b5c96ec83369)

Be sure to let me know if there's any other information you'd be interested to see.

Kampe commented 3 years ago

I noticed the services didn't have a cluster IP when I listed them, so I deleted and recreated them, and now they do. I also disabled the TCP service for the time being, but I still get stuck deliveries using only the HTTP proxy and sending requests locally via a port-forward to the skuppered service, to no avail:

2020-12-10 07:37:14.639832 +0000 HTTP_ADAPTOR (info) Accepting HTTP/1.x connection on 0.0.0.0:1026                                                                                                                                                                                                                                                                                                 
2020-12-10 07:37:14.640171 +0000 ROUTER_CORE (info) [C28] Connection Opened: dir=in host=127.0.0.1:38250 vhost= encrypted=no auth=no user= container_id=HTTP/1.x Adaptor props=                                                                                                                                                                                                                    
2020-12-10 07:37:14.640413 +0000 ROUTER_CORE (info) [C28][L66] Link attached: dir=out source={(dyn)<none> expire:link} target={<none> expire:link}                                                                                                                                                                                                                                                 
2020-12-10 07:37:14.640468 +0000 ROUTER_CORE (info) [C28][L67] Link attached: dir=in source={<none> expire:link} target={cloud-api expire:link}                                                                                                                                                                                                                                                    
2020-12-10 07:37:50.844384 +0000 ROUTER_CORE (info) [C28][L67] Stuck delivery: At least one delivery on this link has been undelivered/unsettled for more than 10 seconds

However, the router hasn't crashed yet. Here's what its links look like currently:


Router Links
  type           dir  conn id  id  peer  class   addr                                                     phs  cap  pri  undel  unsett  deliv  presett  psdrop  acc  rej  rel  mod  delay  rate  stuck  cred  blkd
  ==================================================================================================================================================================================================================
  endpoint       in   1        2                                                                               250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       out  1        3         mobile  cloud-api                                                0    250  0    0      7       7      0        0       0    0    0    0    0      0     7      250   -
  endpoint       in   2        4                                                                               250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       out  2        5         mobile  cloud-api                                                0    250  0    0      7       7      0        0       0    0    0    0    0      0     7      250   -
  endpoint       out  3        6         mobile  mc/$skupper-service-sync                                 0    250  0    0      0       272    0        0       272  0    0    0    0      0     0      8     -
  endpoint       out  4        7         mobile  92f5bd9b-f921-4408-aa22-4ccd3f5f2c6b/skupper-site-query  0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      10    -
  endpoint       in   3        8         mobile  mc/$skupper-service-sync                                 0    250  0    0      0       136    0        0       136  0    0    0    0      0     0      250   -
  endpoint       in   4        9                                                                               250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       out  5        10        local   temp.3zy7TuBZKHic_Hb                                          250  0    0      0       5      5        0       0    0    0    0    0      0     0      10    -
  endpoint       in   5        11        mobile  $management                                              0    250  0    0      0       5      0        0       5    0    0    0    0      0     0      250   -
  endpoint       in   5        12                                                                              250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   6        13                                                                              250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  edge-downlink  out  6        14        edge    test-edge-skupper-router-7f45bdfb7c-92pww                     250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       out  6        15        mobile  _$qd.edge_addr_tracking                                  0    250  0    0      0       2      2        0       0    0    0    0    2      0     0      32    -
  endpoint       out  6        16        mobile  d5f5e229-5b7e-4553-97d3-24591e1f9555/skupper-site-query  0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       out  6        17        mobile  mc/$skupper-service-sync                                 0    250  0    0      0       136    0        0       136  0    0    0    0      0     0      250   -
  endpoint       in   6        18        mobile  mc/$skupper-service-sync                                 0    250  0    0      0       136    0        0       136  0    0    0    0      0     0      250   -
  endpoint       in   6        19        mobile  cloud-api                                                0    250  0    0      3       3      0        0       0    0    0    0    0      0     3      250   -
  endpoint       in   6        20        mobile  $management                                              0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       out  6        21        local   temp.aBG0HwycWi9015q                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      100   -
  endpoint       in   6        22        mobile  _$qd.addr_lookup                                         0    250  0    0      0       1      1        0       0    0    0    0    0      0     0      32    -
  endpoint       out  6        23        local   temp.D2t5AWqAc9Ww9lK                                          250  0    0      0       1      1        0       0    0    0    0    0      0     0      250   -
  endpoint       out  7        24        local   temp.eudw80wKeS5p_cd                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       out  8        25        local   temp.Iofgp72C4VN_Sqk                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   8        26        mobile  cloud-api                                                0    250  0    0      1       1      0        0       0    0    0    0    0      0     1      250   -
  endpoint       in   7        27        mobile  cloud-api                                                0    250  0    0      1       1      0        0       0    0    0    0    0      0     1      250   -
  endpoint       out  28       66        local   temp.hdDlLM2ylwQW4Ut                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   28       67        mobile  cloud-api                                                0    250  0    0      1       1      0        0       0    0    0    0    0      0     1      250   -
  endpoint       in   29       68        mobile  $management                                              0    250  0    0      0       2      0        0       2    0    0    0    0      0     0      250   -
  endpoint       out  29       69        local   temp.JuL682dfZ_gHiQU                                          250  0    0      0       1      1        0       0    0    0    0    0      0     0      1     -
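
A minimal sketch, not part of the original report, for pulling the links with a non-zero stuck count out of a pasted Router Links table like the one above. It assumes the column layout shown here, where the last three columns are stuck, cred and blkd; the function name `stuck_links` is hypothetical.

```python
def stuck_links(table_text):
    """Return (conn_id, link_id, stuck) for every link row with stuck > 0."""
    found = []
    for line in table_text.splitlines():
        tokens = line.split()
        # A data row has 4 leading columns (type, dir, conn id, id), optional
        # peer/class/addr/phs cells, then 16 trailing columns. The header and
        # the "====" separator fail this check and are skipped.
        if len(tokens) < 20 or tokens[1] not in ("in", "out"):
            continue
        # Empty cells make left-to-right indexing unreliable, so count from
        # the right: blkd is last, cred is second to last, stuck is third.
        stuck = int(tokens[-3].replace(",", ""))
        if stuck > 0:
            found.append((tokens[2], tokens[3], stuck))
    return found


# Two rows copied from the table above, with whitespace collapsed.
sample = """\
endpoint out 28 66 local temp.hdDlLM2ylwQW4Ut 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint in 28 67 mobile cloud-api 0 250 0 0 1 1 0 0 0 0 0 0 0 0 1 250 -
"""
print(stuck_links(sample))
```

The connection and link ids it returns correspond to the `[C28][L67]` prefix on the "Stuck delivery" log line, which makes it easier to match a log entry to its row in the table.
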
Kampe commented 3 years ago

Funnily enough, I left the connection open in a browser and about 10 minutes later it was able to get through and return a 200.

Strange behavior for sure, as this was just testing internal cluster service communication via port-forward.

grs commented 3 years ago

@Kampe sorry Nick! It appears I didn't push the 0.4 tag for the latest image, but have done so now (also tagged as 0.4.0). To get the correct image now you will need to edit the skupper-router deployment and set the imagePullPolicy to Always (or else change the image to 0.4.0 explicitly). (Better management of versioning is coming in 0.5.)
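
As an illustration of the imagePullPolicy change described above, here is a minimal sketch that builds a JSON Patch document for it. It assumes the router is the first container in the skupper-router deployment's pod template (index 0), which may not hold for every install; the helper name `image_pull_policy_patch` is hypothetical.

```python
import json


def image_pull_policy_patch(container_index=0, policy="Always"):
    """Build a JSON Patch (RFC 6902) that sets imagePullPolicy on one container."""
    # Assumption: container_index points at the router container.
    path = f"/spec/template/spec/containers/{container_index}/imagePullPolicy"
    # "add" on an existing object member replaces its value, so this works
    # whether or not the field is already present.
    return json.dumps([{"op": "add", "path": path, "value": policy}])


print(image_pull_policy_patch())
```

The printed string could then be passed to `kubectl patch deployment skupper-router --type=json -p '<patch>'` in the namespace where Skupper is installed. Editing the deployment by hand, as suggested above, achieves the same result.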

I believe the malloc issue should be fixed in the latest image. That was one of the issues holding up the release.

Apologies again for the oversight on my part with the missing tag.

grs commented 3 years ago

Should have mentioned there are also some HTTP fixes in the correct image. However, if you still see issues after updating, we will need to debug further.

Kampe commented 3 years ago

Fortunately I'm not seeing the malloc() issue anymore!

Unfortunately still seeing issues with HTTP in particular.

Something interesting I noticed: after recreating the network, when I don't have any edge sites connected to the VAN, the service responds properly while I'm port-forwarded to it. As soon as my edge site connects to the VAN, the HTTP service in question stops responding, even to the port-forwarded internal traffic. The connection just hangs open.

I've tried deleting the service in question and recreating it, to no avail. Here's what it looks like from a YAML standpoint:

apiVersion: v1
kind: Service
metadata:
  name: cloud-api
  labels:
    app: cloud-api
  annotations:
    skupper.io/proxy: http
spec:
  selector:
    app: cloud-api
  ports:
  - name: http
    port: 5443

Connections

  id    host              container                                               role    proto     dir  security                         authentication               tenant  last dlv      uptime
  =========================================================================================================================================================================================================
  3     egress-dispatch   TcpAdaptor                                              normal  tcp       out  no-security                      no-auth                              000:00:00:02  000:00:58:46
  4     egress-dispatch   TcpAdaptor                                              normal  tcp       out  no-security                      no-auth                              000:00:00:02  000:00:58:46
  5     egress-dispatch   TcpAdaptor                                              normal  tcp       out  no-security                      no-auth                              000:00:00:02  000:00:58:46
  2231  10.196.6.8:5443   HTTP/1.x Adaptor                                        normal  http/1.x  out  no-security                      no-auth                              000:00:04:14  000:00:24:59
  2232  10.196.0.10:5443  HTTP/1.x Adaptor                                        normal  http/1.x  out  no-security                      no-auth                              000:00:01:05  000:00:24:59
  2283  127.0.0.1:53938   HTTP/1.x Adaptor                                        normal  http/1.x  in   no-security                      no-auth                              000:00:24:33  000:00:24:33
  2284  127.0.0.1:53942   HTTP/1.x Adaptor                                        normal  http/1.x  in   no-security                      no-auth                              000:00:24:33  000:00:24:33
  2320  127.0.0.1:58950   7YqUtplzk67aT2nw1PYtTqVYILh7OXwrJB2EZ8g8NKceIjLUl6cIzg  normal  amqp      in   TLSv1.3(TLS_AES_128_GCM_SHA256)  CN=skupper-messaging(x.509)          -             000:00:24:15
  2321  127.0.0.1:58948   S8Fy7Rw5l-OhYMP-zt8ax5jbrLDbmzyy5eS81GMp2qcLKpzkdvR3fg  normal  amqp      in   TLSv1.3(TLS_AES_128_GCM_SHA256)  CN=skupper-messaging(x.509)          000:00:00:02  000:00:24:15
  3963  127.0.0.1:57610   test-edge-skupper-router-7c55bf4d5c-84v9p               edge    amqp      in   TLSv1.3(TLS_AES_256_GCM_SHA384)  CN=skupper(x.509)                    000:00:00:02  000:00:09:58
  4923  10.196.0.15:7422  TcpAdaptor                                              normal  tcp       out  no-security                      no-auth                              000:00:02:02  000:00:02:02
  5174  127.0.0.1:52834   86c7a9c2-750e-4b46-ba8c-9b7ff8e832df                    normal  amqp      in   no-security                      no-auth                              000:00:00:00  000:00:00:00

Links

Router Links
  type           dir  conn id  id    peer  class   addr                                                     phs  cap  pri  undel  unsett  deliv  presett  psdrop  acc  rej  rel  mod  delay  rate  stuck  cred  blkd
  ====================================================================================================================================================================================================================
  endpoint       out  3        2           mobile  nats-cloud-gateway                                       0    250  0    0      0       53     0        0       0    0    0    0    0      0     0      10    -
  endpoint       out  4        3           mobile  nats-cloud-gateway                                       0    250  0    0      0       54     0        0       0    0    0    0    0      0     0      10    -
  endpoint       out  5        4           mobile  nats-cloud-gateway                                       0    250  0    0      0       55     0        0       0    0    0    0    0      0     0      10    -
  endpoint       in   2231     2482                                                                              250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       out  2231     2483        mobile  cloud-api                                                0    250  0    0      36      36     0        0       0    0    0    0    0      0     13     250   -
  endpoint       in   2232     2484                                                                              250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       out  2232     2485        mobile  cloud-api                                                0    250  0    0      36      36     0        0       0    0    0    0    0      0     12     250   -
  endpoint       out  2283     2570        local   temp.IbYU67ZQcygtbUr                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   2283     2571        mobile  cloud-api                                                0    250  0    0      1       1      0        0       0    0    0    0    0      0     1      250   -
  endpoint       out  2284     2572        local   temp.JlYHx6n65dLHhWj                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   2284     2573        mobile  cloud-api                                                0    250  0    0      1       1      0        0       0    0    0    0    0      0     1      250   -
  endpoint       out  2320     2629        mobile  19961139-4074-4e97-ab9b-73ca9f2c9864/skupper-site-query  0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      10    -
  endpoint       in   2320     2630                                                                              250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       out  2321     2631        mobile  mc/$skupper-service-sync                                 0    250  0    0      0       404    0        0       404  0    0    0    0      0     0      6     -
  endpoint       in   2321     2632        mobile  mc/$skupper-service-sync                                 0    250  0    0      0       210    0        0       210  0    0    0    21     0     0      250   -
  endpoint       in   3963     5340                                                                              250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  edge-downlink  out  3963     5341        edge    test-edge-skupper-router-7c55bf4d5c-84v9p                     250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       out  3963     5342        mobile  _$qd.edge_addr_tracking                                  0    250  0    0      0       177    177      0       0    0    0    0    177    0     0      32    -
  endpoint       in   3963     5344        mobile  $management                                              0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       out  3963     5345        local   temp.9IuFdpUqb4fNeH7                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      100   -
  endpoint       in   3963     5346        mobile  _$qd.addr_lookup                                         0    250  0    0      0       548    548      0       0    0    0    0    0      3     0      32    -
  endpoint       out  3963     5347        local   temp.xQFDBhOwa72C+VO                                          250  0    0      0       548    548      0       0    0    0    0    0      3     0      250   -
  endpoint       in   3963     5348                                                                              250  0    0      0       156    0        0       0    0    0    156  3      0     0      250   -
  endpoint       in   3963     5349                                                                              250  0    0      0       157    0        0       0    0    0    157  1      0     0      250   -
  endpoint       out  3963     5354                                                                              250  0    1      0       155    155      0       0    0    0    0    4      0     0      251   -
  endpoint       out  3963     5355                                                                              250  0    0      0       158    158      0       0    0    0    0    5      0     0      250   -
  endpoint       in   3963     5356                                                                              250  0    0      1       156    0        0       0    0    0    155  0      0     0      250   -
  endpoint       out  3963     5359                                                                              250  0    0      0       156    156      0       0    0    0    0    7      0     0      250   -
  endpoint       out  3963     5360        mobile  d30303d8-ed4f-412e-8125-48fac68d0ab7/skupper-site-query  0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       out  3963     5361        mobile  mc/$skupper-service-sync                                 0    250  0    0      0       69     0        0       69   0    0    0    1      0     0      250   -
  endpoint       in   3963     5362        mobile  mc/$skupper-service-sync                                 0    250  0    0      0       68     0        0       68   0    0    0    0      0     0      250   -
  endpoint       in   3963     5440        mobile  cloud-api                                                0    250  0    0      60      60     0        0       0    0    0    0    0      0     13     250   -
  endpoint       in   3963     6471        mobile  nats-cloud-gateway                                       0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       out  4643     6490        mobile  nats-cloud-gateway                                       0    250  0    1      0       0      0        0       0    0    0    0    0      0     0      10    -
  endpoint       in   4643     6491        edge    test-edge-skupper-router-7c55bf4d5c-84v9p                     250  0    0      1       1      0        0       0    0    0    0    0      0     0      10    -
  endpoint       in   4644     6492        mobile  $management                                              0    250  0    0      0       2      0        0       2    0    0    0    0      0     0      250   -
  endpoint       out  4644     6493        local   temp.5tmOUDSRRI29r0z                                          250  0    0      0       1      1        0       0    0    0    0    0      0     0      1     -

Addresses

Router Addresses
  class   addr                                                     phs  distrib    pri  local  remote  in     out    thru  fallback
  ===================================================================================================================================
  local   $_management_internal                                         closest    -    0      0       0      0      0     0
  mobile  $management                                              0    closest    -    0      0       132    0      0     0
  local   $management                                                   closest    -    0      0       0      0      0     0
  mobile  19961139-4074-4e97-ab9b-73ca9f2c9864/skupper-site-query  0    balanced   -    1      0       0      0      0     0
  mobile  _$qd.addr_lookup                                         0    balanced   -    0      0       0      0      0     0
  mobile  _$qd.edge_addr_tracking                                  0    balanced   -    0      0       0      0      0     0
  mobile  cloud-api                                                0    balanced   -    2      0       94     94     0     0
  mobile  d30303d8-ed4f-412e-8125-48fac68d0ab7/skupper-site-query  0    balanced   -    1      0       0      0      0     0
  edge    test-edge-skupper-router-7c55bf4d5c-84v9p                     balanced   -    1      0       543    543    0     0
  mobile  mc/$skupper-service-sync                                 0    multicast  -    2      0       424    633    0     0
  mobile  nats-cloud-gateway                                       0    balanced   -    3      0       2,756  2,756  0     0
  local   qdhello                                                       flood      -    0      0       0      0      0     0
  local   qdrouter                                                      flood      -    0      0       0      0      0     0
  topo    qdrouter                                                      flood      -    0      0       0      0      0     0
  local   qdrouter.ma                                                   multicast  -    0      0       0      0      0     0
  topo    qdrouter.ma                                                   multicast  -    0      0       0      0      0     0
  local   temp.4LbU9Z0_z5CQvvf                                          balanced   -    0      0       0      0      0     0
  local   temp.9IuFdpUqb4fNeH7                                          balanced   -    1      0       0      0      0     0
  local   temp.AammmvrVTI78Xdt                                          balanced   -    1      0       0      1      0     0
  local   temp.BIXm9lasoj2JUg8                                          balanced   -    0      0       0      0      0     0
  local   temp.CAtVYMHFf4Z8Xo_                                          balanced   -    0      0       0      0      0     0
  local   temp.FUkod1sg2OXAFtG                                          balanced   -    0      0       0      0      0     0
  local   temp.G5CEQj5NeyIIEV_                                          balanced   -    0      0       0      0      0     0
  local   temp.GIYpOEsDG9kSpNC                                          balanced   -    0      0       0      0      0     0
  local   temp.IbYU67ZQcygtbUr                                          balanced   -    1      0       0      0      0     0
  local   temp.JlYHx6n65dLHhWj                                          balanced   -    1      0       0      0      0     0
  local   temp.SjQUXNzoXcHf7Fq                                          balanced   -    0      0       0      0      0     0
  local   temp.aXhzQcHLL1SmVGV                                          balanced   -    0      0       0      0      0     0
  local   temp.bPVzrsTSGjyYm6z                                          balanced   -    0      0       0      0      0     0
  local   temp.gBexlrtbR+POb+G                                          balanced   -    0      0       0      0      0     0
  local   temp.gxZkIvOJKD3v1EP                                          balanced   -    0      0       0      0      0     0
  local   temp.ka6D_Xp2vFoac4a                                          balanced   -    0      0       0      0      0     0
  local   temp.lfuKUoQX9loL_4D                                          balanced   -    0      0       0      0      0     0
  local   temp.mahQE9KWxcY5YV+                                          balanced   -    0      0       0      0      0     0
  local   temp.o75z9S4JkkJ9Xt2                                          balanced   -    0      0       0      0      0     0
  local   temp.oWvhkIalixWH0mA                                          balanced   -    0      0       0      0      0     0
  local   temp.xQFDBhOwa72C+VO                                          balanced   -    1      0       0      621    0     0
  local   temp.zIiOm3NKc4laTXT                                          balanced   -    0      0       0      0      0     0
  local   temp.zz+l6Xe5HwFSv_M                                          balanced   -    0      0       0      0      0     0

Did start seeing these in the router logs:

2020-12-11 03:18:27.200647 +0000 SERVER (info) [C8700] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54126                                                                                                                                                                                                                                                                                     
2020-12-11 03:18:27.286162 +0000 SERVER (info) [C8700] Connection from 127.0.0.1:54126 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error                                                                                                                                                                                                                           
2020-12-11 03:18:30.234138 +0000 SERVER (info) [C8701] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54144                                                                                                                                                                                                                                                                                     
2020-12-11 03:18:30.251122 +0000 SERVER (info) [C8701] Connection from 127.0.0.1:54144 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error                                                                                                                                                                                                                           
2020-12-11 03:18:30.610810 +0000 SERVER (info) [C8702] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54148                                                                                                                                                                                                                                                                                     
2020-12-11 03:18:30.671114 +0000 SERVER (info) [C8702] Connection from 127.0.0.1:54148 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error                                                                                                                                                                                                                           
2020-12-11 03:18:32.380915 +0000 SERVER (info) [C8703] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54158                                                                                                                                                                                                                                                                                     
2020-12-11 03:18:32.466844 +0000 SERVER (info) [C8703] Connection from 127.0.0.1:54158 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error                                                                                                                                                                                                                           
2020-12-11 03:18:35.305376 +0000 SERVER (info) [C8704] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54178                                                                                                                                                                                                                                                                                     
2020-12-11 03:18:35.321356 +0000 SERVER (info) [C8704] Connection from 127.0.0.1:54178 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error                                                                                                                                                                                                                           
2020-12-11 03:18:35.753820 +0000 SERVER (info) [C8705] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54180                                                                                                                                                                                                                                                                                     
2020-12-11 03:18:35.814649 +0000 SERVER (info) [C8705] Connection from 127.0.0.1:54180 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error                                                                                                                                                                                                                           
2020-12-11 03:18:37.567204 +0000 SERVER (info) [C8706] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54192                                                                                                                                                                                                                                                                                     
2020-12-11 03:18:37.665489 +0000 SERVER (info) [C8706] Connection from 127.0.0.1:54192 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error                                                                                                                                                                                                                           
2020-12-11 03:18:40.366870 +0000 SERVER (info) [C8707] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54214                                                                                                                                                                                                                                                                                     
2020-12-11 03:18:40.386527 +0000 SERVER (info) [C8707] Connection from 127.0.0.1:54214 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error                                                                                                                                                                                                                           
2020-12-11 03:18:40.923192 +0000 SERVER (info) [C8708] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54220                                                                                                                                                                                                                                                                                     
2020-12-11 03:18:40.987561 +0000 SERVER (info) [C8708] Connection from 127.0.0.1:54220 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error                                                                                                                                                                                                                           
2020-12-11 03:18:42.770643 +0000 SERVER (info) [C8709] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54226                                                                                                                                                                                                                                                                                     
2020-12-11 03:18:42.861809 +0000 SERVER (info) [C8709] Connection from 127.0.0.1:54226 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error                                                                                                                                                                                                                           
2020-12-11 03:18:45.458111 +0000 SERVER (info) [C8710] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54246                                                                                                                                                                                                                                                                                     
2020-12-11 03:18:45.473047 +0000 SERVER (info) [C8710] Connection from 127.0.0.1:54246 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error                                                                                                                                                                                                                           
2020-12-11 03:18:46.074349 +0000 SERVER (info) [C8711] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54248                                                                                                                                                                                                                                                                                     
2020-12-11 03:18:46.134335 +0000 SERVER (info) [C8711] Connection from 127.0.0.1:54248 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error                                                                                                                                                                                                                           
2020-12-11 03:18:47.964428 +0000 SERVER (info) [C8712] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54264                                                                                                                                                                                                                                                                                     
2020-12-11 03:18:48.048625 +0000 SERVER (info) [C8712] Connection from 127.0.0.1:54264 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error                                                                                                                                                                                                                           
2020-12-11 03:18:50.535263 +0000 SERVER (info) [C8713] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54286                                                                                                                                                                                                                                                                                     
2020-12-11 03:18:50.604639 +0000 SERVER (info) [C8713] Connection from 127.0.0.1:54286 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error                                                                                                                                                                                                                           
2020-12-11 03:18:51.216390 +0000 SERVER (info) [C8714] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54294                                                                                                                                                                                                                                                                                     
2020-12-11 03:18:51.278484 +0000 SERVER (info) [C8714] Connection from 127.0.0.1:54294 (to 0.0.0.0:45671) failed: amqp:connection:framing-error SSL Failure: Unknown error                                                                                                                                                                                                                           
2020-12-11 03:18:53.156642 +0000 SERVER (info) [C8715] Accepted connection to 0.0.0.0:45671 from 127.0.0.1:54306

grs commented 3 years ago

The errors in the log indicate connection failures from an edge site. (The 127.0.0.1 is, I think, due to the load balancer in use at your central site not giving the real IP of the client.) How many edge sites do you have connecting? Do you see errors on the edge site(s)? (Also, just to rule it out: you are not using an older version of skupper on the edge sites, are you?)

There is one successfully established edge in the qdstat output with an uptime of nearly 10 minutes, whereas the logged failures occur within a second of the connection being accepted. I suspect there is at least one other edge failing to connect. However, I think that is likely to be a separate issue from the HTTP requests hanging.

If you grep the router log for HTTP, what do you see? I suspect we may need to turn up the logging to debug further.
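
As a rough complement to a plain grep, the router log lines shown in this thread can be grouped by level and connection id. This is only a sketch: the line layout (timestamp, module, level in parentheses, optional `[C…]` connection tag) is inferred from the log excerpts above, and the function name `summarize` is hypothetical, not part of skupper or the router.

```python
import re
from collections import Counter

# Layout assumed from the excerpts in this thread:
#   <date> <time> <tz> <MODULE> (<level>) <message>
LINE_RE = re.compile(r"^(\S+ \S+ \S+) (\w+) \((\w+)\) (.*)$")
# Messages tied to a connection start with a tag such as [C2231]
CONN_RE = re.compile(r"^\[(C\d+)\]")


def summarize(lines, module="HTTP_ADAPTOR"):
    """Count log lines of one module, keyed by (level, connection id or None)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or match.group(2) != module:
            continue  # skip other modules and lines that do not fit the layout
        level, message = match.group(3), match.group(4)
        conn = CONN_RE.match(message)
        counts[(level, conn.group(1) if conn else None)] += 1
    return counts


sample = [
    "2020-12-11 19:28:28.385000 +0000 HTTP_ADAPTOR (info) [C2284] Disconnected",
    "2020-12-11 19:30:36.126013 +0000 HTTP_ADAPTOR (warning) [C2231][L2482] response message not received, outcome=0x26",
    "2020-12-11 19:30:36.142094 +0000 HTTP_ADAPTOR (warning) [C2231][L2482] response message not received, outcome=0x26",
]
for key, count in summarize(sample).items():
    print(key, count)
```

Feeding it the full router log would show at a glance which connections accumulate warnings, which may help decide where to turn up logging.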

Kampe commented 3 years ago

Interesting. At the time last evening I had one edge site connected; this morning I introduced another site to the VAN, as well as another HTTP-proxied service, skupper-test, which has since been deleted.

Here are the logs with a grep for HTTP over them:

$ k logs skupper-router-7fdd579697-hz2r8 router | grep HTTP
2020-12-11 19:27:42.464310 +0000 HTTP_ADAPTOR (info) Configured HTTP_ADAPTOR listener on 0.0.0.0:1027
2020-12-11 19:27:42.464745 +0000 HTTP_ADAPTOR (notice) Listening for HTTP/1.x client requests on 0.0.0.0:1027
2020-12-11 19:27:45.854016 +0000 ROUTER_CORE (info) [C69657] Connection Opened: dir=out host=10.196.1.11:80 vhost= encrypted=no auth=no user= container_id=HTTP/1.x Adaptor props=
2020-12-11 19:28:28.385000 +0000 HTTP_ADAPTOR (info) [C2284] Disconnected
2020-12-11 19:28:28.457982 +0000 HTTP_ADAPTOR (info) [C2283] Disconnected
2020-12-11 19:28:28.769514 +0000 HTTP_ADAPTOR (info) Accepting HTTP/1.x connection on 0.0.0.0:1026
2020-12-11 19:28:28.771434 +0000 ROUTER_CORE (info) [C69791] Connection Opened: dir=in host=127.0.0.1:36270 vhost= encrypted=no auth=no user= container_id=HTTP/1.x Adaptor props=
2020-12-11 19:28:28.796645 +0000 HTTP_ADAPTOR (info) Accepting HTTP/1.x connection on 0.0.0.0:1026
2020-12-11 19:28:28.808700 +0000 ROUTER_CORE (info) [C69792] Connection Opened: dir=in host=127.0.0.1:36276 vhost= encrypted=no auth=no user= container_id=HTTP/1.x Adaptor props=
2020-12-11 19:29:52.223165 +0000 HTTP_ADAPTOR (info) Deleted HttpConnector for skupper-test, 10.196.1.11:80
2020-12-11 19:29:52.223693 +0000 HTTP_ADAPTOR (error) [C69657] Connection closing: Connection closed by management
2020-12-11 19:29:52.231279 +0000 HTTP_ADAPTOR (info) Deleted HttpListener for skupper-test, 0.0.0.0:1027
2020-12-11 19:30:36.077019 +0000 HTTP_ADAPTOR (info) [C69792] Disconnected
2020-12-11 19:30:36.078962 +0000 HTTP_ADAPTOR (info) [C69791] Disconnected
2020-12-11 19:30:36.126013 +0000 HTTP_ADAPTOR (warning) [C2231][L2482] response message not received, outcome=0x26
2020-12-11 19:30:36.133704 +0000 HTTP_ADAPTOR (info) Accepting HTTP/1.x connection on 0.0.0.0:1026
2020-12-11 19:30:36.135452 +0000 ROUTER_CORE (info) [C70204] Connection Opened: dir=in host=127.0.0.1:37910 vhost= encrypted=no auth=no user= container_id=HTTP/1.x Adaptor props=
2020-12-11 19:30:36.142094 +0000 HTTP_ADAPTOR (warning) [C2231][L2482] response message not received, outcome=0x26
2020-12-11 19:30:36.155332 +0000 HTTP_ADAPTOR (warning) [C2231][L2482] response message not received, outcome=0x26
2020-12-11 19:30:36.164885 +0000 HTTP_ADAPTOR (warning) [C2231][L2482] response message not received, outcome=0x26
2020-12-11 19:30:36.190547 +0000 HTTP_ADAPTOR (warning) [C2231][L2482] response message not received, outcome=0x26
2020-12-11 19:30:36.203107 +0000 HTTP_ADAPTOR (warning) [C2231][L2482] response message not received, outcome=0x26
2020-12-11 19:30:36.217697 +0000 HTTP_ADAPTOR (warning) [C2231][L2482] response message not received, outcome=0x26
2020-12-11 19:30:36.243975 +0000 HTTP_ADAPTOR (warning) [C2231][L2482] response message not received, outcome=0x26
2020-12-11 19:30:36.266860 +0000 HTTP_ADAPTOR (warning) [C2231][L2482] response message not received, outcome=0x26
2020-12-11 19:30:36.276897 +0000 HTTP_ADAPTOR (info) Accepting HTTP/1.x connection on 0.0.0.0:1026
2020-12-11 19:30:36.277186 +0000 ROUTER_CORE (info) [C70205] Connection Opened: dir=in host=127.0.0.1:37916 vhost= encrypted=no auth=no user= container_id=HTTP/1.x Adaptor props=

Here's how the links look currently:

Router Links
  type           dir  conn id  id     peer  class   addr                                                     phs  cap  pri  undel  unsett  deliv  presett  psdrop  acc  rej  rel  mod  delay  rate  stuck  cred  blkd
  =====================================================================================================================================================================================================================
  endpoint       out  3        2            mobile  nats-cloud-gateway                                       0    250  0    0      0       770    0        0       0    0    0    0    0      0     1      10    -
  endpoint       out  4        3            mobile  nats-cloud-gateway                                       0    250  0    0      0       840    0        0       0    0    0    0    0      0     0      10    -
  endpoint       out  5        4            mobile  nats-cloud-gateway                                       0    250  0    0      0       842    0        0       0    0    0    0    0      0     1      10    -
  endpoint       in   2231     2482                                                                               250  0    0      0       40     0        0       0    0    40   0    0      0     0      250   -
  endpoint       out  2231     2483         mobile  cloud-api                                                0    250  0    0      17      57     0        0       40   0    0    0    40     0     17     250   -
  endpoint       in   2232     2484                                                                               250  0    0      0       5      0        0       0    0    5    0    0      0     0      250   -
  endpoint       out  2232     2485         mobile  cloud-api                                                0    250  0    0      48      53     0        0       5    0    0    0    5      0     48     250   -
  endpoint       out  70204    81737        local   temp._Ixy9iQca5SlymS                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   70204    81738        mobile  cloud-api                                                0    250  0    0      1       1      0        0       0    0    0    0    0      0     1      250   -
  endpoint       out  70205    81739        local   temp.z70QEFMNMtaprsq                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   70205    81740        mobile  cloud-api                                                0    250  0    0      1       1      0        0       0    0    0    0    0      0     1      250   -
  endpoint       out  70208    81741        mobile  mc/$skupper-service-sync                                 0    250  0    0      0       228    0        0       228  0    0    0    0      0     0      7     -
  endpoint       in   70208    81742        mobile  mc/$skupper-service-sync                                 0    250  0    0      0       79     0        0       79   0    0    0    31     0     0      250   -
  endpoint       out  70207    81743        mobile  19961139-4074-4e97-ab9b-73ca9f2c9864/skupper-site-query  0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      10    -
  endpoint       in   70207    81744                                                                              250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   70210    81745                                                                              250  0    0      0       6      6        0       0    0    0    0    0      0     0      250   -
  edge-downlink  out  70210    81746        edge    test-edge-skupper-router-5c95c57c7f-zl65c                     250  0    0      0       6      0        0       6    0    0    0    0      0     0      250   -
  endpoint       out  70210    81747        mobile  _$qd.edge_addr_tracking                                  0    250  0    0      0       255    255      0       0    0    0    0    255    0     0      32    -
  endpoint       out  70210    81748        mobile  mc/$skupper-service-sync                                 0    250  0    0      0       149    0        0       149  0    0    0    0      0     0      250   -
  endpoint       in   70210    81749        mobile  mc/$skupper-service-sync                                 0    250  0    0      0       79     0        0       79   0    0    0    35     0     0      250   -
  endpoint       out  70210    81750        mobile  ad87c630-e3a4-4fd9-81dc-ce62c1574cbb/skupper-site-query  0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   70210    81752        mobile  $management                                              0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       out  70210    81753        local   temp.GmkM3QfhfYbSyFV                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      100   -
  endpoint       in   70210    81754        mobile  _$qd.addr_lookup                                         0    250  0    0      0       757    757      0       0    0    0    0    0      1     0      32    -
  endpoint       out  70210    81755        local   temp.Jjb6uAAZevNnQkF                                          250  0    0      0       757    757      0       0    0    0    0    0      1     0      250   -
  endpoint       in   70210    81756                                                                              250  0    0      0       253    0        0       0    0    0    253  0      0     0      250   -
  endpoint       in   70210    81757                                                                              250  0    0      0       253    0        0       0    0    0    253  0      0     0      250   -
  endpoint       in   70210    81758                                                                              250  0    0      0       253    0        0       0    0    0    253  0      0     0      250   -
  endpoint       out  70210    81764                                                                              250  0    0      0       253    253      0       0    0    0    0    0      0     0      250   -
  endpoint       out  70210    81765                                                                              250  0    0      0       253    253      0       0    0    0    0    0      0     0      250   -
  endpoint       out  70210    81767                                                                              250  0    0      0       253    253      0       0    0    0    0    0      0     0      250   -
  endpoint       in   70210    82521        mobile  cloud-api                                                0    250  0    0      1       1      0        0       0    0    0    0    0      0     1      250   -
  endpoint       in   71598    84620                                                                              250  0    0      0       3      3        1       0    0    0    0    0      0     0      250   -
  edge-downlink  out  71598    84621        edge    test-edge-skupper-router-7c55bf4d5c-84v9p                     250  0    0      0       3      0        0       3    0    0    0    0      0     0      250   -
  endpoint       out  71598    84622        mobile  _$qd.edge_addr_tracking                                  0    250  0    0      0       35     35       0       0    0    0    0    35     0     0      32    -
  endpoint       out  71598    84623        mobile  d30303d8-ed4f-412e-8125-48fac68d0ab7/skupper-site-query  0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   71598    84624        mobile  $management                                              0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       out  71598    84625        local   temp.JD8VHyVe7uvPdGA                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      100   -
  endpoint       in   71598    84626        mobile  _$qd.addr_lookup                                         0    250  0    0      0       93     93       0       0    0    0    0    0      1     0      32    -
  endpoint       out  71598    84627        local   temp.gcDNckLv7G9ephD                                          250  0    0      0       93     93       0       0    0    0    0    0      1     0      250   -
  endpoint       out  71598    84629        mobile  mc/$skupper-service-sync                                 0    250  0    0      0       27     0        0       27   0    0    0    14     0     0      250   -
  endpoint       in   71598    84630        mobile  mc/$skupper-service-sync                                 0    250  0    0      0       12     0        0       12   0    0    0    0      0     0      250   -
  endpoint       in   71598    84631                                                                              250  0    0      0       31     0        0       0    0    0    31   0      0     0      250   -
  endpoint       in   71598    84632                                                                              250  0    0      1       32     0        0       0    0    0    31   1      0     0      250   -
  endpoint       out  71598    84637                                                                              250  0    0      0       32     32       0       0    0    0    0    2      0     0      250   -
  endpoint       out  71598    84638                                                                              250  0    0      0       32     32       0       0    0    0    0    1      0     0      250   -
  endpoint       in   71598    84639                                                                              250  0    0      0       31     0        0       0    0    0    31   1      0     0      250   -
  endpoint       out  71598    84656                                                                              250  0    1      0       29     29       0       0    0    0    0    1      0     0      251   -
  endpoint       in   71598    85094        mobile  nats-cloud-gateway                                       0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       out  71823    85099        mobile  nats-cloud-gateway                                       0    250  0    1      0       0      0        0       0    0    0    0    0      0     0      10    -
  endpoint       in   71823    85100        edge    test-edge-skupper-router-7c55bf4d5c-84v9p                     250  0    0      1       1      0        0       0    0    0    0    0      0     0      10    -
  endpoint       in   71824    85101        mobile  $management                                              0    250  0    0      0       2      0        0       2    0    0    0    0      0     0      250   -
  endpoint       out  71824    85102        local   temp.e521DKyEFUtrxG8                                          250  0    0      0       1      1        0       0    0    0    0    0      0     0      1     -
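
The rows above with a non-zero `stuck` count (the `cloud-api` links) can be picked out mechanically. The following is a minimal sketch that assumes the column layout of the table pasted here: four leading fields, an optional class/addr/phs group, and sixteen trailing fields from `cap` to `blkd`. It also assumes the `peer` column is empty, as it is in every row above. The function names are hypothetical.

```python
# Trailing columns, in the order shown in the table header above
TAIL = ["cap", "pri", "undel", "unsett", "deliv", "presett", "psdrop", "acc",
        "rej", "rel", "mod", "delay", "rate", "stuck", "cred", "blkd"]


def parse_link_row(row):
    """Parse one row of the links table; return None for rows that are too short."""
    tokens = row.split()
    if len(tokens) < 4 + len(TAIL):
        return None  # separator line or unrelated text
    link = {"type": tokens[0], "dir": tokens[1], "conn": tokens[2], "id": tokens[3]}
    # Whatever sits between the ids and the fixed tail is class, addr and maybe phs
    middle = tokens[4:-len(TAIL)]
    link["class"] = middle[0] if middle else ""
    link["addr"] = middle[1] if len(middle) > 1 else ""
    for name, value in zip(TAIL, tokens[-len(TAIL):]):
        link[name] = int(value) if value.isdigit() else value
    return link


def stuck_links(rows):
    """Return parsed links whose 'stuck' counter is a positive integer."""
    parsed = (parse_link_row(row) for row in rows)
    return [l for l in parsed if l and isinstance(l["stuck"], int) and l["stuck"] > 0]


rows = [
    "endpoint  in   2231  2482                     250  0  0  0   40  0  0  0   0  40  0  0   0  0   250  -",
    "endpoint  out  2231  2483  mobile  cloud-api  0  250  0  0  17  57  0  0  40  0  0  0  40  0  17  250  -",
]
for link in stuck_links(rows):
    print(link["conn"], link["id"], link["addr"], link["stuck"])
```

The header row passes through `parse_link_row` but is dropped by `stuck_links`, because its `stuck` field is not a number.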

Also, I spoke too soon and the TCP connections don't currently seem to be making it through to the proxied service either.

Here they are in their current state:

Name:              nats-cloud-gateway
Namespace:         default
Labels:            app=nats-cloud
                   app.kubernetes.io/instance=test-cloud-platform
                   location=cloud
Annotations:       internal.skupper.io/originalAssignedPort: 1028
                   internal.skupper.io/originalSelector: app=nats-cloud,location=cloud
                   internal.skupper.io/originalTargetPort: 7422
                   skupper.io/port: 7422
                   skupper.io/proxy: tcp
Selector:          application=skupper-router,skupper.io/component=router
Type:              ClusterIP
IP:                None
Port:              leaf  7422/TCP
TargetPort:        1028/TCP
Endpoints:         10.196.3.2:1028
Session Affinity:  None
Events:            <none>

---

Name:              cloud-api
Namespace:         default
Labels:            app=cloud-api
                   app.kubernetes.io/instance=test-cloud
Annotations:       internal.skupper.io/originalAssignedPort: 1026
                   internal.skupper.io/originalSelector: app=cloud-api
                   internal.skupper.io/originalTargetPort: 5443
                   skupper.io/proxy: http
Selector:          application=skupper-router,skupper.io/component=router
Type:              ClusterIP
IP:                10.200.14.223
Port:              http  5443/TCP
TargetPort:        1026/TCP
Endpoints:         10.196.3.2:1026
Session Affinity:  None
Events:            <none>

I noticed the HTTP-proxied service has an IP, while the TCP service is headless in the cloud. On the edge we see:

Name:              cloud-api
Namespace:         default
Labels:            <none>
Annotations:       internal.skupper.io/controlled: true
Selector:          application=skupper-router,skupper.io/component=router
Type:              ClusterIP
IP:                10.43.198.49
Port:              cloud-api  5443/TCP
TargetPort:        1026/TCP
Endpoints:         10.42.0.45:1026
Session Affinity:  None
Events:            <none>

---

Name:              nats-cloud-gateway
Namespace:         default
Labels:            <none>
Annotations:       internal.skupper.io/controlled: true
Selector:          application=skupper-router,skupper.io/component=router
Type:              ClusterIP
IP:                10.43.164.74
Port:              nats-cloud-gateway  7422/TCP
TargetPort:        1029/TCP
Endpoints:         10.42.0.45:1029
Session Affinity:  None
Events:            <none>

Kampe commented 3 years ago

So, I believe we've found the culprit.

Istio-proxy, when enabled on the router, will cause this issue.

You can, however, keep the Istio sidecars on your skuppered services, just not on the router 👍

Wondering now how I can best put some custom annotations on the router pod? Can that be a configuration value as well? :)

grs commented 3 years ago

Would need to have a think about how best to do that, but yes, it seems like something we could add. Perhaps certain annotations on the skupper site could be copied to the router. E.g. there could be an annotation skupper.io/router-annotations that took a list of keys of other annotations to copy if present? That would allow the service-controller to also be annotated if needed.

Kampe commented 3 years ago

Yeah, that would be an interesting solution. Potentially, whatever annotations are on the site-controller could be propagated to the router, given a flag? In a way, "skuppering" the annotations to the router the controller deploys.

grs commented 3 years ago

Well solved on the istio-proxy issue! Thanks for the details also. We will try and figure out how we could make things a bit more obvious.

Kampe commented 3 years ago

> Well solved on the istio-proxy issue! Thanks for the details also. We will try and figure out how we could make things a bit more obvious.

credit to @kungfuchicken as skupper is part of his demo in a few minutes!

grs commented 3 years ago

@kungfuchicken++

grs commented 3 years ago

> Perhaps certain annotations on the skupper site could be copied to the router. E.g. there could be an annotation skupper.io/router-annotations that took a list of keys of other annotations to copy if present?

Or perhaps reversing that would offer a simpler solution. I.e. all annotations on the skupper-site configmap would be copied to both router and service-controller, but there would be a special annotation, e.g. skupper.io/ignore-router-annotations, which would take a list of keys that should be ignored and not copied. Likewise for the service-controller. That way, in the simple case, all you need to do is add annotations to the skupper-site configmap that initialises the site. Would that work for you? Would it be ok if the annotations were by default applied to all the skupper-created deployments?
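
To make the proposal concrete, here is a minimal sketch of the copy-unless-ignored rule. It is illustrative only and not skupper code: the comma-separated list format and the choice to leave out the control annotation itself are assumptions, and `router_annotations` is a hypothetical name.

```python
# Annotation name as proposed in this thread; nothing here is an existing skupper API
IGNORE_KEY = "skupper.io/ignore-router-annotations"


def router_annotations(site_annotations, ignore_key=IGNORE_KEY):
    """Return the skupper-site annotations that would be copied to the router.

    Assumptions: the ignore list is comma-separated, and the control
    annotation itself is never copied.
    """
    raw = site_annotations.get(ignore_key, "")
    ignored = {key.strip() for key in raw.split(",") if key.strip()}
    ignored.add(ignore_key)
    return {k: v for k, v in site_annotations.items() if k not in ignored}


site = {
    "sidecar.istio.io/inject": "false",
    "example.com/team": "edge",          # hypothetical annotation
    IGNORE_KEY: "example.com/team",
}
print(router_annotations(site))
```

With no ignore annotation present, every annotation on the configmap is copied, which matches the simple case described above. The same function with a different `ignore_key` would cover the service-controller.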

Kampe commented 3 years ago

That would work perfectly well, and yes, having them propagate to all services the skupper controller creates seems reasonable as well.

kungfuchicken commented 3 years ago

demo went well.

grs commented 3 years ago

@Kampe @kungfuchicken Ted asked whether it would make sense to have the router deployment include, by default, whatever annotation causes Istio to ignore it (sidecar.istio.io/inject=false?). Wdyt?

Kampe commented 3 years ago

That's certainly perfect for our use case, but I could definitely see other use cases where other annotations may need to be set.

ted-ross commented 3 years ago

I agree that having a good way to add annotations is probably needed. We should also just default to having the Istio annotation present because nothing good is ever going to result from having sidecars injected into the Skupper router pod.

-Ted

grs commented 3 years ago

Yes, it would be in addition to the more general mechanism (but the mechanism might then need to be able to turn off defaults).
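
One way the default and the opt-out could fit together is sketched below. The default annotation `sidecar.istio.io/inject: "false"` is the one mentioned above; the convention that an empty value removes a default is purely an assumption made for this example, as is the function name.

```python
# Default discussed in this thread: keep Istio from injecting a sidecar into the router pod
DEFAULT_ROUTER_ANNOTATIONS = {"sidecar.istio.io/inject": "false"}


def effective_router_annotations(site_annotations, defaults=None):
    """Merge built-in defaults with user-supplied annotations.

    Hypothetical convention: a user annotation with an empty value
    removes the default of the same key instead of setting it.
    """
    merged = dict(DEFAULT_ROUTER_ANNOTATIONS if defaults is None else defaults)
    for key, value in site_annotations.items():
        if value == "":
            merged.pop(key, None)   # user turned this default off
        else:
            merged[key] = value     # user value wins over the default
    return merged


print(effective_router_annotations({}))
print(effective_router_annotations({"sidecar.istio.io/inject": ""}))
```

Letting user values override defaults keeps the common case zero-configuration while still allowing a site that wants the router inside the mesh to say so explicitly.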

Kampe commented 3 years ago

From what we've discovered, it's only needed on the router in particular, as it's doing interesting things with service ports, and I believe envoy-proxy will by default block whatever it's not aware of. The service-controller and site-controller should work as intended even within the Istio mesh; will be glad to test!