Closed agixio closed 3 weeks ago
I haven't looked at the detailed logs yet, but it seems the loxilb instances can't communicate over the gRPC channel. This is usually due to various reasons:
Indeed, I haven't done all that. I'll test it and get back to you afterwards. Thanks!
The doc did not mention setting up inbound security groups for the loxilb instances. It has been updated here. You can check whether things are working by trying:
## From loxilb1
$ nc <loxilb2-instance-IP> 11111 -v
Connection to <loxilb2-instance-IP> 11111 port [tcp/*] succeeded!
$ nc <loxilb2-instance-IP> 22222 -v
Connection to <loxilb2-instance-IP> 22222 port [tcp/*] succeeded!
## From loxilb2
$ nc <loxilb1-instance-IP> 11111 -v
Connection to <loxilb1-instance-IP> 11111 port [tcp/*] succeeded!
$ nc <loxilb1-instance-IP> 22222 -v
Connection to <loxilb1-instance-IP> 22222 port [tcp/*] succeeded!
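If nc is not available on the instance, the same reachability check can be sketched in Python. This is only an illustration of the test above, not part of loxilb: the function names port_open and check_peer are made up here, and the ports 11111 and 22222 are the ones used in the nc commands.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_peer(host: str, ports=(11111, 22222)) -> dict:
    """Map each port to whether it is reachable from this machine."""
    return {port: port_open(host, port) for port in ports}


# Example (run on loxilb1, with the address of loxilb2 filled in):
#   print(check_peer("<loxilb2-instance-IP>"))
```

A port that reports False from one side but True from the other usually points at a one-directional security group rule, which is the situation described above.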
@UltraInstinct14 @TrekkieCoder Thank you very much for your help. Everything works better with all the rules in place, and now they can communicate properly. The HA status is also fine. However, after completing all the steps, the Elastic IP is still not associated with any of my EC2 instances, which is strange. I followed the exact same network architecture and used the same CIDR as in the tutorial.
We have an error talking to the kernel
2024/09/29 23:00:32 Serving loxilb rest API at http://[::]:11111
2024-09-29 23:00:33 [API] HA POST API called. url : /netlox/v1/config/cistate
2024-09-29 23:00:33 [API] Instance default New HA State : BACKUP, VIP: 0.0.0.0
2024-09-29 23:00:33 [CLUSTER] Instance default Current State NOT_DEFINED Updated State: BACKUP VIP : 0.0.0.0
2024-09-29 23:00:33 failed to get ENI intf name ()
2024-09-29 23:00:33 [API] Load balancer POST API called. url : /netlox/v1/config/loadbalancer
2024-09-29 23:00:33 [API] lbRules : {{35.180.153.68 192.168.248.254 55002 tcp 0 0 false false 2 1800 true 0 0 0 default_nginx-lb1 0} [] [{192.168.36.180 31630 1 }]}
2024-09-29 23:00:33 ep-host added 192.168.36.180_tcp_31630:0
2024-09-29 23:00:33 fullnat:suitable source for 192.168.36.180: 192.168.248.254
2024-09-29 23:00:33 nat lb-rule added - 1:dst-35.180.153.68/32,proto-6,dport-55002,-do-fullnat:eip-192.168.36.180,ep-31630,w-1,alive|
2024-09-29 23:00:33 Added cluster-peer 192.168.228.173
2024-09-29 23:00:33 [DP] LB rule 192.168.248.254 add[OK]
2024-09-29 23:00:35 inactive ep - 192.168.36.180_tcp_31630:tcp:31630(next try after 60s)
2024-09-29 23:00:35 [NLP] Link msgs subscribed
...
2024-09-29 23:00:42 [API] HA POST API called. url : /netlox/v1/config/cistate
2024-09-29 23:00:42 [API] Instance default New HA State : MASTER, VIP: 0.0.0.0
2024-09-29 23:00:42 [CLUSTER] Instance default Current State BACKUP Updated State: MASTER VIP : 0.0.0.0
2024-09-29 23:00:42 no loxiType intf found
2024-09-29 23:00:42 Get xsync()
2024-09-29 23:00:42 XSync netRPC - 192.168.228.173:22222 :Connected
2024-09-29 23:00:42 RPC - CT Xsync Remote-1
2024-09-29 23:00:42 cidrBlock (192.168.248.0/24) associate failed in VPC vpc-09d80cc6acffaad98:operation error EC2: AssociateVpcCidrBlock, https response error StatusCode: 400, RequestID: 6d3a8c7f-ccd7-4cdc-82d3-750360f3c8e9, api error CidrConflict: CIDR range conflicts with 192.168.0.0/16 with association ID vpc-cidr-assoc-0acaa95895001e808
2024-09-29 23:00:44 XSync netRPC - 192.168.228.173:22222 :Reset
2024-09-29 23:00:44 XSync netRPC - 192.168.228.173:22222 :Connected
2024-09-29 23:00:44 RPC - CT Get 1
2024-09-29 23:00:44 CT Bcast
2024-09-29 23:00:44 [CT] CTBcast Complete
23:01:02 TRACE loxilb_libdp.c:2592: ct: #192.168.218.226:35631 -> 192.168.0.2:53 (17)# rid:0 est:0 nat:0 (Aged:202194512
Sorry for the inconvenience. After trying the scenario again from the tutorial doc, it seems the document has not been updated for the latest versions. I will try to summarize what things need to be done differently.
Run loxilb with a cloudcidrblock subnet that is not currently in use in the VPC. For example, use the 124.124.124.0/24 CIDR:
sudo docker run -u root --cap-add SYS_ADMIN \
--restart unless-stopped \
--net=host \
--privileged \
-dit \
-v /dev/log:/dev/log -e AWS_REGION=ap-northeast-2 \
--name loxilb \
ghcr.io/loxilb-io/loxilb:latest \
--cloud=aws --cloudcidrblock=124.124.124.0/24
Please note that the image to use is ghcr.io/loxilb-io/loxilb:latest. The image labeled "aws-support" has been discontinued.
Change kube-loxilb.yaml to have a private CIDR VIP of the above CIDR subnet:
- --externalCIDR=13.208.X.X/32
- --privateCIDR=124.124.124.250/32
- --setRoles=0.0.0.0
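The CidrConflict error in the earlier log came from a cloud CIDR block (192.168.248.0/24) that falls inside the existing VPC range (192.168.0.0/16). A rough pre-flight check for the two values above could look like the following sketch. It uses only Python's standard ipaddress module; the function name validate_cloud_cidr is hypothetical, and the list of VPC CIDRs has to be supplied by hand since nothing here queries AWS.

```python
import ipaddress


def validate_cloud_cidr(cloud_cidr, vpc_cidrs, private_vip):
    """Return a list of problems; an empty list means the values look consistent."""
    problems = []
    cloud = ipaddress.ip_network(cloud_cidr)

    # The cloud CIDR block must not overlap any CIDR already associated with the VPC.
    for cidr in vpc_cidrs:
        if cloud.overlaps(ipaddress.ip_network(cidr)):
            problems.append(f"{cloud_cidr} overlaps existing VPC CIDR {cidr}")

    # The private VIP is expected to be taken from the cloud CIDR block.
    vip = ipaddress.ip_network(private_vip)
    if not vip.subnet_of(cloud):
        problems.append(f"{private_vip} is not inside {cloud_cidr}")

    return problems


# Values from the failing log: the /24 sits inside the VPC's /16.
print(validate_cloud_cidr("192.168.248.0/24", ["192.168.0.0/16"], "192.168.248.254/32"))
# Values from the suggested configuration.
print(validate_cloud_cidr("124.124.124.0/24", ["192.168.0.0/16"], "124.124.124.250/32"))
```

The first call reports the overlap, and the second returns an empty list.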
@TrekkieCoder Thank you. No problem at all regarding the documentation being slightly out of date; I completely understand. I'm happy to help update it so that it is accessible to everyone. I followed your configuration exactly as provided, but it didn't work. I then tried various combinations of your arguments with those from the documentation, but I am still unable to attach the Elastic IP. I don't know where I went wrong.
Log trace link: https://docs.google.com/document/d/1ZFliJHdDZ3ruM30V6TxiICTCBxl1HsECS6B3hcVlZCg/edit?usp=sharing
Hi @agixio ,
Your log trace file can't be opened because it asks for permission.
https://docs.google.com/document/d/1ZFliJHdDZ3ruM30V6TxiICTCBxl1HsECS6B3hcVlZCg/edit?usp=sharing
My bad: https://docs.google.com/document/d/18xf9R6WpDWyzmxnzM_sOny9gP2S-hwOpIkPgPahYkN4/edit?usp=sharing
I double-checked the logs, but there are no logs related to service creation?
@TrekkieCoder: My service doesn't work, but sorry, I sent you the old log with the deprecated Docker image. I have updated everything, with all the pictures:
https://docs.google.com/document/d/1MEkQzmVEvkiOgLSS6OnqyAVaZLnNgrl2-I8q6KDTipg/edit?usp=sharing
I am closing this since this issue has been resolved via discussion in loxilb slack channel.
Describe the bug
I followed all the instructions in the docs to deploy loxilb in multi-AZ HA on AWS, but my 2 loxilb instances time out:
2024-09-27 15:18:29 XSync netRPC Connect - 192.168.228.58:22222 :Fail(dial tcp 192.168.228.58:22222: i/o timeout)
2024-09-27 15:18:24 XSync netRPC Connect - 192.168.218.57:22222 :Fail(dial tcp 192.168.218.57:22222: i/o timeout)
(more errors at the Google Drive link)
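When the log is long, the failing peers can be pulled out with a short script. The sketch below assumes the failure lines keep exactly the shape of the two lines quoted in this report; the pattern and the function name failed_peers are illustrative only and are not taken from loxilb.

```python
import re

# Matches lines such as:
#   ... XSync netRPC Connect - 192.168.228.58:22222 :Fail(dial tcp ...: i/o timeout)
FAIL_RE = re.compile(
    r"XSync netRPC Connect - (?P<host>[\d.]+):(?P<port>\d+) :Fail\((?P<reason>[^)]*)\)"
)


def failed_peers(log_text):
    """Return {(host, port): last failure reason} for every peer that failed."""
    failures = {}
    for match in FAIL_RE.finditer(log_text):
        failures[(match["host"], int(match["port"]))] = match["reason"]
    return failures


sample = (
    "2024-09-27 15:18:29 XSync netRPC Connect - 192.168.228.58:22222 "
    ":Fail(dial tcp 192.168.228.58:22222: i/o timeout)\n"
    "2024-09-27 15:18:24 XSync netRPC Connect - 192.168.218.57:22222 "
    ":Fail(dial tcp 192.168.218.57:22222: i/o timeout)\n"
)
for (host, port), reason in failed_peers(sample).items():
    print(host, port, reason)
```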
To Reproduce
Follow the documentation here: https://github.com/loxilb-io/loxilbdocs/blob/main/docs/aws-multi-az.md
Screenshots
All screenshots are at this link: https://docs.google.com/document/d/1DFvKcP8WCYQhhud0h9FWDaQoYm2iX417t7Uit1ObRcg/edit?usp=sharing