zerowebcorp opened this issue 4 years ago
Funny, I just replaced the AWS VPC CNI with Calico and had the same effect… Metrics don't work and the whole thing is generally more sluggish than before. It worked before, so it really must have something to do with the CNI.
Lens 3.5.2 on macOS
Exactly the same with EKS and Cilium. No metrics and a lot slower than before.
Lens 3.5.2 on Windows
Yes, after a few days using Lens with an EKS+CNI cluster it's pretty close to unusable actually :(
I wonder if it's so slow because it keeps trying to fetch metrics it can't reach?
Is it in overlay mode or the EKS standard CNI mode (every Pod gets a VPC-routable address)? Secondary private IPs on your worker nodes mean you use the EKS CNI mode.
No it's a full replacement with overlay network. Pods get some 192.168.* addresses.
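A quick way to tell which mode a cluster is in, assuming the default Calico pool (192.168.0.0/16) hasn't been customized: list the Pod IPs and compare them to your VPC CIDR. With the AWS VPC CNI they come from your subnets; with an overlay they come from the overlay pool.

# list all distinct Pod IPs in the cluster
kubectl get pods -A -o custom-columns='IP:.status.podIP' | sort -u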
Same issue with Lens 3.5.3 on EKS using AWS CNI + Calico
The issue still exists in 3.6
Problem still persists in 3.6.8
Workaround: just edit the Prometheus StatefulSet and add hostNetwork: true to the Pod spec section:
...
spec:
  replicas: 1
  selector:
    matchLabels:
      name: prometheus
  template:
    metadata:
      creationTimestamp: null
      labels:
        name: prometheus
    spec:
      volumes:
        - name: config
          configMap:
            name: prometheus-config
            defaultMode: 420
        - name: rules
          configMap:
            name: prometheus-rules
            defaultMode: 420
      initContainers:
        - name: chown
          image: 'docker.io/alpine:3.9'
          command:
            - chown
            - '-R'
            - '65534:65534'
            - /var/lib/prometheus
          resources: {}
          volumeMounts:
            - name: data
              mountPath: /var/lib/prometheus
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
      containers:
        - name: prometheus
          image: 'docker.io/prom/prometheus:v2.17.2'
          args:
            - '--web.listen-address=0.0.0.0:9090'
            - '--config.file=/etc/prometheus/prometheus.yaml'
            - '--storage.tsdb.path=/var/lib/prometheus'
            - '--storage.tsdb.retention.time=2d'
            - '--storage.tsdb.retention.size=5GB'
            - '--storage.tsdb.min-block-duration=2h'
            - '--storage.tsdb.max-block-duration=2h'
          ports:
            - name: web
              hostPort: 9090
              containerPort: 9090
              protocol: TCP
          resources:
            requests:
              cpu: 100m
              memory: 512Mi
          volumeMounts:
            - name: config
              mountPath: /etc/prometheus
            - name: rules
              mountPath: /etc/prometheus/rules
            - name: data
              mountPath: /var/lib/prometheus
          livenessProbe:
            httpGet:
              path: /-/healthy
              port: 9090
              scheme: HTTP
            initialDelaySeconds: 10
            timeoutSeconds: 10
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /-/ready
              port: 9090
              scheme: HTTP
            initialDelaySeconds: 10
            timeoutSeconds: 10
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      serviceAccountName: prometheus
      serviceAccount: prometheus
      hostNetwork: true
      securityContext: {}
...
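If you'd rather patch than edit by hand, something like this should work as a sketch. The StatefulSet name and namespace here are assumptions (Lens's bundled stack typically lives in the lens-metrics namespace); adjust to your install:

# sets spec.template.spec.hostNetwork=true without touching other fields
kubectl -n lens-metrics patch statefulset prometheus --type merge \
  -p '{"spec":{"template":{"spec":{"hostNetwork":true}}}}'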
I can confirm the problem still exists in version 4.1.2. Using hostNetwork for Prometheus isn't a problem, but I also noticed that Lens works slower with EKS + Calico CNI. Why does the CNI impact Lens performance? Does anyone know?
I've had issues with Calico and EKS as well. I don't have a clue why it happens, but it eventually forced me to revert back to the VPC CNI. Ever since I did that, everything has worked fine. I know it's not much input; I just wanted to share that custom CNIs have problems on EKS and it's very much reproducible.
@PatTheSilent Could you tell me what kind of issues you had with Calico? I'm asking because I haven't noticed any issues besides the ones related to Lens. I'd appreciate it if you shared some details.
@randrusiak From what I could gather, what I was experiencing was something that could happen. I mostly had issues with cross-AZ traffic and connecting to things outside the cluster, like an RDS database. I can't rule out that I misconfigured something, but I've spent enough time on ENI adjustments, Security Groups, IAM permissions, the Calico docs, and their GitHub issues to feel fairly safe saying I didn't mess it up.
Edit: oh, and the fact that basically no admission webhooks, or in fact anything else the control plane needs to call, work, because the Pods are on a completely different network than the control plane, and you have to go and manage hostPorts and hostNetwork FOR EACH AND EVERY DAMN EXTERNAL COMPONENT. Calling it a PITA is an understatement.
@PatTheSilent Maybe they improved Calico, or you misconfigured something as you said, because I tested Calico on my EKS cluster and everything is working as expected. Of course, there is still a need to use hostNetwork for services that should be reachable from the control plane, such as metrics, etc.
Is this still an issue for 4.1.4?
Hi, just leave the aws-node DaemonSet in place, and set up calico-node too (the Calico manifest from the AWS docs). You will get working Calico policy, and the EKS control plane will still be able to connect to the worker nodes. The Calico documentation is a little wrong here: you shouldn't delete aws-node.
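If you go that route, a quick sanity check is that both DaemonSets are present and ready. The namespace is an assumption; calico-node may land in calico-system instead when installed via the Tigera operator:

kubectl -n kube-system get daemonset aws-node calico-node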
Okay thanks, so this sounds like it is resolved.
Hi, just leave the aws-node DaemonSet in place, and set up calico-node too (the Calico manifest from the AWS docs). You will get working Calico policy, and the EKS control plane will still be able to connect to the worker nodes. The Calico documentation is a little wrong here: you shouldn't delete aws-node.
Well, if you only care about network policy, that's okay. BUT there are cases in which you want the Calico CNI instead of the AWS CNI: the AWS CNI consumes an IP address from your VPC subnets for every Pod.
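For context on the IP consumption: with the VPC CNI, a node's Pod capacity is roughly ENIs × (IPv4 addresses per ENI − 1) + 2. An m5.large, for example, supports 3 ENIs with 10 IPv4 addresses each, so it tops out at 3 × (10 − 1) + 2 = 29 Pods, all drawing addresses from your subnets (numbers worth double-checking for your instance type).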
I mean calico itself tells you about those options: https://docs.projectcalico.org/getting-started/kubernetes/managed-public-cloud/eks
I don't use Lens anymore so I can't tell if it's solved. But this ticket is specifically about the Calico CNI on EKS, and "Don't use the Calico CNI" is not a valid resolution ;)
Apply a (possibly modified) version of the manifest below, and configure Lens to point to "prometheus/haproxy:9090":
---
# haproxy config
apiVersion: v1
kind: ConfigMap
metadata:
  name: haproxy
  namespace: prometheus
data:
  haproxy.cfg: |+
    global
      log /dev/log local0
      log /dev/log local1 notice
      daemon
    defaults
      log global
      mode tcp
      option tcplog
      option dontlognull
      timeout connect 5000
      timeout client 50000
      timeout server 50000
    frontend haproxynode
      bind *:9090
      mode http
      default_backend backendnodes
    backend backendnodes
      mode http
      ##### your prometheus SERVICE
      server prometheus prometheus-foobar.prometheus.svc.cluster.local:9090 check
      ##### /your prometheus SERVICE
---
# haproxy for proxying to prometheus. key points:
#
#   dnsPolicy: ClusterFirstWithHostNet
#   hostNetwork: true
#
# also, this is outside the reach of the operator.
#
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: haproxy
  namespace: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus-proxy
  template:
    metadata:
      labels:
        app: prometheus-proxy
    spec:
      containers:
        - image: haproxy
          imagePullPolicy: Always
          name: haproxy
          volumeMounts:
            - mountPath: /usr/local/etc/haproxy
              name: vol1
      dnsConfig: {}
      dnsPolicy: ClusterFirstWithHostNet
      hostNetwork: true
      restartPolicy: Always
      volumes:
        - configMap:
            defaultMode: 0777
            name: haproxy
            optional: false
          name: vol1
---
# service resource pointing to the
# haproxy. this service goes into
# the manual lens configuration
apiVersion: v1
kind: Service
metadata:
  name: haproxy
  namespace: prometheus
spec:
  clusterIP: None
  type: ClusterIP
  ports:
    - name: prometheus
      port: 9090
      protocol: TCP
      targetPort: 9090
  selector:
    app: prometheus-proxy
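To use it, apply the manifests and check that the proxy Pods are up before pointing Lens at the service (the filename below is just a placeholder):

kubectl apply -f prometheus-haproxy.yaml          # placeholder filename
kubectl -n prometheus get pods -l app=prometheus-proxy -o wide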
Describe the bug
The metrics feature is not available on an EKS cluster with Weave Net as the CNI. This works fine on a bare-metal installation. After removing the AWS CNI from the EKS cluster, installing the metrics server requires changing the networking to hostNetwork = true and the communication IP to internal. I have noticed that Lens uses its own metrics server and Prometheus, so there may be some additional tweaks required.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Lens UI shows the metrics.
Screenshots
Lens UI doesn't show metrics and complains "Metrics not available at the moment".
Environment (please complete the following information):