Closed teo-chenglim closed 5 years ago
It seems like only alertmanager and prometheus have a 30-second timeout limit, but I am not sure how to edit it:
chenglim@chenglim-GL503VM:/faas-netes/chart/openfaas$ grep -in 30 */*
templates/alertmanager-cfg.yaml:18: repeat_interval: 30s
templates/alertmanager-dep.yaml:35: - --timeout=30
templates/alertmanager-dep.yaml:38: timeoutSeconds: 30
templates/alertmanager-dep.yaml:45: - --timeout=30
templates/alertmanager-dep.yaml:48: timeoutSeconds: 30
templates/prometheus-dep.yaml:34: - --timeout=30
templates/prometheus-dep.yaml:37: timeoutSeconds: 30
templates/prometheus-dep.yaml:44: - --timeout=30
templates/prometheus-dep.yaml:47: timeoutSeconds: 30
$ kubectl get configmap/alertmanager-config -n openfaas -o yaml
apiVersion: v1
data:
  alertmanager.yml: |
    route:
      group_by: ['alertname', 'cluster', 'service']
      group_wait: 5s
      group_interval: 10s
      repeat_interval: 40s
      receiver: scale-up
      routes:
      - match:
          service: gateway
        receiver: scale-up
        severity: major
    inhibit_rules:
    - source_match:
        severity: 'critical'
      target_match:
        severity: 'warning'
      equal: ['alertname', 'cluster', 'service']
    receivers:
    - name: 'scale-up'
      webhook_configs:
      - url: http://gateway.openfaas:8080/system/alert
        send_resolved: true
kind: ConfigMap
metadata:
  creationTimestamp: "2018-10-17T01:51:38Z"
  labels:
    app: alertmanager
  name: alertmanager-config
  namespace: openfaas
  resourceVersion: "9657074"
  selfLink: /api/v1/namespaces/openfaas/configmaps/alertmanager-config
  uid: 2f87f660-d1af-11e8-a38b-42010a940128
$ kubectl get deploy/alertmanager -n openfaas -o yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
  creationTimestamp: "2018-10-17T01:51:38Z"
  generation: 2
  labels:
    app: alertmanager
  name: alertmanager
  namespace: openfaas
  resourceVersion: "9657435"
  selfLink: /apis/extensions/v1beta1/namespaces/openfaas/deployments/alertmanager
  uid: 2f8a166b-d1af-11e8-a38b-42010a940128
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: alertmanager
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: alertmanager
    spec:
      containers:
      - command:
        - alertmanager
        - --config.file=/alertmanager.yml
        - --storage.path=/alertmanager
        image: prom/alertmanager:v0.15.0-rc.0
        imagePullPolicy: Always
        name: alertmanager
        ports:
        - containerPort: 9093
          protocol: TCP
        resources:
          limits:
            memory: 128Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /alertmanager.yml
          name: alertmanager-config
          subPath: alertmanager.yml
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 40
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: alertmanager.yml
            mode: 420
            path: alertmanager.yml
          name: alertmanager-config
        name: alertmanager-config
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2018-12-23T15:01:48Z"
    lastUpdateTime: "2018-12-23T15:01:48Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2018-10-17T01:51:38Z"
    lastUpdateTime: "2019-01-08T01:01:55Z"
    message: ReplicaSet "alertmanager-7c66b6fc5d" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 2
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
$
Where do I edit the "readinessProbe" and "livenessProbe" in templates/alertmanager-dep.yaml without restarting the server? Do they have anything to do with the timeout of 30 seconds?
Hi @teo-chenglim,
Thank you for using OpenFaaS.
The OpenFaaS timeout issue on GKE is 30 seconds max. It doesn't seem to be an OpenFaaS setting but a Kubernetes setting. More than 10 users have reported the same thing, and the issues were always closed with no response.
I am not sure what you are referring to. Can you please link to these 10 users?
If you would like help with setting timeouts, I would suggest you read the troubleshooting guide very carefully and work methodically to make sure your timeouts are set everywhere they are needed. https://docs.openfaas.com/deployment/troubleshooting/
I know that the timeouts work because we use them for OpenFaaS Cloud for doing container builds which take several minutes.
Example from OpenFaaS Cloud:
helm repo add openfaas https://openfaas.github.io/faas-netes
helm repo update && \
helm upgrade openfaas --install openfaas/openfaas \
--namespace openfaas \
--set basic_auth=true \
--set functionNamespace=openfaas-fn \
--set ingress.enabled=true \
--set gateway.scaleFromZero=true \
--set gateway.readTimeout=300s \
--set gateway.writeTimeout=300s \
--set gateway.upstreamTimeout=295s \
--set faasnetesd.readTimeout=300s \
--set faasnetesd.writeTimeout=300s \
--set gateway.replicas=2 \
--set queueWorker.replicas=2
You then need to set read_timeout and write_timeout on each function you build, using a Golang duration such as 5s or 5m for 5 seconds or 5 minutes.
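As a sketch, that could look like this in a function's stack.yml (the function name, handler, and image below are placeholders, not taken from this thread):

functions:
  sleeper:
    lang: python
    handler: ./sleeper
    image: example/sleeper:latest
    environment:
      # read_timeout/write_timeout bound I/O in the watchdog;
      # exec_timeout caps total execution time in the classic watchdog
      read_timeout: 5m
      write_timeout: 5m
      exec_timeout: 5m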
Alex
@martindekov you recently helped out another user with exactly the same issue, is there anything you can add?
Alex
I'm going to close the issue because I'm not aware of any issues with configuring timeouts and we want to keep the issue tracker for urgent defects or features for the project. Please keep commenting @teo-chenglim and let us know how you get on.
Hello @teo-chenglim !
Can you please remove the quotation marks around the timeouts, i.e. "300s" -> 300s. As for whether you should put s after the values, the answer is yes. When I put quotation marks around the timeout values I get a 504 error; I think this might be your issue.
Try kubectl describe svc <your service name here> -n openfaas-fn and check whether the values are present in this part:
...
Environment:
fprocess: python index.py
write_timeout: 60s
read_timeout: 60s
...
Let me know the result, thanks!
I'd also suggest pulling down a later stack. gateway:0.7.8 is nearly a year old now. There have been an awful lot of changes in that time.
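As a sketch of one way to do that with Helm (assuming the release was installed as openfaas in the openfaas namespace, as in the earlier command):

helm repo update && \
helm upgrade openfaas --install openfaas/openfaas \
  --namespace openfaas \
  --reuse-values

The --reuse-values flag keeps the timeout settings already applied while pulling in the newer chart and images.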
Hi all,
Thank you for the replies.
It still doesn't work. I think I have tried everything you guys suggested.
$ kubectl get deploy/faas-netesd -n openfaas -o yaml | grep -i timeout -A1
- name: write_timeout
value: 300s
- name: read_timeout
value: 300s
$ kubectl get ingress -n openfaas -o yaml | grep timeout
ingress.kubernetes.io/proxy-connect-timeout: 300s
ingress.kubernetes.io/proxy-read-timeout: 300s
ingress.kubernetes.io/proxy-send-timeout: 300s
ingress.kubernetes.io/upstream-fail-timeout: 300s
nginx.ingress.kubernetes.io/proxy-read-timeout: 600s
nginx.ingress.kubernetes.io/proxy-send-timeout: 600s
$ kubectl get deploy/gateway -n openfaas -o yaml | grep -i timeout -A1
- name: write_timeout
value: 300s
- name: read_timeout
value: 300s
- name: upstream_timeout
value: 240s
$
ubuntu@duckling-849cd6445f-gj7cs:~/function$ time echo "Lets meet today 11:00pm" | python index.py
[{'dim': 'number', 'text': '11', 'start': 16, 'end': 18, 'value': {'value': 11.0}}, {'dim': 'number', 'text': '00', 'start': 19, 'end': 21, 'value': {'value': 0.0}}, {'dim': 'distance', 'text': '11', 'start': 16, 'end': 18, 'value': {'value': 11.0, 'unit': None}}, {'dim': 'distance', 'text': '00', 'start': 19, 'end': 21, 'value': {'value': 0.0, 'unit': None}}, {'dim': 'volume', 'text': '11', 'start': 16, 'end': 18, 'value': {'value': 11.0, 'unit': None, 'latent': True}}, {'dim': 'volume', 'text': '00', 'start': 19, 'end': 21, 'value': {'value': 0.0, 'unit': None, 'latent': True}}, {'dim': 'temperature', 'text': '11', 'start': 16, 'end': 18, 'value': {'value': 11.0, 'unit': None}}, {'dim': 'temperature', 'text': '00', 'start': 19, 'end': 21, 'value': {'value': 0.0, 'unit': None}}, {'dim': 'time', 'text': 'today 11:00pm', 'start': 10, 'end': 23, 'value': {'value': '2019-01-09T23:00:00.000Z', 'grain': 'minute', 'others': [{'grain': 'minute', 'value': '2019-01-09T23:00:00.000Z'}]}}]
real 1m50.534s
user 0m56.060s
sys 0m0.942s
amaris@duckling-849cd6445f-gj7cs:~/function$ env | grep -i timeout
exec_timeout=300s
read_timeout=300s
upstream_timeout=300s
write_timeout=300s
$ time curl --connect-timeout 300 -s https://faas.amaris.ai/function/duckling -d "Lets meet today at 11:00pm"
<html><head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>502 Server Error</title>
</head>
<body text=#000000 bgcolor=#ffffff>
<h1>Error: Server Error</h1>
<h2>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.</h2>
<h2></h2>
</body></html>
real 0m30.067s
user 0m0.007s
sys 0m0.007s
RUN curl -sSL https://github.com/openfaas/faas/releases/download/0.9.14/fwatchdog > /usr/bin/fwatchdog \
&& chmod +x /usr/bin/fwatchdog
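For reference, a sketch of how the watchdog timeouts could also be baked into that Dockerfile via ENV (the fprocess value is the one shown in the env output above; the timeout values are illustrative):

ENV fprocess="python index.py"
# the classic watchdog reads these at startup
ENV read_timeout=300s
ENV write_timeout=300s
ENV exec_timeout=300s
CMD ["fwatchdog"]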
@martindekov my error is 502, not 504.
If you are unable to share the repo, could you at least provide the stack.yml definition of the duckling function, please?
I don't think it will be helpful, but here it is.
provider:
  name: faas
  gateway: https://endpoint
functions:
  duckling:
    lang: dockerfile
    handler: ./function
    image: gcr.io/[project_name]/duckling:latest
    labels:
      com.openfaas.scale.min: 1
      com.openfaas.scale.max: 15
      com.openfaas.scale.factor: 50
      read_timeout: 300s
      write_timeout: 300s
      upstream_timeout: 300s
      exec_timeout: 300s
    environment:
      read_timeout: 300s
      write_timeout: 300s
      upstream_timeout: 300s
      exec_timeout: 300s
Thanks. It's looking increasingly as though this may be outside of OpenFaaS.
From the initial detail we can see that the function does return after 60-70 seconds. We can also see that the timeouts within OpenFaaS have been configured to lengths much greater than the 60-70 seconds.
I think there may be value in looking at how nginx is configured or anything in front of that, such as a load balancer, for example.
Hi @rgee0, yes. I tried the ingress and the nginx deployment, but that doesn't help either:
$ kubectl get ingress -n openfaas -o yaml | grep timeout
ingress.kubernetes.io/proxy-connect-timeout: 300s
ingress.kubernetes.io/proxy-read-timeout: 300s
ingress.kubernetes.io/proxy-send-timeout: 300s
ingress.kubernetes.io/upstream-fail-timeout: 300s
nginx.ingress.kubernetes.io/proxy-read-timeout: 600s
nginx.ingress.kubernetes.io/proxy-send-timeout: 600s
$ kubectl get deploy/caddy-tls -n openfaas -o yaml | grep -i timeout
ingress.kubernetes.io/proxy-connect-timeout: 300s
ingress.kubernetes.io/proxy-read-timeout: 300s
ingress.kubernetes.io/proxy-send-timeout: 300s
ingress.kubernetes.io/upstream-fail-timeout: 300s
nginx.ingress.kubernetes.io/proxy-read-timeout: 600s
nginx.ingress.kubernetes.io/proxy-send-timeout: 600s
I also tried the Caddy timeouts:
$ kubectl exec -n openfaas -it caddy-tls-95f8456b6-fgsw5 -- ash
/www # ps auxf
PID USER TIME COMMAND
1 root 0:02 caddy -agree --conf /Caddyfile
16 root 0:00 ash
20 root 0:00 ps auxf
/www #
/www # cat /Caddyfile
:80 {
  status 200 /healthz
  basicauth /system {$ADMIN_USER} {$ADMIN_PASSWORD}
  basicauth /ui {$ADMIN_USER} {$ADMIN_PASSWORD}
  proxy / gateway:8080 {
    transparent
  }
  timeouts 300s ### added this!
  errors stderr
  tls off
}
/www #
The Caddy LB is from @stefanprodan's repo; I also opened an issue there: https://github.com/stefanprodan/openfaas-gke/issues/6
Found it: the GKE default load balancer timeout is 30 seconds. It is neither Kubernetes nor OpenFaaS!
Thank you @alexellis @rgee0 @martindekov
https://cloud.google.com/load-balancing/docs/backend-service#create-backend-service
Edited the load balancer timeout on GCP.
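For anyone following along, a sketch of how that can be done with gcloud (assuming a global backend service; the name below is a placeholder, list yours with gcloud compute backend-services list):

gcloud compute backend-services update k8s-be-example \
    --global \
    --timeout=300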
Derek lock
Hi,
It looks like GCP has a 30-second limit on their LoadBalancer, as you identified - https://cloud.google.com/load-balancing/docs/https/#timeouts_and_retries
Can you check that you've configured this correctly and reach out to Google support if in doubt?
One thing you can try to test your settings is to use kubectl port-forward and bring the gateway to your local machine to see if the timeout issue is with your GKE LB or the configuration in OpenFaaS.
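For example (a sketch, assuming the gateway Service listens on 8080 as in the chart defaults):

kubectl port-forward -n openfaas svc/gateway 8080:8080 &
time curl -s http://127.0.0.1:8080/function/duckling -d "Lets meet today at 11:00pm"

If this call runs past 30 seconds and completes, the limit sits in front of the cluster rather than inside OpenFaaS.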
You could also test locally with minikube. I hope these suggestions are useful to you.
Alex
The OpenFaaS timeout issue on GKE is 30 seconds max. It doesn't seem to be an OpenFaaS setting but a Kubernetes setting. More than 10 users have reported the same thing, and the issues were always closed with no response. I am using Helm for deployment.
Please take a look this time; I am providing comprehensive troubleshooting steps. Please update the troubleshooting guide to include this as needed.
I do understand that the fix is to set the OpenFaaS timeouts; in fact, these are my current settings. Why did I put them in both locations? Because some documentation says they go outside of environment. I tried with quotation marks and with "s", in every combination.
Expected Behaviour
Programs running inside Kubernetes pods.
Kubernetes has a timeout of 30 seconds. I tried many places but still can't find where it is.
Current Behaviour
Possible Solution
Steps to Reproduce (for bugs)
Context
Your Environment
FaaS-CLI version (faas-cli version):
Docker version (e.g. Docker 17.0.05):
Docker Swarm or Kubernetes (FaaS-netes)? FaaS-Netes (Helm), nginx ingress
Operating System and version: Ubuntu 16.04
Any program that runs for more than 30 seconds breaks.
Read the troubleshooting guide (https://github.com/openfaas/faas/blob/master/guide/troubleshooting.md) and paste in any other diagnostic information you have:
I have followed this guide more than 20 times.
Possible gateway timeout
Possible Ingress timeout