Open joryirving opened 3 months ago
2024-08-05 07:42:37,655 - client - DEBUG - cache miss for secret 1f312b3d-a2ce-463d-a32a-b11300f48c3d
DEBUG:client:cache miss for secret 1f312b3d-a2ce-463d-a32a-b11300f48c3d
172.17.0.1 - - [05/Aug/2024 07:42:37] "GET /key/volsync-r2-template HTTP/1.1" 200 -
INFO:werkzeug:172.17.0.1 - - [05/Aug/2024 07:42:37] "GET /key/volsync-r2-template HTTP/1.1" 200 -
2024-08-05 07:42:38,641 - client - DEBUG - authenticating client
DEBUG:client:authenticating client
2024-08-05 07:42:38,644 - client - DEBUG - cache hit for secret db34fee6-832c-45cd-9425-b1120155bbef
DEBUG:client:cache hit for secret db34fee6-832c-45cd-9425-b1120155bbef
172.17.0.1 - - [05/Aug/2024 07:42:38] "GET /key/volsync-bucket HTTP/1.1" 200 -
INFO:werkzeug:172.17.0.1 - - [05/Aug/2024 07:42:38] "GET /key/volsync-bucket HTTP/1.1" 200 -
2024-08-05 07:42:38,652 - client - DEBUG - authenticating client
DEBUG:client:authenticating client
2024-08-05 07:42:38,654 - client - DEBUG - cache hit for secret 4cc440b6-9098-48be-ad27-b111012ab903
DEBUG:client:cache hit for secret 4cc440b6-9098-48be-ad27-b111012ab903
172.17.0.1 - - [05/Aug/2024 07:42:38] "GET /key/volsync-minio-template HTTP/1.1" 200 -
INFO:werkzeug:172.17.0.1 - - [05/Aug/2024 07:42:38] "GET /key/volsync-minio-template HTTP/1.1" 200 -
2024-08-05 07:42:39,844 - client - DEBUG - authenticating client
DEBUG:client:authenticating client
2024-08-05 07:42:39,846 - client - DEBUG - cache hit for secret 63e5956e-f276-4a25-a5c0-b11300f4075c
DEBUG:client:cache hit for secret 63e5956e-f276-4a25-a5c0-b11300f4075c
172.17.0.1 - - [05/Aug/2024 07:42:39] "GET /key/cloudflare HTTP/1.1" 200 -
INFO:werkzeug:172.17.0.1 - - [05/Aug/2024 07:42:39] "GET /key/cloudflare HTTP/1.1" 200 -
2024-08-05 07:42:39,855 - client - DEBUG - authenticating client
DEBUG:client:authenticating client
2024-08-05 07:42:39,857 - client - DEBUG - cache hit for secret 1f312b3d-a2ce-463d-a32a-b11300f48c3d
DEBUG:client:cache hit for secret 1f312b3d-a2ce-463d-a32a-b11300f48c3d
172.17.0.1 - - [05/Aug/2024 07:42:39] "GET /key/volsync-r2-template HTTP/1.1" 200 -
INFO:werkzeug:172.17.0.1 - - [05/Aug/2024 07:42:39] "GET /key/volsync-r2-template HTTP/1.1" 200 -
2024-08-05 07:42:41,710 - client - DEBUG - authenticating client
DEBUG:client:authenticating client
2024-08-05 07:42:41,712 - client - DEBUG - cache miss for secret 0637e5ef-bf85-4d7c-bdac-b1130181c863
DEBUG:client:cache miss for secret 0637e5ef-bf85-4d7c-bdac-b1130181c863
172.17.0.1 - - [05/Aug/2024 07:42:41] "GET /key/authentik HTTP/1.1" 200 -
INFO:werkzeug:172.17.0.1 - - [05/Aug/2024 07:42:41] "GET /key/authentik HTTP/1.1" 200 -
2024-08-05 07:42:43,943 - client - DEBUG - authenticating client
DEBUG:client:authenticating client
2024-08-05 07:42:43,945 - client - DEBUG - cache miss for secret 691981fb-12af-4838-be77-b113016bc6cc
DEBUG:client:cache miss for secret 691981fb-12af-4838-be77-b113016bc6cc
@joryirving thanks for the issue, and for enabling more verbose logging.
Could you provide some more context with your report? Clearly we could do with improving the logging here, but without further context we don't have much to work with.
Some examples of useful information to go along with the log output you've shown would be:
Sorry; @rippleFCL asked me on Discord a few days ago to open an issue. I turned on debugging at the time, and this is the first time the issue has recurred since.
1) No hanging.
2) I have it running as a Docker container on my NAS, for two k8s clusters to access via external-secrets.
3) Eventually, external-secrets becomes unable to sync due to timeouts, even though my clustersecretstore shows valid. This happens regardless of where the container is running (Docker on my NAS, or as a k8s pod in-cluster).
4) My fix is restarting the container.
5) I cannot reliably reproduce this.
I shall improve the logging on bws-cache; there isn't enough detail here to know fully what's going on. Can you share the timeouts/error codes from external-secrets?
We also need to figure out if it truly is bws-cache hanging or if it's external-secrets breaking. When it hangs, could you run the curl commands in the README and include the output (censoring any sensitive info if it returns a secret, of course)?
Mainly I want to see the query commands first, then invalidate the cache, then rerun the query commands, roughly as sketched below. That should give us some important info too.
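For reference, that sequence might look something like the following sketch; the hostname and key are placeholders, and the cache-invalidation route is an assumption on my part, so substitute whatever endpoint the bws-cache README actually documents:

```bash
# Sketch of the diagnostic sequence described above.
# <bws-cache-host> and <key> are placeholders; the /reset route is an assumption.
curl -sv "https://<bws-cache-host>/key/<key>"       # 1. query while the problem is occurring
curl -sv -X POST "https://<bws-cache-host>/reset"   # 2. invalidate the cache (endpoint name assumed)
curl -sv "https://<bws-cache-host>/key/<key>"       # 3. rerun the query and compare the behaviour
```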
I've been seeing this too - worked around it with a cronjob that restarts the BWS deployment occasionally and haven't given it much more thought. I'll disable that and attempt to gather diagnostics the next time it starts failing secret lookups.
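For anyone wanting the same stopgap, a rough sketch of that kind of restart CronJob is below; the deployment name, namespace, and schedule are assumptions, and the job still needs a service account with RBAC permission to restart the deployment (not shown here):

```bash
# Periodic-restart workaround (sketch). Deployment/namespace names and the
# schedule are assumptions; RBAC for the rollout restart is not shown.
kubectl create cronjob bws-cache-restart \
  --namespace security \
  --image=bitnami/kubectl:latest \
  --schedule="0 4 * * *" \
  -- kubectl rollout restart deployment/bws-cache --namespace security
```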
I'm in a similar boat to joryirving: it works until it doesn't. Exactly when "until it doesn't" occurs seems somewhat unpredictable, but for me it usually takes at least a day or two of uptime; once it starts happening, it's a problem until the bws-cache container is restarted.
My environment is a k8s cluster on Talos, managed by Flux. [helmrelease link]
❯ k describe externalsecret external-dns-unifi-secret -n network --context utility
Name: external-dns-unifi-secret
Namespace: network
Labels: app.kubernetes.io/name=external-dns-unifi
kustomize.toolkit.fluxcd.io/name=external-dns-unifi
kustomize.toolkit.fluxcd.io/namespace=flux-system
Annotations: <none>
API Version: external-secrets.io/v1beta1
Kind: ExternalSecret
Metadata:
Creation Timestamp: 2024-09-09T20:47:17Z
Generation: 1
Resource Version: 8454
UID: 6ea38394-fc26-4d4a-ab75-31e26b7d0a18
Spec:
Data From:
Extract:
Conversion Strategy: Default
Decoding Strategy: None
Key: unifi
Metadata Policy: None
Refresh Interval: 15m
Secret Store Ref:
Kind: ClusterSecretStore
Name: bitwarden-secrets-manager
Target:
Creation Policy: Owner
Deletion Policy: Retain
Name: external-dns-unifi-secret
Template:
Data:
EXTERNAL_DNS_UNIFI_PASS: {{ .EXTERNAL_DNS_UNIFI_PASS }}
EXTERNAL_DNS_UNIFI_USER: {{ .EXTERNAL_DNS_UNIFI_USER }}
Engine Version: v2
Merge Policy: Replace
Status:
Conditions:
Last Transition Time: 2024-09-09T20:48:47Z
Message: could not get secret data from provider
Reason: SecretSyncedError
Status: False
Type: Ready
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning UpdateFailed 106s external-secrets endpoint gave error 504 Gateway Time-out
I tried to curl, but it just hangs. Based on the pod logs, the container "froze"; the last entry in the logs was cache miss for secret...
What's interesting is that the clustersecretstore still appears to be valid, but all the externalsecrets are in SecretSyncedError, roughly as the kubectl check below shows.
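Something like the following is how that discrepancy shows up; the store name is taken from the ExternalSecret manifest above:

```bash
# The ClusterSecretStore still reports Ready/valid while every ExternalSecret
# reports SecretSyncedError. Store name taken from the manifest above.
kubectl get clustersecretstore bitwarden-secrets-manager
kubectl get externalsecrets -A
```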
❯ curl https://bws-cache.domain.tld/key/<key>
<html>
<head><title>504 Gateway Time-out</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>openresty</center>
</body>
</html>
Once again, restarting the container resolved the issue. The logs appeared to be ~4 days out of date, but I hadn't changed any values, so I didn't notice until now.
Right, sorry for the delay. I've done some digging and it seems to be in the auth method; we are doing a restructure for the BWS SDK 1.0.0 release. Once I've finished reworking that and pushed a new version of bws-cache, it might be worth trying it out and seeing if the hanging problem is still there.
I run bws-cache on an external machine via Docker, and every so often it stops responding, giving timeout errors. The logs don't seem to indicate any issues in the container itself, but restarting the container resolves the issue.
If you let me know how to enable debug or more verbose logging, I can try to provide more information.