Closed richamishra006 closed 4 months ago
The refinery 2.0.0 helm chart (helm chart version refinery-2.0.0) had a bug where PeerManagement was local by default. Helm chart version refinery-2.1.0 fixed this via https://github.com/honeycombio/helm-charts/pull/267. Were you already setting the PeerManagement to Redis?
Can you share your values.yaml?
Yes, I tried setting the PeerManagement to Redis, but I am still getting that error while upgrading.
config:
  Collection:
    AvailableMemory: '2GB'
  PeerManagement:
    Type: redis
  RedisPeerManagement:
    Host: 'my-redis-master.redis.cluster.local:6379'
    Timeout: 15s
redis:
  enabled: false
The only thing I can think of is that in Refinery 2.0.2 we increased the redis scan batch size: https://github.com/honeycombio/refinery/releases/tag/v2.0.2.
I see this Redis instance isn't coming from the helm chart but is expected to be inside the cluster. How are you installing it? Are the IP addresses in the error message correct? Is this problem happening on a helm install or a helm upgrade?
We are facing this issue in an EKS cluster, and Redis is provisioned as ElastiCache in AWS. I replicated the same setup locally and get the same error there as well. Redis is in the same network, and with 2.0.0 we have no connectivity issue. In my local minikube, I installed Redis in the same cluster, and with 2.0.0 it works perfectly fine. The chart is currently running 2.0.0; as soon as I upgrade it with the helm upgrade command (I tried both 2.1.2 and 2.9.0), the error is the same.
I think if you replicate the same setup, you will get this error.
@richamishra006 I was able to reproduce the issue locally only when using an invalid redis peer host, such as 'refinery-redis.default.cluster.local:6379' instead of 'refinery-redis.default.svc.cluster.local:6379'. As long as I provided a valid host endpoint I was able to perform an upgrade with no issues. Definitely check that the endpoint you're providing is correct.
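For anyone who wants to sanity-check the host string before running an upgrade, here is a minimal sketch in Python. It assumes the cluster uses the default DNS domain cluster.local, and check_redis_host is a hypothetical helper written for illustration, not part of Refinery or the helm chart. It only looks at the shape of the name; it does not resolve DNS or connect to Redis.

```python
def check_redis_host(host_port: str, cluster_domain: str = "cluster.local") -> str:
    """Check that an in-cluster Redis host looks like a Service DNS name.

    Assumption: cluster_domain defaults to "cluster.local"; adjust it if the
    cluster was set up with a different domain.
    """
    # Split "host:port" on the last colon (IPv6 literals are not handled).
    host, sep, port = host_port.rpartition(":")
    if not sep or not port.isdigit():
        return "missing or non-numeric port"

    suffix = "." + cluster_domain
    if not host.endswith(suffix):
        # e.g. a managed Redis endpoint outside the cluster; nothing to check.
        return "external host, not checked"

    # Labels in front of the cluster domain, e.g. ["refinery-redis", "default", "svc"].
    labels = host[: -len(suffix)].split(".")
    if len(labels) >= 3 and labels[-1] == "svc" and all(labels):
        return "ok"
    return "expected <service>.<namespace>.svc." + cluster_domain


if __name__ == "__main__":
    for candidate in (
        "refinery-redis.default.cluster.local:6379",
        "refinery-redis.default.svc.cluster.local:6379",
    ):
        print(candidate, "->", check_redis_host(candidate))
```

The first host in the usage loop is the invalid one from above (missing svc) and is flagged; the second passes. A name that passes this check can still fail to resolve, so it complements rather than replaces testing the endpoint from inside the cluster.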
Thanks for the quick response @TylerHelmuth. I had missed the svc in the redis endpoint in my local setup. After adding it, I am getting this error:
$ kubectl logs my-refinery-6f66bfccf4-h2w2f
2024/05/18 03:22:00 maxprocs: Updating GOMAXPROCS=2: determined from CPU quota
time="2024-05-18T03:22:00Z" level=info msg="using identifier from interface" identifier=10.1.1.240 interface=eth0
time="2024-05-18T03:22:00Z" level=error msg="registration failed" err="NOAUTH Authentication required." name="http://10.1.1.240:8081" timeoutSec=10
time="2024-05-18T03:22:00Z" level=error msg="failed to register self with redis peer store" error="NOAUTH Authentication required."
unable to load peers: NOAUTH Authentication required.
However, I verified that the Redis ElastiCache endpoint is correct in my prod setup, which is why it works with 2.0.0. I am wondering how it was connecting to Redis with an incorrect endpoint in my local setup on version 2.0.0.
@TylerHelmuth I upgraded the refinery helm chart to 2.9.0 and the Redis (AWS ElastiCache) connection issue is resolved. However, I am getting errors in the refinery pod logs:
time="2024-06-05T10:55:22Z" level=error msg="error when sending event" api_host="http://100.64.173.97:8081/" dataset=processor-indexing environment=honeycomb-perf error="got unexpected HTTP status 503: Service Unavailable" roundtrip_usec=111279 status_code=503 trace.span_id=6655e67890000
I searched for the trace.span_id in the Honeycomb UI but could not find this span id. Could you please help us figure out whether we are missing anything?
This issue is being handled as a support ticket for the operational environment.
I have a refinery chart running perfectly fine on version 2.0.0. Redis is installed separately and its hostname is referenced in values.yaml under RedisPeerManagement. As soon as I try to upgrade the chart, even to 2.1.2 (I also tried 2.9.0 and 2.9.1), it fails with the error below.
Can someone please help me here? I am not sure what I am missing.