Closed vovayartsev closed 3 years ago
Thanks for reporting the issue.
At first sight, the issue doesn't seem related to scale-up. To investigate further, we need more information. Can you please post /data/rudderstack/error_store.json from the crashing pod?
Thank you, Chandu.
Reproduced and captured: https://gist.github.com/vovayartsev/ec2a6ff48a0ff328f5be3f4cfd0da127
but the error looks slightly different this time: https://gist.github.com/vovayartsev/c2f19da4de42bbaf8cd9ece7c9b4db5b
I can reproduce it easily with 2 replicas on a fresh DB (it enters CrashLoopBackOff after ~200-300 batches), but it runs stably with 1 replica.
Closing this issue, as my setup was incorrect: I had been testing with RDS, and both replicas were connected to the same RDS database.
As pointed out in Slack:
Yeah, that would not work. Both servers cannot write to the same set of tables. If you are using RDS, try connecting each server to a different database inside RDS.
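As a sketch of what that means in practice (key names below are illustrative, not the chart's actual values keys; check the rudder-server Helm chart for the real ones), each server gets its own values file pointing at a dedicated database inside the same RDS instance:

```yaml
# values-a.yaml -- first rudder-server release
# NOTE: key names and the RDS endpoint are hypothetical placeholders.
backendReplicaCount: 1
db:
  host: my-instance.example.us-east-1.rds.amazonaws.com  # shared RDS instance
  name: jobsdb_a   # database dedicated to server A

# values-b.yaml -- second rudder-server release
# Same host, but a different database so the two servers
# never write to the same set of tables:
#   db:
#     name: jobsdb_b
```

The point is that scaling out means one database per server, not one shared database with backendReplicaCount: 2.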
Environment: AWS Kubernetes (via Helm chart), Postgres via RDS (db.t3.large), 2 GB RAM and 8 vCPU per replica.
Docker image: rudderlabs/rudder-server:14102020.053158
Steps to reproduce: set backendReplicaCount: 2 in the Helm chart configuration, as suggested here.
Expected: doubled throughput.
Actual: one of the replicas entered CrashLoopBackOff with the following log message. The destination (Apache Kafka) continued handling the messages already in the queue.
Load-testing was done via Apache Benchmark from Kubernetes: see rudderstack-ab.json