k-jell closed 2 years ago
In a multi-node configuration, Nomad randomly sends SIGINT
to our load balancers:
2021/11/24 09:17:09 [INFO] Caught SIGINT. Exiting
2021/11/24 09:17:09 [FATAL] accept tcp [::]:9993: use of closed network connection
2021/11/24 09:17:09 [FATAL] accept tcp [::]:9992: use of closed network connection
2021/11/24 09:17:09 [FATAL] accept tcp [::]:9995: use of closed network connection
2021/11/24 09:17:09 [FATAL] accept tcp [::]:9994: use of closed network connection
2021/11/24 09:17:09 [FATAL] accept tcp [::]:9996: use of closed network connection
2021/11/24 09:17:09 [FATAL] accept tcp [::]:9991: use of closed network connection
2021/11/24 09:17:09 [INFO] Caught SIGINT. Exiting
2021/11/24 09:17:09 [FATAL] accept tcp [::]:9993: use of closed network connection
2021/11/24 09:17:09 [FATAL] accept tcp [::]:9992: use of closed network connection
2021/11/24 09:17:09 [FATAL] accept tcp [::]:9995: use of closed network connection
2021/11/24 09:17:09 [FATAL] accept tcp [::]:9994: use of closed network connection
2021/11/24 09:17:09 [FATAL] accept tcp [::]:9996: use of closed network connection
2021/11/24 09:17:09 [FATAL] accept tcp [::]:9991: use of closed network connection
We can't use this in production. I don't know why it doesn't happen on my QA cluster (made of 2 machines), but I was able to reproduce it on the
ieftin-1 ieftin-2 ieftin-3
cluster Kjell was working on. To fix this, we would need to upgrade to the latest Nomad version, but that version patches out our cluster
hacks and changes the template language with a number of breaking changes. I don't think the upgrade is worth it, considering we're migrating off this platform.
As stated in the issue, I'm closing this in favor of the future Kubernetes migration, where we will use operators to configure the ES cluster.
closes CRJI/EIC#752