liquidinvestigations / node

Deploy Liquid Investigations on Nomad
https://github.com/liquidinvestigations/docs/wiki
MIT License

Multihost es #320

Closed k-jell closed 2 years ago

k-jell commented 2 years ago

closes CRJI/EIC#752

gabriel-v commented 2 years ago

In a Multi-Node configuration, Nomad randomly sends SIGINT to our load balancers:

2021/11/24 09:17:09 [INFO] Caught SIGINT. Exiting
2021/11/24 09:17:09 [FATAL] accept tcp [::]:9993: use of closed network connection
2021/11/24 09:17:09 [FATAL] accept tcp [::]:9992: use of closed network connection
2021/11/24 09:17:09 [FATAL] accept tcp [::]:9995: use of closed network connection
2021/11/24 09:17:09 [FATAL] accept tcp [::]:9994: use of closed network connection
2021/11/24 09:17:09 [FATAL] accept tcp [::]:9996: use of closed network connection
2021/11/24 09:17:09 [FATAL] accept tcp [::]:9991: use of closed network connection
2021/11/24 09:17:09 [INFO] Caught SIGINT. Exiting
2021/11/24 09:17:09 [FATAL] accept tcp [::]:9993: use of closed network connection
2021/11/24 09:17:09 [FATAL] accept tcp [::]:9992: use of closed network connection
2021/11/24 09:17:09 [FATAL] accept tcp [::]:9995: use of closed network connection
2021/11/24 09:17:09 [FATAL] accept tcp [::]:9994: use of closed network connection
2021/11/24 09:17:09 [FATAL] accept tcp [::]:9996: use of closed network connection
2021/11/24 09:17:09 [FATAL] accept tcp [::]:9991: use of closed network connection
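
To check how often this happens on a given node, the log output can be tallied with a small script. This is only a minimal sketch and not part of this PR: the function name `summarize_shutdowns` is hypothetical, and the regular expressions assume the exact line format shown in the excerpt above.

```python
import re
from collections import Counter

# Matches lines such as "2021/11/24 09:17:09 [INFO] Caught SIGINT. Exiting"
LOG_LINE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[(\w+)\] (.*)$")

# Matches the message part of "accept tcp [::]:9993: use of closed network connection"
CLOSED_PORT = re.compile(r"\]:(\d+): use of closed network connection")


def summarize_shutdowns(log_text):
    """Return (number of SIGINT exits, Counter of listener ports that were closed)."""
    sigints = 0
    closed_ports = Counter()
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not in the assumed format
        _timestamp, level, message = match.groups()
        if level == "INFO" and "Caught SIGINT" in message:
            sigints += 1
        elif level == "FATAL":
            port = CLOSED_PORT.search(message)
            if port:
                closed_ports[int(port.group(1))] += 1
    return sigints, closed_ports
```

Calling `summarize_shutdowns(open(path).read())` on a saved load balancer log (where `path` is wherever that log was saved) gives the number of SIGINT exits and, per port, how many listeners were torn down.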

We can't use this in production. I don't know why it's not happening on my QA cluster (made of 2 machines), but I was able to reproduce it in:

To fix this, we would need to upgrade to the latest Nomad version, but newer releases patch out our cluster hacks and make a number of breaking changes to the template language. I don't think the upgrade is worth it, considering we're migrating off this platform.

As stated in the issue, I'm closing this in favor of the future Kubernetes migration, where we will use operators to configure the ES cluster.