rwynn / monstache

a go daemon that syncs MongoDB to Elasticsearch in realtime. you know, for search.
https://rwynn.github.io/monstache-site/
MIT License

Connections die and monstache stops working #108

Closed. mjacquin closed this issue 5 years ago

mjacquin commented 6 years ago

Hello and thank you rwynn for the great work.

Sometimes the connection to ES or mongo is lost and monstache just hangs. The error: https://imgur.com/a/DvcgGPA

It doesn't seem to attempt to reconnect, and it stops working. The HTTP server keeps responding, so I can't detect that it has stopped with the AWS health check. I have 2 instances of monstache running in a cluster on different servers, but that didn't help :(

Here is my config:

mongo-url = "mongodb://hostA:27017,hostB:27017/admin?replicaSet=RS-mewoprodv2-0&ssl=true"
elasticsearch-urls = [
  "https://xxxxx:9243"
]
elasticsearch-user = "xxxx"
elasticsearch-password = "xxxx"
elasticsearch-retry = true
resume = true
gzip = true
namespace-regex = "mewo.(tracks|albums|catalogs|playlists)"
cluster-name = "mewo"
enable-http-server = true
stats = true
index-stats = true
stats-index-format = "monstache.stats"
elasticsearch-client-timeout = 10
mongo-pem-file = "mongo-ca-cert.pem"
mongo-validate-pem = true

[mongo-session-settings]
socket-timeout = 10
sync-timeout = 10

[gtm-settings]
buffer-size = 128
buffer-duration = "500ms"

[[script]]
namespace = "mewo.tracks"
path = "transform/tracks.js"

[[script]]
namespace = "mewo.albums"
path = "transform/albums.js"

[[script]]
namespace = "mewo.catalogs"
path = "transform/catalogs.js"

[[script]]
namespace = "mewo.playlists"
path = "transform/playlists.js"

rwynn commented 6 years ago

Hi, I will look into this. In the meantime, can you try the latest monstache version if you haven't already? Also, try removing all of the timeout lines in your config. Thanks.

mjacquin commented 6 years ago

I'm already running the latest version. I will try removing the timeouts.

rwynn commented 5 years ago

@mjacquin The latest version of monstache has a fix related to cluster mode. Previously, the monstache process would not properly resume tailing the oplog if it had been paused in the cluster and later activated (that is, after it managed to grab the cluster lock). This has been fixed. That could be related to this issue.
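To illustrate the state transition involved, here is a minimal Go sketch of a worker that is paused while another process holds the cluster lock and must start tailing again when it acquires the lock. This is only an illustration under assumed names (`Tailer`, `SetLeader`, `Starts`, `Active` are all hypothetical) and is not how monstache implements cluster mode.

```go
package main

import "sync"

// Tailer is a hypothetical stand-in for an oplog tailing loop that
// should only run while this process holds the cluster lock.
type Tailer struct {
	mu     sync.Mutex
	active bool // true while this process is tailing
	starts int  // how many times tailing has been (re)started
}

// SetLeader applies a change in lock ownership. The important case is
// the paused-to-active transition: tailing has to be started again,
// otherwise the process holds the lock but syncs nothing.
func (t *Tailer) SetLeader(hasLock bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	switch {
	case hasLock && !t.active:
		t.active = true
		t.starts++ // a real process would resume tailing here
	case !hasLock && t.active:
		t.active = false // a real process would pause tailing here
	}
}

// Starts returns how many times tailing has been started or resumed.
func (t *Tailer) Starts() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.starts
}

// Active reports whether the tailer currently considers itself running.
func (t *Tailer) Active() bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.active
}
```

Repeated notifications with the same lock state are ignored, so only a real change from paused to active counts as a restart.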