googlearchive / flashlight

A pluggable integration with ElasticSearch to provide advanced content searches in Firebase.
http://firebase.github.io/flashlight/

Keeping Elastic search synchronized with Firebase in case Flashlight crashes #121

Open georgesxi opened 7 years ago

georgesxi commented 7 years ago

After indexing all of your Firebase data from a given path in ES, if the Flashlight indexing worker crashes or becomes unresponsive, ES is no longer updated when data is deleted from Firebase. Specifically, when a key has been indexed and is later deleted while the Flashlight worker is not running, it stays in the ElasticSearch index and keeps appearing in search results. The same problem doesn't occur for keys added while Flashlight is down, because the path is re-indexed when the app starts. Any suggestions on how to deal with this? Thanks!

katowulf commented 7 years ago

Obviously, add uptime monitoring and ensure ES stays up. If it's crashing, that sounds like a problem.

If you want a crash-proof approach, you'll want to create a queue of events (adds, updates, deletes) and tweak your Flashlight service to read those instead of listening for events. In this way, if the service is offline, events would still write to the queue and wait for ES to return and process them, providing a lossless approach.
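A minimal sketch of the queue idea described above, using an in-memory stand-in rather than a real Firebase path or ES client. The `EventQueue` class, its `push`/`drain` methods, and the dict used as the index are all hypothetical names for illustration; this is not the firebase-queue API. It only shows why deletes survive worker downtime when they are recorded as events:

```python
from collections import deque


class EventQueue:
    """In-memory stand-in for a durable queue (hypothetical, not firebase-queue)."""

    def __init__(self):
        self._events = deque()

    def push(self, op, key, value=None):
        # Writers record every change as an event instead of relying on live listeners.
        if op not in ("add", "update", "delete"):
            raise ValueError(f"unknown op: {op}")
        self._events.append({"op": op, "key": key, "value": value})

    def drain(self, index):
        """Apply every pending event to `index` in order; return how many were applied."""
        applied = 0
        while self._events:
            event = self._events[0]
            if event["op"] == "delete":
                index.pop(event["key"], None)
            else:
                index[event["key"]] = event["value"]
            self._events.popleft()  # remove the event only after it was applied
            applied += 1
        return applied

    def __len__(self):
        return len(self._events)


# The dict stands in for the ES index (assumption for this sketch).
index = {}
queue = EventQueue()

queue.push("add", "user1", {"name": "Ann"})
queue.drain(index)  # worker is up: user1 gets indexed

# Worker is "down": writers keep queueing, so nothing is lost.
queue.push("delete", "user1")
queue.push("add", "user2", {"name": "Bob"})

# Worker comes back and replays the backlog, including the delete.
queue.drain(index)
```

In a real deployment the queue would have to live somewhere durable (for example a Firebase path), since an in-memory queue disappears with the process that holds it.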

See also #122 and firebase-queue.

georgesxi commented 7 years ago

Thanks for the useful information. I will definitely look into firebase-queue, and the extra functionality added in #122 will be useful as well. However, am I right to assume that these solutions mostly cover situations where the ElasticSearch instance becomes unresponsive? My initial question concerned the case where Flashlight stops responding, so any data deleted from Firebase during that time is never removed from the ElasticSearch index, creating an inconsistency. Would running multiple Flashlight instances with exactly the same settings, listening on the same Firebase path, also be a valid mitigation (assuming that not all instances fail at once)?

By "also" I mean that the suggested approach would be to write data to Firebase and additionally maintain a queue that records every add, update, and delete, with each entry removed from the queue once it has been processed, to ensure accuracy. Is that correct?

katowulf commented 7 years ago

No, what the queue allows you to do is retry failed attempts. So if the Flashlight server is down, the events simply accumulate in the queue. If the ES server is down, using a queue strategy similar to firebase-queue means that you mark them with an error condition (or as unsent) so they can be requeued later.

The ideal approach in my mind would be something like: Process items in the queue. If ES goes down, begin marking them as es_failed. Kick off a process to watch for ES's recovery. When ES recovers, find all es_failed items and reset them to the initial/new state to be processed again.
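The flow above could be sketched roughly as follows. Everything here is an assumption made for illustration: `FakeES` is a stand-in for an ES client with a manually toggled `up` flag, the queue is a plain list, and the `new` and `done` state names are invented (only `es_failed` comes from the comment above). A real setup would watch for ES's recovery with some kind of health check instead of flipping a flag by hand:

```python
class FakeES:
    """Hypothetical stand-in for an ES client; `up` simulates availability."""

    def __init__(self):
        self.up = True
        self.docs = {}

    def apply(self, op, key, value=None):
        if not self.up:
            raise ConnectionError("ES unavailable")
        if op == "delete":
            self.docs.pop(key, None)
        else:
            self.docs[key] = value


def process(queue, es):
    """Send every item in the 'new' state to ES; mark failures as 'es_failed'."""
    for item in queue:
        if item["state"] != "new":
            continue
        try:
            es.apply(item["op"], item["key"], item.get("value"))
            item["state"] = "done"
        except ConnectionError:
            item["state"] = "es_failed"


def requeue_failed(queue, es):
    """Once ES is back, reset 'es_failed' items to 'new'; return how many were reset."""
    if not es.up:
        return 0
    failed = [item for item in queue if item["state"] == "es_failed"]
    for item in failed:
        item["state"] = "new"
    return len(failed)


es = FakeES()
queue = [
    {"op": "add", "key": "a", "value": 1, "state": "new"},
    {"op": "delete", "key": "a", "state": "new"},
]

es.up = False
process(queue, es)  # ES is down: both items are marked es_failed
states_while_down = [item["state"] for item in queue]

es.up = True
reset_count = requeue_failed(queue, es)  # ES recovered: failed items go back to 'new'
process(queue, es)  # replayed in original order: add, then delete
```

Keeping the items in their original order matters here: replaying the delete before the add would leave a stale document in the index.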

georgesxi commented 7 years ago

That is a solid solution, I believe, and it will work for both Flashlight failure and ES failure. The multiple Flashlight instances suggestion would not cover ES failure, and maintaining a queue for every data entry would create too much overhead. Flagging just the failed attempts in the queue is probably the best way to deal with this. Thanks for the insight!