Closed chdorner closed 7 years ago
Thanks @chdorner. Re: https://blog.codecentric.de/en/2014/09/elasticsearch-zero-downtime-reindexing-problems-solutions/, what is the trigger, in our case, for a reindex?
@judell so far we've had to do it when we changed the index mapping. But it's a good idea to periodically re-index into a new index anyway: when we delete documents from Elasticsearch, it doesn't actually remove them from its underlying Lucene shards, it just marks them as deleted.
We're in a good place here thanks to @chdorner's tireless efforts. We can now do an online reindex with no downtime in ~15m.
That is completely amazing. Thank you!
With recent changes to the re-index code we lost the ability to re-index all annotations without stopping writes to the index during that time. A full re-index currently takes around 2 hours, so suspending writes to the index is a terrible user experience: users will think that their annotation failed to save, even though it did not fail.
I've been thinking about ways to re-index without downtime. After researching solutions on Friday and talking to @nickstenning, we came up with the following:
During a re-index:
There are two operations that we need to be careful about: update and delete.
Problem with update:
Problem with delete:
The solution:
- … `{"deleted": true}`.
- … `op_type=create`, so that we never overwrite an annotation with wrong data (or re-create a deleted one)

Most of these ideas are from: https://blog.codecentric.de/en/2014/09/elasticsearch-zero-downtime-reindexing-problems-solutions/
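A minimal sketch of how the two pieces above could fit together, assuming the re-indexer works from a snapshot that may already be stale when it is written. `FakeIndex`, `ConflictError`, `delete_annotation` and `reindex` are hypothetical in-memory stand-ins, not code from h or the Elasticsearch client; the only behaviour borrowed from Elasticsearch is that a write with `op_type=create` is rejected when the document id already exists.

```python
class ConflictError(Exception):
    """Stands in for the conflict Elasticsearch reports on a failed create."""


class FakeIndex:
    """Hypothetical in-memory stand-in for a single Elasticsearch index."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, body, op_type="index"):
        # op_type="create" refuses to touch a document id that already exists.
        if op_type == "create" and doc_id in self.docs:
            raise ConflictError(doc_id)
        self.docs[doc_id] = dict(body)


def delete_annotation(index, doc_id):
    # Soft delete: write a tombstone so the id stays occupied in the index.
    index.index(doc_id, {"deleted": True})


def reindex(rows, new_index):
    """Copy rows into new_index without overwriting anything already there."""
    skipped = []
    for doc_id, body in rows:
        try:
            new_index.index(doc_id, body, op_type="create")
        except ConflictError:
            # A live write or a tombstone got there first; keep it and move on.
            skipped.append(doc_id)
    return skipped


new_index = FakeIndex()
snapshot = [
    ("a1", {"text": "old"}),
    ("a2", {"text": "doomed"}),
    ("a3", {"text": "same"}),
]

# Live traffic that arrives while the re-index is still running:
new_index.index("a1", {"text": "edited"})  # update
delete_annotation(new_index, "a2")         # delete

skipped = reindex(snapshot, new_index)
```

In this sketch the stale copies of `a1` and `a2` are skipped, so the edited annotation keeps its newer text and the deleted one stays a tombstone, while `a3` is copied normally. Searches would then need to filter out documents carrying `{"deleted": true}`.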
Done when:
- `memex.search.index.delete` marks annotations as deleted (hypothesis/h#4242).
- … `op_type=create` and makes sure that errors are handled (`op_type=create` related errors should be ignored, hypothesis/h#4245).
- … (`setting` table) (hypothesis/h#4243, hypothesis/h#4249).
- `h.indexer.add_annotation` will write to both indices when the new index name setting is configured (hypothesis/h#4250).
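The dual-write behaviour in the last item could look roughly like the sketch below. This is an illustration under assumptions, not the actual `h.indexer.add_annotation`: the `settings` dict, its `current_index` and `new_index` keys, the index names and the `FakeIndex` class are all hypothetical.

```python
class FakeIndex:
    """Hypothetical in-memory stand-in for a single Elasticsearch index."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, body):
        self.docs[doc_id] = dict(body)


def add_annotation(indices, settings, doc_id, body):
    """Write to the current index and, if one is configured, the new index."""
    targets = [settings["current_index"]]
    new_name = settings.get("new_index")  # only set while a re-index is running
    if new_name:
        targets.append(new_name)
    for name in targets:
        indices[name].index(doc_id, body)
    return targets


indices = {"annotations-1": FakeIndex(), "annotations-2": FakeIndex()}
settings = {"current_index": "annotations-1"}

# Before the re-index starts, only the current index receives writes.
first = add_annotation(indices, settings, "a1", {"text": "hi"})

# Once the new index name is configured, writes go to both indices.
settings["new_index"] = "annotations-2"
second = add_annotation(indices, settings, "a2", {"text": "there"})
```

Annotations written before the setting was configured (`a1` here) exist only in the old index, which is why the re-indexer still has to copy them across.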