wmde / wikibase-release-pipeline


Elasticsearch stops indexing #325

Closed rchavez-neu closed 2 years ago

rchavez-neu commented 2 years ago

Hello,

Elasticsearch appears to have stopped updating its index after a period. I'm wondering if this is similar to the WDQS issue outlined here (https://www.mediawiki.org/wiki/Wikibase/FAQ/en#Why_doesn't_the_query_service_update?).

Is there a similar method one could apply to force Elasticsearch to rebuild or update its index and resume indexing?

I noticed the following in the wikibase Docker image, in the file var/www/html/extensions/CirrusSearch/README:

Next bootstrap the search index by running:

php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipLinks --indexOnSkip
php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipParse

Note that this can take some time. For large wikis read "Bootstrapping large wikis" below.

I'm wondering if running these would be an appropriate fix or if there is a different solution for this issue.

Not sure if this helps, but I'm seeing a lot of the following errors in the job runner logs: cirrusSearchElasticaWrite jobs are attempted, fail, get requeued, and are eventually dropped.

2022-02-24 15:27:12 cirrusSearchElasticaWrite Special: method=sendData arguments=["general","array(...)"] cluster=default createdAt=1645716431 errorCount=3 retryCount=0 requestId=281580d187a1b4cb215432e5 namespace=-1 title= (id=266,timestamp=20220224152712) t=66 error=ElasticaWrite job failed: Requeued

2022-02-24 15:27:12 cirrusSearchElasticaWrite Special: method=sendData arguments=["general","array(...)"] cluster=default createdAt=1645716431 errorCount=4 retryCount=0 requestId=281580d187a1b4cb215432e5 namespace=-1 title= (id=267,timestamp=20220224152712) STARTING

2022-02-24 15:27:12 cirrusSearchElasticaWrite Special: method=sendData arguments=["general","array(...)"] cluster=default createdAt=1645716431 errorCount=4 retryCount=0 requestId=281580d187a1b4cb215432e5 namespace=-1 title= (id=267,timestamp=20220224152712) t=62 error=ElasticaWrite job failed: Dropped
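To get a quick picture of how many write jobs are being requeued versus dropped, the log lines above can be tallied by outcome. This is a minimal sketch: the function name `summarize_elastica_failures` is made up for illustration, and it assumes the log format matches the excerpt above (an `error=ElasticaWrite job failed: <Outcome>` suffix on each failure line).

```shell
#!/bin/sh
# Tally cirrusSearchElasticaWrite failures by outcome from a job runner
# log read on stdin. Lines without an "ElasticaWrite job failed" suffix
# (for example STARTING lines) are ignored.
summarize_elastica_failures() {
    grep 'cirrusSearchElasticaWrite' |
        sed -n 's/.*error=ElasticaWrite job failed: \([A-Za-z]*\).*/\1/p' |
        sort | uniq -c | awk '{ print $2 "=" $1 }'
}

# Usage (the container name is a placeholder for your job runner container):
#   docker logs <jobrunner-container> 2>&1 | summarize_elastica_failures
```

A steadily growing "Dropped" count suggests that writes are being rejected by Elasticsearch rather than merely delayed.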

I'm also seeing a lot of the following in the Elasticsearch container logs:

[2022-02-24T15:06:00,480][WARN ][o.e.d.c.ParseField ] [iLkW6ri] Deprecated field [_retry_on_conflict] used, expected [retry_on_conflict] instead
[2022-02-24T15:06:00,553][WARN ][o.e.d.c.ParseField ] [iLkW6ri] Deprecated field [_retry_on_conflict] used, expected [retry_on_conflict] instead

Image Configuration

WIKIBASE_IMAGE_NAME=wikibase/wikibase:1.35.4-wmde.2
WDQS_IMAGE_NAME=wikibase/wdqs:0.3.40-wmde.2
WDQS_FRONTEND_IMAGE_NAME=wikibase/wdqs-frontend:wmde.2
ELASTICSEARCH_IMAGE_NAME=wikibase/elasticsearch:6.5.4-wmde.2
WIKIBASE_BUNDLE_IMAGE_NAME=wikibase/wikibase-bundle:1.35.4-wmde.2
QUICKSTATEMENTS_IMAGE_NAME=wikibase/quickstatements:wmde.2
WDQS_PROXY_IMAGE_NAME=wikibase/wdqs-proxy:wmde.2
MYSQL_IMAGE_NAME=mariadb:10.3

Thanks!

rchavez-neu commented 2 years ago

Ticket can be closed. I figured out the problem. In case it is useful in the future, here's what happened and how I fixed it:

The server that Docker is running on seems to have been low on disk space at some point in the last 60 days, which caused Elasticsearch to crash. As a result of the crash, the Elasticsearch indexes were locked in read-only mode, which caused any re-index attempts to fail.
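To confirm that a read-only block is what is stopping writes, the index settings can be inspected for `index.blocks.read_only_allow_delete`, the block Elasticsearch applies when a node crosses its flood-stage disk watermark. The following is a minimal sketch: the helper name `has_read_only_block` and the `ES_URL` value are assumptions, and the pattern expects the default nested settings output (not `flat_settings`).

```shell
#!/bin/sh
# Exit 0 if the index settings JSON on stdin contains an active
# read_only_allow_delete block, non-zero otherwise.
has_read_only_block() {
    grep -Eq '"read_only_allow_delete"[[:space:]]*:[[:space:]]*"?true"?'
}

# Usage (ES_URL is an assumption; adjust host and port to your setup):
#   ES_URL=http://127.0.0.1:9200
#   curl -s "$ES_URL/_all/_settings" | has_read_only_block \
#       && echo "at least one index is read-only"
#   curl -s "$ES_URL/_cat/allocation?v"    # per-node disk usage
```

If the block is present, free up disk space first; otherwise Elasticsearch is likely to reapply it after it has been cleared.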

Here’s what I ended up doing to resolve this:

curl -XPUT -H "Content-Type: application/json" http://<server-name>:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

php extensions/CirrusSearch/maintenance/UpdateOneSearchIndexConfig.php --wiki=<db name> --startOver --indexType=general
php extensions/CirrusSearch/maintenance/UpdateOneSearchIndexConfig.php --wiki=<db name> --startOver --indexType=content
php extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki=<db name>
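The sequence above could be wrapped in a small script so the steps run in order and stop at the first failure. This is only a sketch: `ES_URL`, `WIKI_DB`, `MW_DIR`, `DRY_RUN`, `run` and `recover_search_index` are illustrative names, the default values are assumptions, and it defaults to a dry run that prints each command instead of executing it. Note that `--startOver` recreates the indexes, so search results are incomplete until `ForceSearchIndex.php` finishes.

```shell
#!/bin/sh
# Sketch of the recovery sequence described above.
# All defaults below are assumptions; override them via the environment.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"
WIKI_DB="${WIKI_DB:-my_wiki}"
MW_DIR="${MW_DIR:-/var/www/html}"
DRY_RUN="${DRY_RUN:-1}"    # 1 = print commands only, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_search_index() {
    cirrus="$MW_DIR/extensions/CirrusSearch/maintenance"
    # 1. Clear the read-only block on all indexes.
    run curl -XPUT -H 'Content-Type: application/json' \
        "$ES_URL/_all/_settings" \
        -d '{"index.blocks.read_only_allow_delete": null}' || return 1
    # 2. Recreate both index types from scratch.
    for type in general content; do
        run php "$cirrus/UpdateOneSearchIndexConfig.php" \
            --wiki="$WIKI_DB" --startOver --indexType="$type" || return 1
    done
    # 3. Repopulate the indexes from the wiki database.
    run php "$cirrus/ForceSearchIndex.php" --wiki="$WIKI_DB"
}

# Review the dry-run output first, then rerun with DRY_RUN=0:
#   recover_search_index
```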

All of that got Elasticsearch working again. I'll have to keep an eye on it to make sure indexing continues, though.

Lesson learned: don't skimp on disk space for wikidata.

Thanks for your patience!

addshore commented 2 years ago

Thanks for the writeup, and glad you found the issue!

anchardo commented 1 year ago

Thank you so much for documenting in detail the solution you found, @rchavez-neu! I had the same problem and it saved me a lot of time. If other people are struggling, as I was, with what to use for <server-name> in http://<server-name>:9200: try http://127.0.0.1:9200 once you're inside the Elasticsearch container. It worked for me.

ernstki commented 6 months ago

Telling Elasticsearch to set '{"index.blocks.read_only_allow_delete": null}' as prescribed above, then running php maintenance/runJobs.php was sufficient for me to clear the 1200+ jobs in the queue for a small internal wiki. I noticed something was wrong and started troubleshooting when pages that clearly should've been in the search index weren't.
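For this lighter-weight variant, a short check of the queue size before and after draining it can confirm that the backlog is actually being processed. This is a sketch under assumptions: `queue_needs_run` is a hypothetical helper, and it relies on `php maintenance/showJobs.php` printing the pending job count as a single number when called without options.

```shell
#!/bin/sh
# Decide whether the job queue needs draining, given the pending job
# count (as printed by showJobs.php) as the first argument.
# Returns 0 if jobs are pending, 1 if the queue is empty, 2 on bad input.
queue_needs_run() {
    case "$1" in
        ''|*[!0-9]*) return 2 ;;    # not a number: treat as an error
    esac
    [ "$1" -gt 0 ]
}

# Usage, from the MediaWiki root, after the read-only block is cleared:
#   pending=$(php maintenance/showJobs.php)
#   if queue_needs_run "$pending"; then
#       php maintenance/runJobs.php
#   fi
```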

I know this isn't the place to report bugs or feature requests for CirrusSearch itself, but this issue is a top result in a web search for "Cirrus search ElasticaWrite job failed."

What's the consensus among the folks here: locked indexes feel like something CirrusSearch could and should detect and report, no?
