sul-dlss-deprecated / dor_indexing_app

An indexing API for Stanford's Digital Object Repository
https://sul-dlss-deprecated.github.io/dor_indexing_app/
Apache License 2.0

rolling_index: improve performance #1081

Closed ndushay closed 10 months ago

ndushay commented 10 months ago

Why was this change made? 🤔

See https://docs.google.com/document/d/1B61r-E9v2WhYQ_ABP-RTX61xfbEQ25H8Sm0Wwu5zENA

This improves the performance of the rolling reindexer. It also adds a few more breadcrumbs to its logs and makes it easier to tweak going forward.

It is strongly recommended that the changes in PR sul-dlss/sul-solr-configs/pull/287 be applied to the relevant Solr indexes before deploying this change. (They have been applied to qa, stage, and prod as of this writing.)

How was this change tested? 🤨

In qa, first with a batch size of 50 and then with a batch size of 500. In stage, with a batch size of 500. In prod, with a batch size of 500.

In all cases, the new settings resulted in NOT sending duplicate indexing requests to Solr. Essentially, soft commits are the bomb!

(Verified by comparing the druids and timestamps in the rolling_index log with the results of this Solr query: http://sul-solr-prod-h.stanford.edu/solr/argo_qa/select?q=*:*&facet.range=timestamp&f.timestamp.facet.range.start=NOW%2FDAY-90DAYS&f.timestamp.facet.range.end=NOW&f.timestamp.facet.range.gap=%2B1DAY&rows=501&fl=id,timestamp&facet.field=timestamp&wt=xml&sort=timestamp%20asc)
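
For anyone who wants to repeat that check with a different window or row count, here is a minimal sketch of how the verification query could be assembled. It is not part of this PR or of dor_indexing_app: the function name `build_timestamp_facet_url` and its `days_back` and `rows` parameters are hypothetical, and the Solr parameters are copied from the URL above rather than taken from any app configuration.

```python
from urllib.parse import quote, urlencode


def build_timestamp_facet_url(select_url, days_back=90, rows=501):
    """Build a Solr select URL that returns the oldest-indexed documents
    first and facets them by day on the `timestamp` field.

    Hypothetical helper: the parameters mirror the query in this PR.
    """
    params = {
        "q": "*:*",
        "facet.range": "timestamp",
        # Range facet over the last `days_back` days, one bucket per day.
        "f.timestamp.facet.range.start": f"NOW/DAY-{days_back}DAYS",
        "f.timestamp.facet.range.end": "NOW",
        "f.timestamp.facet.range.gap": "+1DAY",
        "rows": str(rows),
        "fl": "id,timestamp",
        "facet.field": "timestamp",
        "wt": "xml",
        # Oldest timestamps first, i.e. the next candidates for reindexing.
        "sort": "timestamp asc",
    }
    # quote (not quote_plus) encodes the space as %20; "*", ":" and ","
    # are left unescaped so the result matches the hand-written URL.
    query = urlencode(params, safe="*:,", quote_via=quote)
    return f"{select_url}?{query}"


if __name__ == "__main__":
    print(build_timestamp_facet_url(
        "http://sul-solr-prod-h.stanford.edu/solr/argo_qa/select"
    ))
```

Sorting by `timestamp asc` with `rows` set just above the batch size makes it easy to see whether the druids at the top of the result set are the same ones the rolling_index log reports for its next batch.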

mjgiarlo commented 10 months ago

@ndushay 💬

It seems I should make an issue in this app for this, as neither of you seems to have read the google doc.

I was in the process of reading the doc, so no need, IMO.