As part of #2870, we discussed running Elasticsearch reindexing on a cron schedule to avoid any end-user impact. This approach is imperfect: a reindex is only needed when migrations are added to a particular set of models (search_indexes.models), and even then only for the affected model. Reindexing on a fixed cron schedule, whether or not it is needed, is inefficient and exposes us to a failed or partially successful reindex when we don't need to take that risk.
To address this, create another cron or post-deploy task that
checks whether a reindex is needed by checking if migrations to any search_indexes.models have recently been applied
and either:
publishes a flag indicating whether a reindex is required. Update #2881 to use this flag to decide whether to perform a reindex. A "better" implementation would also determine which individual models have new migrations and reindex only those tables.
or notifies a system administrator that a manual reindex is required
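A minimal sketch of the "is a reindex needed?" check described above, assuming the task can read the applied-migration history (Django records an app label, migration name, and applied timestamp for each migration). The `AppliedMigration` class, the function names, and the notion of a "since" cutoff are hypothetical and only illustrate the idea. Note that this works at app granularity; narrowing it to individual models would require inspecting each migration's operations, which is not shown here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Set


@dataclass(frozen=True)
class AppliedMigration:
    """One applied migration (hypothetical stand-in for a row of migration history)."""
    app: str           # app label the migration belongs to
    name: str          # migration name, e.g. "0007_add_field"
    applied: datetime  # when the migration was applied


def apps_needing_reindex(
    migrations: Iterable[AppliedMigration],
    watched_apps: Set[str],
    since: datetime,
) -> Set[str]:
    """Return the watched app labels with a migration applied at or after `since`."""
    return {
        m.app
        for m in migrations
        if m.app in watched_apps and m.applied >= since
    }


def needs_reindex(
    migrations: Iterable[AppliedMigration],
    watched_apps: Set[str],
    since: datetime,
) -> bool:
    """The "needs reindexing" flag: True if any watched app was migrated recently."""
    return bool(apps_needing_reindex(migrations, watched_apps, since))
```

The cutoff could be the time of the last successful reindex or the previous deploy; which one fits depends on whether this runs as a cron or a post-deploy task, which is still an open question.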
Acceptance Criteria: Create a list of functional outcomes that must be achieved to complete this issue
[ ] Create the cron or post-deploy task to update a "needs reindexing" flag
[ ] Create a notification OR update #2881 to use the flag created by the previous task
[ ] Testing Checklist has been run and all tests pass
[ ] README is updated, if necessary
Tasks: Create a list of granular, specific work items that must be completed to deliver the desired outcomes of this issue
[ ] Discuss cron or post-deploy task for flag-setting
[ ] Discuss manual vs automated reindexing (notification vs cron)
[ ] Create the Celery task for flag-setting
[ ] Create the notification or update existing cron PR for reindexing
[ ] Run Testing Checklist and confirm all tests pass
Notes: Add additional useful information, such as related issues and functionality that isn't covered by this specific issue, and other considerations that will be helpful for anyone reading this
PR #2881 implements a cron job for Elasticsearch reindexing
including usage of the --parallel and --use-alias flags
routes all log output to /dev/null - the library logs contain nothing useful, and silencing all log output is a large performance gain
includes a 10s wait between each model being reindexed - to allow the Elasticsearch instance to "recover" between sets
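The behaviour noted above (per-model reindex, silenced output, a pause between models) could be sketched roughly as follows. This is not the code from #2881. The `BASE_COMMAND` below assumes the reindex runs through a `search_index` management command of the kind provided by django-elasticsearch-dsl, and the `--rebuild`, `-f`, and `--models` arguments are assumptions, since the issue itself names only `--parallel` and `--use-alias`. The function names and the injectable `run` and `sleep` parameters are hypothetical.

```python
import subprocess
import time
from typing import Callable, Iterable, List, Sequence

# Assumption: a management command that accepts these flags. Only --parallel
# and --use-alias are named in the issue; the remaining arguments are illustrative.
BASE_COMMAND = (
    "python", "manage.py", "search_index",
    "--rebuild", "-f", "--parallel", "--use-alias",
)


def run_silenced(command: Sequence[str]) -> int:
    """Run a command with stdout and stderr discarded; return its exit code."""
    completed = subprocess.run(
        command,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return completed.returncode


def reindex_models(
    models: Iterable[str],
    pause_seconds: float = 10.0,
    run: Callable[[Sequence[str]], int] = run_silenced,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Reindex each model in turn, pausing between models; return those that failed."""
    pending = list(models)
    failed: List[str] = []
    for position, model in enumerate(pending):
        command = [*BASE_COMMAND, "--models", model]
        if run(command) != 0:
            failed.append(model)
        # Pause only between models, not after the last one.
        if position < len(pending) - 1:
            sleep(pause_seconds)
    return failed
```

Returning the list of failed models rather than raising on the first error lets the caller decide whether a partial success should leave the "needs reindexing" flag set or trigger a notification.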
Open Questions: Please include any questions or decisions that must be made before beginning work or to confidently call this issue complete
Cron or post-deploy task?
Manual vs automated reindexing (notification or cron)?