ncbo / ncbo_cron

Jobs that run on a regular basis in the NCBO infrastructure

Create a script that monitors for discrepancies between master data and search index #34

Closed mdorf closed 4 years ago

mdorf commented 4 years ago

When ncbo/bioportal-project#165 was implemented, we discovered that the search index is not in sync with the master data (coming from a triple store) for multiple ontologies. We need a script that runs periodically (nightly?) and verifies that the search index contains ALL ontology data.
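A minimal sketch of the comparison such a script could perform, assuming per-ontology class counts have already been fetched from the triple store and from the search index. The function name, the dictionary inputs, and the sample acronyms and numbers are all hypothetical; this is not the ncbo_cron implementation.

```python
from typing import Dict, List


def find_out_of_sync(master_counts: Dict[str, int],
                     index_counts: Dict[str, int]) -> List[str]:
    """Return acronyms whose index holds fewer classes than the master data."""
    out_of_sync = []
    for acronym, expected in sorted(master_counts.items()):
        # An ontology that is absent from the index counts as zero classes.
        indexed = index_counts.get(acronym, 0)
        if indexed < expected:
            out_of_sync.append(acronym)
    return out_of_sync


# Illustrative numbers only, not real counts.
master = {"ONT_A": 120, "ONT_B": 45, "ONT_C": 300}
index = {"ONT_A": 120, "ONT_B": 40}
print(find_out_of_sync(master, index))  # ['ONT_B', 'ONT_C']
```

Comparing counts only catches missing classes; it would not notice stale or altered documents, so a real check might need to compare class IDs as well.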

graybeal commented 4 years ago

Looks like these are the baddies today. I've gathered the following info from their summary pages, since the Admin page isn't updating. Looks like BMO, GENO, ADMO, CHIRO, and ORDO_PL might be worth a look at the logs before re-indexing, to see if it's obvious what went wrong.

For the rest of them (all 'ERROR INDEXED'), re-indexing might just repeat the original problem, but at least we'll know the scope of work that's left.

~/Downloads/index-synchronizer.log:2324: I, [2020-08-30T21:56:18.242463 #68167]  INFO -- : Ontology xxx is missing classes from the index. Queued for re-indexing.

BMO: 0.5 (Uploaded, Error Annotator)    11/03/2014
PDRO: unknown (Parsed, Metrics, Error Annotator, Error Indexed, Error Indexed Properties)   02/25/2020
PHAGE: 5.0 (Parsed, Metrics, Annotator, Error Indexed)  05/02/2016
GENO: unknown (Parsed, Annotator, Error Obsolete)   03/08/2020
FOODON: 0.4.5 (Parsed, Metrics, Error Indexed)  06/21/2020
ADMO: beta (Uploaded, Error Rdf Labels) 10/17/2018
CHIRO: unknown (Parsed, Indexed, Metrics, Annotator)    11/23/2015
OCHV: 1 (Parsed, Metrics, Annotator, Error Indexed) 01/21/2016
ORDO_PL: 3.0 (Uploaded, Error Rdf Labels)   07/06/2020
ABD: unknown (Parsed, Metrics, Annotator, Error Indexed, Error Diff)    09/13/2016
UPHENO: unknown (Parsed, Metrics, Annotator, Error Indexed) 01/24/2020
EO: 1.0 (Parsed, Metrics, Annotator, Error Indexed) 10/12/2015
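The split described above (check the logs first vs. just re-index) can be reproduced from the summary lines themselves. This is a rough sketch: the regular expression and the `triage` helper are hypothetical, and the only rule applied is whether the status list contains exactly `Error Indexed`.

```python
import re
from typing import List, Tuple

# Summary lines copied from the list above.
SUMMARY_LINES = """\
BMO: 0.5 (Uploaded, Error Annotator)    11/03/2014
PDRO: unknown (Parsed, Metrics, Error Annotator, Error Indexed, Error Indexed Properties)   02/25/2020
PHAGE: 5.0 (Parsed, Metrics, Annotator, Error Indexed)  05/02/2016
GENO: unknown (Parsed, Annotator, Error Obsolete)   03/08/2020
FOODON: 0.4.5 (Parsed, Metrics, Error Indexed)  06/21/2020
ADMO: beta (Uploaded, Error Rdf Labels) 10/17/2018
CHIRO: unknown (Parsed, Indexed, Metrics, Annotator)    11/23/2015
OCHV: 1 (Parsed, Metrics, Annotator, Error Indexed) 01/21/2016
ORDO_PL: 3.0 (Uploaded, Error Rdf Labels)   07/06/2020
ABD: unknown (Parsed, Metrics, Annotator, Error Indexed, Error Diff)    09/13/2016
UPHENO: unknown (Parsed, Metrics, Annotator, Error Indexed) 01/24/2020
EO: 1.0 (Parsed, Metrics, Annotator, Error Indexed) 10/12/2015
"""

# ACRONYM: version (status, status, ...) MM/DD/YYYY
LINE_RE = re.compile(
    r"^(?P<acronym>\w+):\s+(?P<version>\S+)\s+\((?P<statuses>[^)]*)\)\s+(?P<date>\S+)$"
)


def triage(text: str) -> Tuple[List[str], List[str]]:
    """Split acronyms into (check logs first, 'Error Indexed' re-index candidates)."""
    check_logs, reindex = [], []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank or unrecognized lines
        statuses = [s.strip() for s in match.group("statuses").split(",")]
        if "Error Indexed" in statuses:
            reindex.append(match.group("acronym"))
        else:
            check_logs.append(match.group("acronym"))
    return check_logs, reindex


check_logs, reindex = triage(SUMMARY_LINES)
print("Check logs first:", check_logs)
print("Error Indexed:", reindex)
```

Run on the lines above, the first group comes out as BMO, GENO, ADMO, CHIRO, and ORDO_PL, matching the ontologies singled out for a look at the logs.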

mdorf commented 4 years ago

Investigation report attached. indexing_error_report.txt