oaregithub / oare_sql

1 stars 0 forks source link

Generate list of SQL tests for data integrity to periodically run #14

Open edstratford opened 1 year ago

edstratford commented 1 year ago

Let's start with old issues, and gather all the queries that we can to run. List first, then implementation of the series of queries with reports.

For example: SELECT * FROM person WHERE type = 'person' AND (relation_name_uuid NOT IN (SELECT uuid FROM dictionary_word) OR name_uuid NOT IN (SELECT uuid FROM dictionary_word));

Checks for: discourseUnit or epigraphicUnit that has parent_uuid

edstratford commented 1 year ago

Gertrudius:

identifies texts in text_discourse whose row count does not match their highest obj_in_text

SELECT td.text_uuid, count1, count2 FROM text_discourse AS td LEFT JOIN (SELECT td1.text_uuid, COUNT(td1.text_uuid) AS count1, td2.count2 FROM text_discourse AS td1 INNER JOIN (SELECT MAX(obj_in_text) AS count2, text_uuid FROM text_discourse GROUP BY text_uuid) AS td2 ON td1.text_uuid = td2.text_uuid GROUP BY td1.text_uuid) AS td0 ON td.text_uuid = td0.text_uuid WHERE count1 != count2 GROUP BY td.text_uuid;

identifies texts in text_epigraphy whose row count does not match their highest object_on_tablet

SELECT td.text_uuid, count1, count2 FROM text_epigraphy AS td LEFT JOIN (SELECT td1.text_uuid, COUNT(td1.text_uuid) AS count1, td2.count2 FROM text_epigraphy AS td1 INNER JOIN (SELECT MAX(object_on_tablet) AS count2, text_uuid FROM text_epigraphy GROUP BY text_uuid) AS td2 ON td1.text_uuid = td2.text_uuid GROUP BY td1.text_uuid) AS td0 ON td.text_uuid = td0.text_uuid WHERE count1 != count2 GROUP BY td.text_uuid

edstratford commented 1 year ago

Gertrudius Early May 2023:

EDIT: I've created a new procedure called find_iteration_errors that runs a series of sub-procedures to find discrepancies in the ordering of iterative values of text_epigraphy and text_discourse. It does so by using cursors to pull in the relevant values from those tables and compare current and previous iterative values. This means that it will spot any situation in which the preceding iterative value is not equal to the current iterative value+1, meaning that it should catch any duplicate values, as well as any gaps in numbering. You can use it by running:

CALL find_iteration_errors();

It makes use of the below sub-procedures, but they shouldn't be run individually. find_obj_in_text_iteration_issues; find_word_on_tablet_iteration_issues; find_child_num_iteration_issues; find_object_on_tablet_iteration_issues; find_char_on_tablet_iteration_issues find_char_on_line_iteration_issues;

EDIT2: THE ABOVE PROCEDURE NOW SEEMS TO BE WORKING There seems to be something in our parameter group for the server that causes CALL find_iteration_errors(); to hang when it includes all of the sub-procedures. I've confirmed that all the sub-procedures are working as they should, and the above procedure runs fine locally so it's likely some kind of timeout limit that affects procedures. For instance, you can run the exact logic from inside the procedure without issue, so as a stopgap, until I get this figured out, you can simply run the logic of the procedure by itself to avoid the timeout:

CREATE TEMPORARY TABLE IF NOT EXISTS temp_result_table ( uuid CHAR(36), text_uuid CHAR(36), parent_uuid CHAR(36), type VARCHAR(250), objectOnTabletORobjInText INT, charOnTabletORwordOnTablet VARCHAR(250), charOnLineORchildNum INT, previous VARCHAR(250), affected_column VARCHAR(250) ); CALL find_obj_in_text_iteration_issues(); CALL find_word_on_tablet_iteration_issues(); CALL find_child_num_iteration_issues(); CALL find_object_on_tablet_iteration_issues(); CALL find_char_on_tablet_iteration_issues(); CALL find_char_on_line_iteration_issues(); SELECT * FROM temp_result_table; DROP TABLE temp_result_table;

edstratford commented 1 year ago

Gertrudius:

These two procedures look for discrepancies between uuid.uuid and the various table.uuids:

CALL find_uuids_not_in_uuid_table(); CALL find_uuids_in_uuid_and_not_in_other_tables();

edstratford commented 1 year ago

Gertrudius: DROP PROCEDURE find_uuids_not_in_uuid_table; DELIMITER // CREATE PROCEDURE find_uuids_not_in_uuid_table() BEGIN

EXCLUDES the following tables which lack a uuid column or whose uuids should not appear in uuid.uuid

cache_status

knex_migrations

knex_migrations_lock

logging (will have deleted uuid's)

oare_group

page_content

permissions

quarantine_text

search_failure

temp_prep_split

temp_theo

user_group

word_sort

IF ((SELECT uuid FROM alias WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "alias") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "alias" FROM alias WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "alias"); END IF; IF ((SELECT uuid FROM archive WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "archive") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "archive" FROM archive WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "archive"); END IF; IF ((SELECT uuid FROM asset WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "asset") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "asset" FROM asset WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "asset"); END IF; IF ((SELECT uuid FROM bibliography WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "bibliography") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "bibliography" FROM bibliography WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "bibliography"); END IF; IF ((SELECT uuid FROM collection WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "collection") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "collection" FROM collection WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "collection"); END IF; IF ((SELECT uuid FROM comments WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "comments") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "comments" FROM comments WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "comments"); END IF; IF ((SELECT uuid FROM concept WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "concept") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "concept" FROM concept WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "concept"); END IF; IF ((SELECT uuid FROM dictionary_form WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "dictionary_form") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "dictionary_form" FROM dictionary_form WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "dictionary_form"); END IF; IF ((SELECT uuid FROM dictionary_spelling WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "dictionary_spelling") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "dictionary_spelling" FROM dictionary_spelling WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "dictionary_spelling"); END IF; IF ((SELECT uuid FROM dictionary_spelling_epigraphy WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "dictionary_spelling_epigraphy") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "dictionary_spelling_epigraphy" FROM dictionary_spelling_epigraphy WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "dictionary_spelling_epigraphy"); END IF; IF ((SELECT uuid FROM dictionary_word WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "dictionary_word") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "dictionary_word" FROM dictionary_word WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "dictionary_word"); END IF; IF ((SELECT uuid FROM editor_credit WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "editor_credit") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "editor_credit" FROM editor_credit WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "editor_credit"); END IF; IF ((SELECT uuid FROM errors WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "errors") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "errors" FROM errors WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "errors"); END IF; IF ((SELECT uuid FROM event WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "event") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "event" FROM event WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "event"); END IF; IF ((SELECT uuid FROM field WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "field") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "field" FROM field WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "field"); END IF; IF ((SELECT uuid FROM group_allowlist WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "group_allowlist") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "group_allowlist" FROM group_allowlist WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "group_allowlist"); END IF; IF ((SELECT uuid FROM group_edit_permissions WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "group_edit_permissions") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "group_edit_permissions" FROM group_edit_permissions WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "group_edit_permissions"); END IF; IF ((SELECT uuid FROM hierarchy WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "hierarchy") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "hierarchy" FROM hierarchy WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "hierarchy"); END IF; IF ((SELECT uuid FROM item_properties WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "item_properties") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "item_properties" FROM item_properties WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "item_properties"); END IF; IF ((SELECT uuid FROM link WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "link") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "link" FROM link WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "link"); END IF; IF ((SELECT uuid FROM note WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "note") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "note" FROM note WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "note"); END IF; IF ((SELECT uuid FROM period WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "period") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "period" FROM period WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "period"); END IF; IF ((SELECT uuid FROM person WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "person") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "person" FROM person WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "person"); END IF; IF ((SELECT uuid FROM predefinition WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "predefinition") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "predefinition" FROM predefinition WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "predefinition"); END IF; IF ((SELECT uuid FROM public_denylist WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "public_denylist") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "public_denylist" FROM public_denylist WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "public_denylist"); END IF; IF ((SELECT uuid FROM reading_sibling WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "reading_sibling") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "reading_sibling" FROM reading_sibling WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "reading_sibling"); END IF; IF ((SELECT uuid FROM reset_password_links WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "reset_password_links") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "reset_password_links" FROM reset_password_links WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "reset_password_links"); END IF; IF ((SELECT uuid FROM resource WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "resource") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "resource" FROM resource WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "resource"); END IF; IF ((SELECT uuid FROM sign WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "sign") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "sign" FROM sign WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "sign"); END IF; IF ((SELECT uuid FROM sign_org WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "sign_org") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "sign_org" FROM sign_org WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "sign_org"); END IF; IF ((SELECT uuid FROM sign_reading WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "sign_reading") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "sign_reading" FROM sign_reading WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "sign_reading"); END IF; IF ((SELECT uuid FROM sign_sibling WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "sign_sibling") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "sign_sibling" FROM sign_sibling WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "sign_sibling"); END IF; IF ((SELECT uuid FROM spatial_unit WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "spatial_unit") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "spatial_unit" FROM spatial_unit WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "spatial_unit"); END IF; IF ((SELECT uuid FROM text WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "text") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "text" FROM text WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "text"); END IF; IF ((SELECT uuid FROM text_discourse WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "text_discourse") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "text_discourse" FROM text_discourse WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "text_discourse"); END IF; IF ((SELECT uuid FROM text_drafts WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "text_drafts") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "text_drafts" FROM text_drafts WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "text_drafts"); END IF; IF ((SELECT uuid FROM text_epigraphy WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "text_epigraphy") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "text_epigraphy" FROM text_epigraphy WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "text_epigraphy"); END IF; IF ((SELECT uuid FROM text_markup WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "text_markup") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "text_markup" FROM text_markup WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "text_markup"); END IF; IF ((SELECT uuid FROM threads WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "threads") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "threads" FROM threads WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "threads"); END IF; IF ((SELECT uuid FROM tree WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "tree") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "tree" FROM tree WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "tree"); END IF; IF ((SELECT uuid FROM unique_spellings WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "unique_spellings") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "unique_spellings" FROM unique_spellings WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "unique_spellings"); END IF; IF ((SELECT uuid FROM user WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "user") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "user" FROM user WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "user"); END IF; IF ((SELECT uuid FROM value WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "value") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "value" FROM value WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "value"); END IF; IF ((SELECT uuid FROM variable WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "variable") LIMIT 1) IS NOT NULL) THEN SELECT uuid, "variable" FROM variable WHERE uuid NOT IN (SELECT uuid FROM uuid WHERE table_reference = "variable"); END IF; END // DELIMITER ;

DROP PROCEDURE find_uuids_in_uuid_and_not_in_other_tables; DELIMITER // CREATE PROCEDURE find_uuids_in_uuid_and_not_in_other_tables() BEGIN

EXCLUDES the following tables which lack a uuid column or whose uuids should not appear in uuid.uuid

cache_status

knex_migrations

knex_migrations_lock

logging (will have deleted uuid's)

oare_group

page_content

permissions

quarantine_text

search_failure

temp_prep_split

temp_theo

user_group

word_sort

IF ((SELECT uuid FROM uuid WHERE table_reference = "alias" AND uuid NOT IN (SELECT uuid FROM alias) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "alias" AND uuid NOT IN (SELECT uuid FROM alias); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "archive" AND uuid NOT IN (SELECT uuid FROM archive) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "archive" AND uuid NOT IN (SELECT uuid FROM archive); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "asset" AND uuid NOT IN (SELECT uuid FROM asset) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "asset" AND uuid NOT IN (SELECT uuid FROM asset); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "bibliography" AND uuid NOT IN (SELECT uuid FROM bibliography) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "bibliography" AND uuid NOT IN (SELECT uuid FROM bibliography); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "collection" AND uuid NOT IN (SELECT uuid FROM collection) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "collection" AND uuid NOT IN (SELECT uuid FROM collection); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "comments" AND uuid NOT IN (SELECT uuid FROM comments) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "comments" AND uuid NOT IN (SELECT uuid FROM comments); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "concept" AND uuid NOT IN (SELECT uuid FROM concept) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "concept" AND uuid NOT IN (SELECT uuid FROM concept); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "dictionary_form" AND uuid NOT IN (SELECT uuid FROM dictionary_form) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "dictionary_form" AND uuid NOT IN (SELECT uuid FROM dictionary_form); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "dictionary_spelling" AND uuid NOT IN (SELECT uuid FROM dictionary_spelling) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "dictionary_spelling" AND uuid NOT IN (SELECT uuid FROM dictionary_spelling); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "dictionary_spelling_epigraphy" AND uuid NOT IN (SELECT uuid FROM dictionary_spelling_epigraphy) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "dictionary_spelling_epigraphy" AND uuid NOT IN (SELECT uuid FROM dictionary_spelling_epigraphy); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "dictionary_word" AND uuid NOT IN (SELECT uuid FROM dictionary_word) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "dictionary_word" AND uuid NOT IN (SELECT uuid FROM dictionary_word); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "editor_credit" AND uuid NOT IN (SELECT uuid FROM editor_credit) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "editor_credit" AND uuid NOT IN (SELECT uuid FROM editor_credit); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "errors" AND uuid NOT IN (SELECT uuid FROM errors) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "errors" AND uuid NOT IN (SELECT uuid FROM errors); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "event" AND uuid NOT IN (SELECT uuid FROM event) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "event" AND uuid NOT IN (SELECT uuid FROM event); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "field" AND uuid NOT IN (SELECT uuid FROM field) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "field" AND uuid NOT IN (SELECT uuid FROM field); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "group_allowlist" AND uuid NOT IN (SELECT uuid FROM group_allowlist) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "group_allowlist" AND uuid NOT IN (SELECT uuid FROM group_allowlist); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "group_edit_permissions" AND uuid NOT IN (SELECT uuid FROM group_edit_permissions) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "group_edit_permissions" AND uuid NOT IN (SELECT uuid FROM group_edit_permissions); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "hierarchy" AND uuid NOT IN (SELECT uuid FROM hierarchy) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "hierarchy" AND uuid NOT IN (SELECT uuid FROM hierarchy); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "item_properties" AND uuid NOT IN (SELECT uuid FROM item_properties) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "item_properties" AND uuid NOT IN (SELECT uuid FROM item_properties); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "link" AND uuid NOT IN (SELECT uuid FROM link) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "link" AND uuid NOT IN (SELECT uuid FROM link); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "note" AND uuid NOT IN (SELECT uuid FROM note) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "note" AND uuid NOT IN (SELECT uuid FROM note); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "period" AND uuid NOT IN (SELECT uuid FROM period) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "period" AND uuid NOT IN (SELECT uuid FROM period); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "person" AND uuid NOT IN (SELECT uuid FROM person) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "person" AND uuid NOT IN (SELECT uuid FROM person); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "predefinition" AND uuid NOT IN (SELECT uuid FROM predefinition) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "predefinition" AND uuid NOT IN (SELECT uuid FROM predefinition); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "public_denylist" AND uuid NOT IN (SELECT uuid FROM public_denylist) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "public_denylist" AND uuid NOT IN (SELECT uuid FROM public_denylist); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "reading_sibling" AND uuid NOT IN (SELECT uuid FROM reading_sibling) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "reading_sibling" AND uuid NOT IN (SELECT uuid FROM reading_sibling); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "reset_password_links" AND uuid NOT IN (SELECT uuid FROM reset_password_links) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "reset_password_links" AND uuid NOT IN (SELECT uuid FROM reset_password_links); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "resource" AND uuid NOT IN (SELECT uuid FROM resource) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "resource" AND uuid NOT IN (SELECT uuid FROM resource); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "sign" AND uuid NOT IN (SELECT uuid FROM sign) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "sign" AND uuid NOT IN (SELECT uuid FROM sign); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "sign_org" AND uuid NOT IN (SELECT uuid FROM sign_org) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "sign_org" AND uuid NOT IN (SELECT uuid FROM sign_org); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "sign_reading" AND uuid NOT IN (SELECT uuid FROM sign_reading) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "sign_reading" AND uuid NOT IN (SELECT uuid FROM sign_reading); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "sign_sibling" AND uuid NOT IN (SELECT uuid FROM sign_sibling) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "sign_sibling" AND uuid NOT IN (SELECT uuid FROM sign_sibling); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "spatial_unit" AND uuid NOT IN (SELECT uuid FROM spatial_unit) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "spatial_unit" AND uuid NOT IN (SELECT uuid FROM spatial_unit); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "text" AND uuid NOT IN (SELECT uuid FROM text) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "text" AND uuid NOT IN (SELECT uuid FROM text); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "text_discourse" AND uuid NOT IN (SELECT uuid FROM text_discourse) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "text_discourse" AND uuid NOT IN (SELECT uuid FROM text_discourse); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "text_drafts" AND uuid NOT IN (SELECT uuid FROM text_drafts) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "text_drafts" AND uuid NOT IN (SELECT uuid FROM text_drafts); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "text_epigraphy" AND uuid NOT IN (SELECT uuid FROM text_epigraphy) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "text_epigraphy" AND uuid NOT IN (SELECT uuid FROM text_epigraphy); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "text_markup" AND uuid NOT IN (SELECT uuid FROM text_markup) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "text_markup" AND uuid NOT IN (SELECT uuid FROM text_markup); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "threads" AND uuid NOT IN (SELECT uuid FROM threads) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "threads" AND uuid NOT IN (SELECT uuid FROM threads); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "tree" AND uuid NOT IN (SELECT uuid FROM tree) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "tree" AND uuid NOT IN (SELECT uuid FROM tree); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "unique_spellings" AND uuid NOT IN (SELECT uuid FROM unique_spellings) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "unique_spellings" AND uuid NOT IN (SELECT uuid FROM unique_spellings); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "user" AND uuid NOT IN (SELECT uuid FROM user) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "user" AND uuid NOT IN (SELECT uuid FROM user); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "value" AND uuid NOT IN (SELECT uuid FROM value) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "value" AND uuid NOT IN (SELECT uuid FROM value); END IF; IF ((SELECT uuid FROM uuid WHERE table_reference = "variable" AND uuid NOT IN (SELECT uuid FROM variable) LIMIT 1) IS NOT NULL) THEN SELECT uuid, table_reference FROM uuid WHERE table_reference = "variable" AND uuid NOT IN (SELECT uuid FROM variable); END IF; END // DELIMITER ;

Gertrudius commented 8 months ago

It should be pretty easy to write a script to run through all of our data integrity queries and then return the queries that have hits for further exploration of the issue. I'm going to start building one, and I can continue to update it as we come up with more queries.

Gertrudius commented 8 months ago

Spots text_discourse types of sentence, paragraph, and phrase which are empty:

SELECT * FROM text_discourse 
WHERE uuid NOT IN (SELECT parent_uuid FROM text_discourse WHERE parent_uuid IS NOT NULL)
AND type IN ('sentence','paragraph','phrase');

This one does the same but only for those with properties:

SELECT * FROM text_discourse 
WHERE uuid NOT IN (SELECT parent_uuid FROM text_discourse WHERE parent_uuid IS NOT NULL)
AND `type` IN ("sentence","paragraph","phrase")
AND uuid IN (SELECT reference_uuid FROM item_properties);

EDIT: Here are some which check the integrity between text_epigraphy and text_markup:

SELECT * FROM text_epigraphy 
WHERE `type` = 'region'
AND uuid NOT IN (SELECT reference_uuid FROM text_markup);
SELECT * FROM text_epigraphy 
WHERE `type` = "undeterminedSigns" 
AND uuid NOT IN (SELECT reference_uuid FROM text_markup);
SELECT * FROM text_epigraphy 
WHERE `type` = "undeterminedLines" 
AND uuid NOT IN (SELECT reference_uuid FROM text_markup);

These check for internal inconsistencies in text_markup:

SELECT * FROM text_markup WHERE `type` = "undeterminedSigns" AND num_value IS NULL;
SELECT * FROM text_markup WHERE `type` = "undeterminedLines" AND num_value IS NULL;
oaregithub commented 8 months ago

At this moment, I don't want to programmatically fix these, only produce a report of some sort. For example, first search returns 54 results, with the first ten being same text. Prefer to address these manually at first, then figure out if appropriate programmatic response is available.

Gertrudius commented 8 months ago

The script I wrote to work through all these queries turns them up just fine, and I wasn't planning to write something programmatic to fix the outstanding childless text_discourse.sentence/paragraph/phrase. I was just looking to clarify whether you plan on trying to keep these handful of discourse units or get rid of them, so I could start fixing them manually.

Gertrudius commented 8 months ago

I went ahead and fixed the remaining childless discourse units manually by retaining them and placing them in the correct parent/child relationships.

edstratford commented 8 months ago

Sounds good. Was a good list of texts that may have to be checked more thoroughly because they were non standard. Thanks for doing that. I think I still have a list of them from SQL.(Sent from my iPhone)Edward StratfordDepartment of HistoryJFSB 2123Brigham Young UniversityProvo, UT @. information contained in this email is confidential and intended solely for the addressee.  Access to this email by anyone else is unauthorized.  If you are not the intended recipient, please reply to the sender that you've received this message in error and then delete it.  Any other disclosure, copying, distribution, or any action taken by anyone other than the intended recipient is prohibited and unlawful.On Mar 18, 2024, at 16:15, Gertrudius @.> wrote: I went ahead and just fixed the remaining childless discourse units manually.

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were assigned.Message ID: @.***>

Gertrudius commented 8 months ago

looks for discourse units whose parent is not in the same text as it:

SELECT * FROM text_discourse AS td
RIGHT JOIN text_discourse AS p
ON td.parent_uuid = p.uuid AND td.text_uuid != p.text_uuid
WHERE td.uuid IS NOT NULL;

EDIT: finds text_epigraphy signs, numbers, and undetermined signs that do not have a discourse_uuid, and do not have a markup type of erasure of superfluous.

SELECT * FROM text_epigraphy AS te
WHERE discourse_uuid IS NULL
AND te.`type` IN ("number","sign","undeterminedSigns")
AND uuid NOT IN (SELECT reference_uuid FROM text_markup WHERE `type` IN ("superfluous","erasure"))
Gertrudius commented 8 months ago

looks for text_epigraphy types of region that do not have one of the needed corresponding text_markup types.

SELECT t.display_name, te.uuid AS te_uuid, tm.uuid AS tm_uuid, te.text_uuid, te.`type` AS te_type,tm.`type` AS tm_type, te.object_on_tablet FROM text_epigraphy AS te
LEFT JOIN text_markup AS tm
    ON te.uuid = tm.reference_uuid
LEFT JOIN `text` AS t
ON te.text_uuid = t.uuid
WHERE te.`type` = "region"
AND te.uuid NOT IN (SELECT reference_uuid FROM text_markup WHERE type IN ("broken","isSealImpression","ruling","uninscribed"))
GROUP BY tm.reference_uuid;