Closed edstratford closed 1 year ago
There are 11 undeterminedLines
or broken
that have an explicit_spelling and transcription value of (broken area)
. I'm assuming these should be brought into conformity with the (large break)
and (# broken lines)
paradigms.
Correct. Please change as described.
There are a number of texts that have this issue that have no content at all in text_discourse, I'll leave those alone for the moment so we can address the issue of adding discourseUnit and any other relevant items to discourse separately.
SELECT te.uuid AS te_uuid, te.text_uuid, te.type
AS te_type, tm.type
AS tm_type, tm.num_value, td.uuid, td.type AS td_type, td.obj_in_text, td.word_on_tablet, td.child_num, td.parent_uuid AS td_parent_uuid, td.explicit_spelling, td.transcription FROM text_epigraphy AS te
LEFT JOIN text_discourse AS td
ON te.discourse_uuid = td.uuid
LEFT JOIN text_markup AS tm
ON te.uuid = tm.reference_uuid AND tm.type IN ("undeterminedLines","broken")
WHERE te.text_uuid NOT IN (SELECT text_uuid FROM text_discourse)
ORDER BY te.text_uuid, te.object_on_tablet;
While working on this I ran into another side issue. We have a number of duplicate object_on_tablet values in text_epigraphy, 29,266 to be exact, ranging from 2 to 5 duplicate values. I noticed this problem when I ran into columns that had been inserted without the correct object_on_tablet value, though that only makes up 4,455 of them.
SELECT *, COUNT(te.object_on_tablet) AS count FROM text_epigraphy AS te GROUP BY te.text_uuid, te.object_on_tablet HAVING COUNT(te.object_on_tablet) > 1;
SELECT *, COUNT(te.object_on_tablet) AS count FROM text_epigraphy AS te WHERE type = "column" GROUP BY te.text_uuid, te.object_on_tablet HAVING COUNT(te.object_on_tablet) > 1;
Rerun above query - this last problem might be solved.
This issue and sub issues have been moved to oare_sql.
Text_discourse represents language (words, phrases, sentences, etc.). Text_epigraphy represents the physical markings on the tablet. In text_epigraphy, we use the region to designate things that don’t fit neatly into the category of line or inside of lines (seal impressions, rulings, large breaks of an unknown # of lines).
We also need region in text_discourse to represent the one thing that won’t fit into words, numbers, phrases, clauses, sentences, or paragraphs --- large breaks in the text, where the thread of the conversation or text gets lost.
Currently, there are about 3050 instances of breaks of 1 or more lines or breaks of an unknown number of lines (a region with text_markup.type 'broken). Most (2720) of these have a corresponding region in text_disocurse. These text_discourse regions have explicit_spelling and transcription content of ‘(large break)’ or ‘(# broken lines)’ .
We need to insert the remaining 330 or so of these with appropriate explicit_spelling and transcription content. The query below selects these at the top and the ones in good order below (for comparison).
these regions DO take a word_on_tablet increment (as long as that column is still in use).
Parent_uuid for all should be the discourseUnit -- for any break of more than 1 line, this will be the rule. In the future, 1 or 2 line breaks can be reviewed to see if they remain in a paragraph where the thread of the conversation is clearly on the same topic (or in debt notes, etc. where the structure of the text is obvious.
SELECT te.id, te.uuid, te.text_uuid, te.object_on_tablet, tm.*, td.id, td.uuid, td.type, td.obj_in_text, td.parent_uuid, td.explicit_spelling, td.transcription FROM text_epigraphy te INNER JOIN text_markup tm ON tm.reference_uuid = te.uuid AND tm.type IN ('undeterminedLines','broken') LEFT JOIN text_discourse td ON te.discourse_uuid = td.uuid ORDER by td.type, td.explicit_spelling, te.text_uuid;
Will require discourse_uuid on the text_epigraphy rows, and incrementing of the obj_in_text, word_on_tablet, child_num.
(FOR LATER: -> In cases where the region clearly straddles two known paragraphs (such as when two broken lines clearly have the transition between two predictable sections of a debt note -- perhaps in this case again, it should be the child of the discourse unit, and the two paragraph sections break off and resume on either side of it... MAKE DETERMINATION)