Open edstratford opened 1 year ago
Gertrudius late Dec 2022:
There are 11 undeterminedLines or broken regions whose explicit_spelling and transcription values are (broken area). I'm assuming these should be brought into conformity with the (large break) and (# broken lines) paradigms.
Stratford: Correct. Please change as described.
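A minimal sketch of that cleanup, run against an in-memory SQLite database with a simplified, hypothetical text_discourse table (only four columns, made-up rows). The helper names are illustrative, and the replacement label for each row is supplied by hand because the choice between (large break) and (# broken lines) depends on the tablet. The label '(2 broken lines)' assumes the # is replaced by the line count.

```python
import sqlite3

OLD_LABEL = "(broken area)"

def find_broken_area_rows(conn):
    """Return uuids of discourse rows still carrying the old label in both columns."""
    rows = conn.execute(
        "SELECT uuid FROM text_discourse "
        "WHERE explicit_spelling = ? AND transcription = ? ORDER BY uuid",
        (OLD_LABEL, OLD_LABEL),
    )
    return [row[0] for row in rows]

def relabel_rows(conn, replacements):
    """Apply a hand-reviewed {uuid: new_label} mapping in one transaction."""
    with conn:
        for row_uuid, label in replacements.items():
            conn.execute(
                "UPDATE text_discourse "
                "SET explicit_spelling = ?, transcription = ? "
                "WHERE uuid = ? AND explicit_spelling = ?",  # never touch rows already fixed
                (label, label, row_uuid, OLD_LABEL),
            )

# Hypothetical, simplified schema and sample data (not the production tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE text_discourse (
        uuid TEXT PRIMARY KEY, type TEXT,
        explicit_spelling TEXT, transcription TEXT);
    INSERT INTO text_discourse VALUES
        ('r1', 'region', '(broken area)', '(broken area)'),
        ('r2', 'region', '(large break)', '(large break)'),
        ('r3', 'region', '(broken area)', '(broken area)');
""")

before = find_broken_area_rows(conn)
# Assumed label format: '#' replaced by the number of broken lines.
relabel_rows(conn, {"r1": "(large break)", "r3": "(2 broken lines)"})
after = find_broken_area_rows(conn)
print(before, after)
```

Listing the affected rows first and passing an explicit mapping keeps the per-row decision with whoever reviews the 11 cases.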
It appears that we have a couple of cases where a region in text_discourse is used as a reference_uuid for multiple broken/undeterminedLines in text_epigraphy. This seems to occur when a broken region ends one side and another broken region begins the next side. Obviously it's not essential to have a discourse_unit for each if there is no intervening text, but is that an organizational paradigm we plan to continue to support in the future?
SELECT te.id, te.uuid, te.text_uuid, te.object_on_tablet, tm.*,
       td.id, td.uuid, td.type, td.obj_in_text, td.parent_uuid,
       td.explicit_spelling, td.transcription, COUNT(td.uuid) AS this_count
FROM text_epigraphy te
INNER JOIN text_markup tm ON tm.reference_uuid = te.uuid AND tm.type IN ('undeterminedLines','broken')
LEFT JOIN text_discourse td ON te.discourse_uuid = td.uuid
GROUP BY td.uuid
ORDER BY this_count DESC;
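To list only the shared regions, the same join can be filtered with a HAVING clause. This is a rough sketch against an in-memory SQLite database. The three tables are cut down to the columns the join needs, the rows are made up, and the function name is illustrative. The production database may use a different SQL dialect.

```python
import sqlite3

def shared_discourse_regions(conn):
    """Return (discourse uuid, count) for discourse rows that more than one
    broken/undeterminedLines epigraphy row points at."""
    return conn.execute(
        """
        SELECT td.uuid, COUNT(DISTINCT te.uuid) AS this_count
        FROM text_epigraphy te
        INNER JOIN text_markup tm
                ON tm.reference_uuid = te.uuid
               AND tm.type IN ('undeterminedLines', 'broken')
        INNER JOIN text_discourse td ON te.discourse_uuid = td.uuid
        GROUP BY td.uuid
        HAVING COUNT(DISTINCT te.uuid) > 1
        ORDER BY this_count DESC
        """
    ).fetchall()

# Hypothetical, simplified schema and sample data (not the production tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE text_discourse (uuid TEXT PRIMARY KEY, type TEXT);
    CREATE TABLE text_epigraphy (uuid TEXT PRIMARY KEY, discourse_uuid TEXT);
    CREATE TABLE text_markup (reference_uuid TEXT, type TEXT);
    INSERT INTO text_discourse VALUES ('d1', 'region'), ('d2', 'region');
    -- e1 and e2 (end of one side, start of the next) share region d1.
    INSERT INTO text_epigraphy VALUES ('e1', 'd1'), ('e2', 'd1'), ('e3', 'd2');
    INSERT INTO text_markup VALUES
        ('e1', 'broken'), ('e2', 'broken'), ('e3', 'undeterminedLines');
""")

shared = shared_discourse_regions(conn)
print(shared)
```

COUNT(DISTINCT te.uuid) is used so that an epigraphy row carrying more than one matching markup row is not counted twice.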
Text_discourse represents language (words, phrases, sentences, etc.). Text_epigraphy represents the physical markings on the tablet. In text_epigraphy, we use the region to designate things that don’t fit neatly into the category of line or inside of lines (seal impressions, rulings, large breaks of an unknown # of lines).
We also need region in text_discourse to represent the one thing that won’t fit into words, numbers, phrases, clauses, sentences, or paragraphs: large breaks in the text, where the thread of the conversation or text gets lost.
Currently, there are about 3050 instances of breaks of 1 or more lines or breaks of an unknown number of lines (a region with text_markup.type 'broken'). Most (2720) of these have a corresponding region in text_discourse. These text_discourse regions have explicit_spelling and transcription content of ‘(large break)’ or ‘(# broken lines)’.
We need to insert the remaining 330 or so of these with appropriate explicit_spelling and transcription content. The query below selects these at the top and the ones in good order below (for comparison).
These regions DO take a word_on_tablet increment (as long as that column is still in use).
Parent_uuid for all should be the discourseUnit; for any break of more than 1 line, this will be the rule. In the future, 1 or 2 line breaks can be reviewed to see if they remain in a paragraph where the thread of the conversation is clearly on the same topic (or in debt notes, etc., where the structure of the text is obvious).
SELECT te.id, te.uuid, te.text_uuid, te.object_on_tablet, tm.*,
       td.id, td.uuid, td.type, td.obj_in_text, td.parent_uuid,
       td.explicit_spelling, td.transcription
FROM text_epigraphy te
INNER JOIN text_markup tm ON tm.reference_uuid = te.uuid AND tm.type IN ('undeterminedLines','broken')
LEFT JOIN text_discourse td ON te.discourse_uuid = td.uuid
ORDER BY td.type, td.explicit_spelling, te.text_uuid;
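To pull out only the breaks that still lack a discourse region, the LEFT JOIN can be filtered on a NULL match. This is a sketch using an in-memory SQLite database with a simplified, hypothetical schema and invented rows. The function name is illustrative, and the production SQL dialect may differ.

```python
import sqlite3

def breaks_missing_discourse(conn):
    """Return (text_uuid, object_on_tablet, epigraphy uuid) for broken or
    undeterminedLines rows with no matching text_discourse row."""
    return conn.execute(
        """
        SELECT DISTINCT te.text_uuid, te.object_on_tablet, te.uuid
        FROM text_epigraphy te
        INNER JOIN text_markup tm
                ON tm.reference_uuid = te.uuid
               AND tm.type IN ('undeterminedLines', 'broken')
        LEFT JOIN text_discourse td ON te.discourse_uuid = td.uuid
        WHERE td.uuid IS NULL          -- no discourse region yet
        ORDER BY te.text_uuid, te.object_on_tablet
        """
    ).fetchall()

# Hypothetical, simplified schema and sample data (not the production tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE text_discourse (uuid TEXT PRIMARY KEY, type TEXT);
    CREATE TABLE text_epigraphy (
        uuid TEXT PRIMARY KEY, text_uuid TEXT,
        object_on_tablet INTEGER, discourse_uuid TEXT);
    CREATE TABLE text_markup (reference_uuid TEXT, type TEXT);
    INSERT INTO text_discourse VALUES ('d1', 'region');
    INSERT INTO text_epigraphy VALUES
        ('e1', 't1', 5, 'd1'),   -- break already in good order
        ('e2', 't1', 9, NULL),   -- break with no discourse region
        ('e3', 't2', 3, NULL),   -- undetermined lines, no discourse region
        ('e4', 't2', 4, NULL);   -- not a break: no matching markup row
    INSERT INTO text_markup VALUES
        ('e1', 'broken'), ('e2', 'broken'), ('e3', 'undeterminedLines');
""")

missing = breaks_missing_discourse(conn)
print(missing)
```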
This will require setting discourse_uuid on the text_epigraphy rows and incrementing obj_in_text, word_on_tablet, and child_num.
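One way this could look, as a sketch only. It assumes a simplified, hypothetical schema in an in-memory SQLite database, and that "incrementing" means shifting every later row of the same text (or later sibling, for child_num) up by one to make room for the new region. The function name and its parameters are illustrative, and the caller is assumed to already know the position where the region belongs.

```python
import sqlite3
import uuid

def insert_break_region(conn, epigraphy_uuid, text_uuid, parent_uuid,
                        obj_in_text, word_on_tablet, child_num, label):
    """Insert a discourse region for a break and link the epigraphy row to it."""
    new_uuid = str(uuid.uuid4())
    with conn:  # single transaction: all steps succeed or none do
        # Make room: shift rows at or after the insertion point.
        conn.execute(
            "UPDATE text_discourse SET obj_in_text = obj_in_text + 1 "
            "WHERE text_uuid = ? AND obj_in_text >= ?",
            (text_uuid, obj_in_text))
        conn.execute(
            "UPDATE text_discourse SET word_on_tablet = word_on_tablet + 1 "
            "WHERE text_uuid = ? AND word_on_tablet >= ?",
            (text_uuid, word_on_tablet))
        conn.execute(
            "UPDATE text_discourse SET child_num = child_num + 1 "
            "WHERE parent_uuid = ? AND child_num >= ?",
            (parent_uuid, child_num))
        # Insert the region as a child of the discourseUnit.
        conn.execute(
            "INSERT INTO text_discourse (uuid, text_uuid, type, obj_in_text, "
            "word_on_tablet, child_num, parent_uuid, explicit_spelling, "
            "transcription) VALUES (?, ?, 'region', ?, ?, ?, ?, ?, ?)",
            (new_uuid, text_uuid, obj_in_text, word_on_tablet, child_num,
             parent_uuid, label, label))
        # Point the epigraphy row at the new discourse row.
        conn.execute(
            "UPDATE text_epigraphy SET discourse_uuid = ? WHERE uuid = ?",
            (new_uuid, epigraphy_uuid))
    return new_uuid

# Hypothetical, simplified schema and sample data (not the production tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE text_discourse (
        uuid TEXT PRIMARY KEY, text_uuid TEXT, type TEXT,
        obj_in_text INTEGER, word_on_tablet INTEGER, child_num INTEGER,
        parent_uuid TEXT, explicit_spelling TEXT, transcription TEXT);
    CREATE TABLE text_epigraphy (
        uuid TEXT PRIMARY KEY, text_uuid TEXT, discourse_uuid TEXT);
    INSERT INTO text_discourse VALUES
        ('du', 't1', 'discourseUnit', 1, NULL, NULL, NULL, NULL, NULL),
        ('w1', 't1', 'word', 2, 1, 1, 'du', 'w1', 'w1'),
        ('w2', 't1', 'word', 3, 2, 2, 'du', 'w2', 'w2');
    INSERT INTO text_epigraphy VALUES ('e_break', 't1', NULL);
""")

# Place the break between the two words.
region_uuid = insert_break_region(
    conn, "e_break", "t1", "du",
    obj_in_text=3, word_on_tablet=2, child_num=2, label="(large break)")

layout = conn.execute(
    "SELECT type, obj_in_text, word_on_tablet, child_num "
    "FROM text_discourse ORDER BY obj_in_text").fetchall()
print(layout)
```

Running the shifts and the insert in one transaction avoids leaving a text with a gap or a duplicate position if any step fails.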
(FOR LATER: In cases where the region clearly straddles two known paragraphs, such as when two broken lines clearly contain the transition between two predictable sections of a debt note, perhaps it should again be the child of the discourse unit, with the two paragraph sections breaking off and resuming on either side of it. MAKE DETERMINATION.)