oaregithub / oare_mono


Review utility of char_on_line #1559

Closed: edstratford closed this issue 1 year ago

edstratford commented 2 years ago

The tablet_renderer does not use char_on_line at all. Are there any stored procedures (SPs) or triggers that need this column? If not, I propose we consider removing it. However, char_on_tablet is still useful.

edstratford commented 2 years ago

Jon will check whether there is any 'unique' data still in char_on_line. If there is only a small amount, or none at all, we will save whatever is unique to Google Sheets and then possibly drop the column in a migration.

Gertrudius commented 1 year ago

Here are the mismatches between char_on_line and object_on_tablet. All of these instances appear to result from a missing epigraphicUnit as the parent for the whole text.

uuid | text_uuid
72a303b1-de8b-ae0f-deb5-06e9c734c511 | 1fb69054-c2ef-11eb-bc1f-024de1c1cc1d
b3e0b831-d1d0-4ab0-bc46-c1b60d5f24a8 | 6eacc23c-13ad-4b4f-98d3-0609570013e3
4a9bcc06-1d55-c6bc-bd12-2af72248ec41 | bba6626f-c2f9-11eb-bc1f-024de1c1cc1d
c8f995dc-8b18-733e-970e-99a9b8e79a74 | e01a78f5-c23e-11eb-bc1f-024de1c1cc1d
4ec74333-5f69-47d7-a72e-afeb7c550d8d | e70c2b60-c2f6-11eb-bc1f-024de1c1cc1d

EDIT: I've added epigraphicUnits to each of these texts, and they are all resolved. My script is not showing any hits now.

edstratford commented 1 year ago

The following query suggests that there are A LOT of instances where char_on_line is out of sync with object_on_tablet. It returns every out-of-sync row:

    SELECT t.name, t.uuid, te.id, te.type, te.line AS line, te.object_on_tablet, te.char_on_line,
           te1.object_on_tablet AS line_objnum,
           te.object_on_tablet - te.char_on_line AS objnum_minus_charlinenum
    FROM text_epigraphy te1
    INNER JOIN text_epigraphy te
            ON te1.text_uuid = te.text_uuid
           AND te1.line = te.line
           AND te1.object_on_tablet != te.object_on_tablet - te.char_on_line
           AND te.type != 'line'
    INNER JOIN text t ON t.uuid = te.text_uuid
    WHERE te1.type = 'line';
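The invariant behind that query is: for every non-line row, object_on_tablet minus char_on_line should equal the object_on_tablet of the line row it sits on. A minimal sketch that runs the same mismatch query against a toy in-memory database, using Python's sqlite3 module. Assumptions: SQLite stands in for the production database, the table definitions are a guessed minimal subset of the real text and text_epigraphy schemas, and the 'sign' type value and sample rows are invented for illustration.

```python
import sqlite3

# Assumed minimal subset of the real schemas: only the columns the query touches.
SCHEMA = """
CREATE TABLE text (uuid TEXT PRIMARY KEY, name TEXT);
CREATE TABLE text_epigraphy (
    id INTEGER PRIMARY KEY,
    text_uuid TEXT,
    type TEXT,
    line INTEGER,
    object_on_tablet INTEGER,
    char_on_line INTEGER
);
"""

# Same join logic as the query in the thread; ORDER BY added for stable output.
MISMATCH_QUERY = """
SELECT t.name, t.uuid, te.id, te.type, te.line,
       te.object_on_tablet, te.char_on_line,
       te1.object_on_tablet AS line_objnum,
       te.object_on_tablet - te.char_on_line AS objnum_minus_charlinenum
FROM text_epigraphy te1
INNER JOIN text_epigraphy te
        ON te1.text_uuid = te.text_uuid
       AND te1.line = te.line
       AND te1.object_on_tablet != te.object_on_tablet - te.char_on_line
       AND te.type != 'line'
INNER JOIN text t ON t.uuid = te.text_uuid
WHERE te1.type = 'line'
ORDER BY te.id;
"""


def find_mismatches(rows):
    """Load toy text_epigraphy rows into SQLite and return the out-of-sync ones."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.execute("INSERT INTO text VALUES ('t1', 'Demo text')")
        conn.executemany(
            "INSERT INTO text_epigraphy VALUES (?, ?, ?, ?, ?, ?)", rows
        )
        return conn.execute(MISMATCH_QUERY).fetchall()
    finally:
        conn.close()


# Hypothetical sample: one line row (object 10) followed by three signs.
SAMPLE_ROWS = [
    (1, "t1", "line", 1, 10, None),
    (2, "t1", "sign", 1, 11, 1),  # 11 - 1 == 10: consistent
    (3, "t1", "sign", 1, 12, 2),  # 12 - 2 == 10: consistent
    (4, "t1", "sign", 1, 13, 5),  # 13 - 5 == 8: char_on_line out of sync
]

if __name__ == "__main__":
    for row in find_mismatches(SAMPLE_ROWS):
        print(row)
```

Only row 4 is reported. Rows whose char_on_line is NULL drop out automatically, because the subtraction yields NULL and the `!=` comparison is then never true.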

An initial look at the data suggests that, in most of these cases, an edit I made left char_on_line compromised rather than object_on_tablet. This further suggests that a single ordering column is better than several: it is easier to maintain one than three.

Gertrudius commented 1 year ago

I ended up replicating the query logic in a script because the query took more than 7 minutes to run for me. Here is a list of the incorrect char_on_line values where the first element in the sequence does not have a reading of KÙ: https://docs.google.com/spreadsheets/d/1uXq9jBjygQzcYKxQs_bKUEqf7qZgYeJKZxqpb8BC1AE/edit?usp=sharing
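One way a script can replicate the query without the slow self-join is a two-pass scan: first index each line row's object_on_tablet by (text_uuid, line), then check every other row against that index. This is a hypothetical sketch, not the script actually used in this thread. The EpigraphyRow fields are an assumed subset of text_epigraphy columns, it assumes at most one line row per (text_uuid, line), and the KÙ reading filter is omitted because readings are not part of that subset.

```python
from typing import Iterable, List, NamedTuple, Optional


class EpigraphyRow(NamedTuple):
    # Hypothetical subset of text_epigraphy columns.
    id: int
    text_uuid: str
    type: str
    line: Optional[int]
    object_on_tablet: int
    char_on_line: Optional[int]


def find_char_on_line_mismatches(rows: Iterable[EpigraphyRow]) -> List[EpigraphyRow]:
    row_list = list(rows)

    # Pass 1: map (text_uuid, line) -> object_on_tablet of that line's 'line' row.
    line_obj = {}
    for r in row_list:
        if r.type == "line" and r.line is not None:
            line_obj[(r.text_uuid, r.line)] = r.object_on_tablet

    # Pass 2: same condition as the SQL, line_objnum != object_on_tablet - char_on_line.
    mismatches = []
    for r in row_list:
        if r.type == "line" or r.line is None or r.char_on_line is None:
            continue  # mirrors the SQL: NULLs never satisfy the join condition
        key = (r.text_uuid, r.line)
        if key in line_obj and line_obj[key] != r.object_on_tablet - r.char_on_line:
            mismatches.append(r)
    return mismatches
```

Both passes are linear in the number of rows, so the cost grows with table size rather than with the number of row pairs a self-join on an unindexed column may end up comparing.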

I've checked a smattering of these, and thus far all have been errors in char_on_line rather than object_on_tablet, which probably means that keeping char_on_line would not preserve anything truly useful.

edstratford commented 1 year ago

Sounds good. char_on_line (and char_on_tablet) will both be eliminated from text_epigraphy in the upcoming migration.