EPIC: Support changing the Greek Original Language (was: Investigate support for SR Greek version of UGNT)

PhotoNomad0 commented 1 year ago

Story Explanation

User Story

Changes needed:

Pre-existing issues discovered:
- MEDIUM - when the user changes GLs, the Original Language (e.g. the UGNT) version is not immediately updated in the wA tool.
- LARGE - when there are attribute changes in the Original Language, the new attribute values are not migrated into the wA aligned data.
New issues to support:
- TINY - unable to handle the x-strong attribute in \w tags in the original language.
- SMALL - add support for the short form of Strong's numbers.
- MEDIUM - Handle changes to the morphology format. Options:
  1. ~map the new morph format to match the old format (mostly handled in usfm.js, but would still require updates to tc-ui-toolkit to add support for new keys).~ <= not practical
  2. add support for the new morph string format and keys (all handled in tc-ui-toolkit). This would fix both GWE and tC once we update tc-ui-toolkit.
    - auto-detect the shorter 8-column format
    - there are new localization strings to add
    - add support for periods in empty fields (as well as commas)
    - create a new mapping table for the new format.
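A minimal sketch of the auto-detection and empty-field handling in option 2. It assumes the two example strings in the notes further down this thread are representative: the old format is 13 characters ("Gr,N,,,,,GMS,") and the new SR format is 12 characters ("Gr,N,....NMS"). Detecting by length is an assumption, and `detectGreekMorphFormat`, `isEmptyMorphField`, and `morphFieldChars` are hypothetical names, not part of tc-ui-toolkit or word-aligner.

```typescript
// Hypothetical helpers sketching option 2; not the tc-ui-toolkit API.
type GreekMorphFormat = "legacy" | "sr" | "unknown";

const GREEK_PREFIX = "Gr,";
// Assumed total lengths, taken from the example strings in this thread.
const LEGACY_MORPH_LENGTH = 13; // e.g. "Gr,N,,,,,GMS,"
const SR_MORPH_LENGTH = 12; // e.g. "Gr,N,....NMS" (Type column dropped)

/** Guess which Greek morph format a string uses from its length alone. */
function detectGreekMorphFormat(morph: string): GreekMorphFormat {
  if (!morph.startsWith(GREEK_PREFIX)) return "unknown";
  if (morph.length === LEGACY_MORPH_LENGTH) return "legacy";
  if (morph.length === SR_MORPH_LENGTH) return "sr";
  return "unknown";
}

/** Both ',' (legacy) and '.' (SR) mark an empty field. */
function isEmptyMorphField(ch: string): boolean {
  return ch === "," || ch === ".";
}

/** One entry per character after "Gr,", with empty markers mapped to "". */
function morphFieldChars(morph: string): string[] {
  return Array.from(morph.slice(GREEK_PREFIX.length)).map((ch) =>
    isEmptyMorphField(ch) ? "" : ch,
  );
}
```

For example, `detectGreekMorphFormat("Gr,N,....NMS")` returns `"sr"`, and `morphFieldChars` on the same string yields nine single-character columns, with the comma and the four periods mapped to empty strings. The detected format would then choose which mapping table to walk.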

Questions

How can we make sure that this will also work with OT alignments?

Features / Specifications

[ ]
[ ]
[ ]

Definition of Done

[ ]
[ ]
[ ]

Additional Context

Mockups

PhotoNomad0 commented 1 year ago

Notes:

Testing the new UGNT

From the actions menu, do “check for content updates”.
Scroll to the bottom of the list and toggle on “Show Pre-Release Resources”.
Under “Koine Greek (el-x-koine)”, select “UGNT - Greek_New_Testament - Pre-Release”,
and under “English (en)”, select “ULT - Aligned_Bible - unfoldingWord” if it has not already been downloaded.
Click Download and wait until it finishes.
I then opened a project from the aligned ULT, “en_ult_tit_book”.
I got a warning right away that alignments had been invalidated.
Then I scrolled down to the wA tool and changed the GL to “English (en) - unfoldingWord”, and the progress bar dropped to 36%.
I opened wA and got another warning that alignments had been invalidated.
To see the new UGNT in the scripture pane, I first closed the existing UGNT pane (which was still using Door43-Catalog). Then I clicked the add button, scrolled down to the option “Koine Greek (Original Language) … (unfoldingWord)”, and clicked Load. Ignore the option “Koine Greek (Original Language) … (titus_unfoldingWord) …” (not sure what is going on there).
Now I could see the new UGNT text.

PhotoNomad0 commented 1 year ago

Notes:

Bruce McLean 2:23 PM: I did discover an inconsistency in the exported USFM. For the zaln tag we add the x- prefix to all the attributes, including strong, but for the w tag we export it as strong without the x- prefix. It seems this was by design: zaln is a custom milestone we created, so it has no standard attributes, and therefore all of its attributes have the x- prefix. This is confusing to people looking at the USFM, but it is not a problem for tCore, which expects the x- prefix in the zaln tag on import. To summarize:

- for the w tag, we use and expect the strong attribute
- for the zaln tag, we use and expect the x-strong attribute

@Birch @Larry Sallee @Benjamin Wright
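A minimal sketch of that convention, plus tolerant reading for the TINY item above (an original language whose \w tags carry x-strong). The helper names are hypothetical, attributes are modeled as a plain string map rather than usfm.js objects, and also accepting the x-strongs spelling (used by Alan Bunning below) is an assumption.

```typescript
// Hypothetical helpers; attribute maps are modeled as plain records.
type UsfmAttributes = { [name: string]: string | undefined };
type AlignmentTag = "w" | "zaln";

/** Attribute name written on export for each tag. */
function strongAttributeName(tag: AlignmentTag): string {
  // \w uses the standard "strong" attribute; the custom \zaln milestone
  // prefixes every attribute with "x-".
  return tag === "w" ? "strong" : "x-strong";
}

/**
 * Read a Strong's value, preferring the expected name for the tag but also
 * accepting the other spellings seen in this thread, so that \w tags with
 * x-strong="..." still load.
 */
function readStrong(tag: AlignmentTag, attrs: UsfmAttributes): string | undefined {
  const candidates = [strongAttributeName(tag), "strong", "x-strong", "x-strongs"];
  for (const name of candidates) {
    const value = attrs[name];
    if (value !== undefined && value !== "") return value;
  }
  return undefined;
}
```

Keeping the write side strict and the read side tolerant means exported USFM stays unchanged while an original language using either attribute name can be imported.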

2:34 PM: There is a bug in tCore that persists old Original Language word attributes. The Original Language words (and their attributes) are read in for the book when you first open a project in wA. Unfortunately, those are not updated in the project alignment data until the verse gets invalidated. That is when the Strong's numbers get cleared in that Titus project. @Elsy L @Birch

Looks like we need to add another data migration step.
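A minimal sketch of such a migration step, assuming alignment data where each alignment's `topWords` carry the original-language `word`, `occurrence`, and attributes such as `strong`, `lemma`, and `morph`. These shapes and the name `migrateOriginalLanguageAttributes` are assumptions for illustration, not tCore's actual migration API. Each top word is matched against the current original-language verse by word text and occurrence, and its attributes are refreshed; unmatched words are left alone so a real text change can still invalidate the verse.

```typescript
// Assumed shapes; tCore's real alignment data may differ.
interface OriginalWord {
  word: string;
  occurrence: number;
  strong?: string;
  lemma?: string;
  morph?: string;
}

interface Alignment {
  topWords: OriginalWord[];
  bottomWords: unknown[];
}

/** Copy current attributes from the original-language verse into aligned top words. */
function migrateOriginalLanguageAttributes(
  alignments: Alignment[],
  currentVerseWords: OriginalWord[],
): number {
  // Index the current original-language words by "word/occurrence".
  const byKey = new Map<string, OriginalWord>();
  for (const w of currentVerseWords) {
    byKey.set(`${w.word}/${w.occurrence}`, w);
  }

  let updated = 0;
  for (const alignment of alignments) {
    alignment.topWords = alignment.topWords.map((top) => {
      const current = byKey.get(`${top.word}/${top.occurrence}`);
      if (!current) return top; // no match: leave it for the invalidation logic
      const changed =
        top.strong !== current.strong ||
        top.lemma !== current.lemma ||
        top.morph !== current.morph;
      if (!changed) return top;
      updated++;
      return { ...top, strong: current.strong, lemma: current.lemma, morph: current.morph };
    });
  }
  return updated; // number of top words whose attributes were refreshed
}
```

Running it again on already-migrated data returns 0, so a step like this could be run each time a project is opened without side effects.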

@Larry Sallee @Benjamin Wright Converting the morphology is not going to be easy. I found the descriptions for the SR 8-column format here: https://greekcntr.org/resources/NTGRG.pdf. The morphology for the BHP 9-column format was at https://greekcntr.org/downloads/project.pdf, but it has since been taken down. I do have a copy of that table, but there are significant differences that will require rework of our morph parser for the scripture pane. It doesn't look like we should, or would want to, try to map the new morph strings to the old format or vice versa.

Alan Bunning 9:10 AM: @Birch Yes, that is why I pointed out that somewhere there was a discussion concluding that these are then not technically Strong's numbers and thus should not use the strong tag. It apparently breaks someone's software that expects a 4-digit Strong's number, but I don't know any details about that. That is why I used the x-strongs tag, which is technically correct and the safer way to go (and makes Robert Hunt happy). The version I am making for uW is custom, so I can easily change the tag for you, but I wonder whether you would also rather use the x-strongs tag for the same reason.

9:21 AM: @Larry Sallee @Bruce McLean The morphology codes I have provided are essentially the same as before, except that I dropped the Type column and moved that information elsewhere in my database, because it is NOT morphology but syntactic and semantic information. If you wanted that column information, it would probably be best represented with a separate tag like x-category or something like that. So the question is: what do you really want? I could try to recreate the old scheme with the Type column, leave the format the same with the Type column blank, or put the Type information in a separate tag. Or I suppose you could also just drop the Type column from your software, since it really is not morphological information. What is your pleasure?

PhotoNomad0 commented 1 year ago

Summary:

After comparing the old and new morph formats, it looks like the best approach to handle the new format in tC and GWE would be to auto-detect the shorter 8-column morph format when displaying the Morphology data. This would require adding 7 more localization strings, modifying the mapping table, and adding support for '.' in empty fields, and it can be made transparent to the user. The only loss is the Type column, which @Larry Sallee suggests is not a big requirement for the user. Looks like a Medium effort.

PhotoNomad0 commented 1 year ago

Notes:

Conversion of Greek morph strings to human-readable text is handled in MorphUtils.getMorphLocalizationKeysGreek() in word-aligner, which uses a tree (morphCodeLocalizationMapGrk) to interpret the fields.
- current morph strings are in format: "morph": "Gr,N,,,,,GMS," (13 characters)
- new morph strings are like: "morph": "Gr,N,....NMS" (12 characters dropping the type)
- in morphCodeLocalizationMapGrk, it looks like we need to:
  - remove the type nested in role (2)
  - decrement all the fields from mood onward by 1 (e.g. mood goes from 4 to 3)
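Following those two bullet points literally, the new table could be derived from the existing one rather than maintained by hand. The sketch below is a trimmed, hypothetical model of `morphCodeLocalizationMapGrk`: role entries (column 2) optionally hold a nested type table, and the remaining columns are keyed by number starting at mood (4). The structure and the name `deriveSrMorphMap` are assumptions, not word-aligner's actual code.

```typescript
// Hypothetical, trimmed-down model of the Greek morph localization tree.
type CodeTable = Record<string, string>; // morph code -> localization key

interface RoleEntry {
  key: string; // localization key for the role
  type?: CodeTable; // Type column codes, nested under role (legacy only)
}

interface GreekMorphMap {
  roles: Record<string, RoleEntry>; // role column (2)
  columns: Record<number, CodeTable>; // top-level columns, mood (4) onward
}

// Column numbers from the notes: role is 2, type is 3, mood is 4.
const MOOD_COLUMN = 4;

/** Build the SR table: drop the nested type, shift mood and later columns left by 1. */
function deriveSrMorphMap(legacy: GreekMorphMap): GreekMorphMap {
  const roles: Record<string, RoleEntry> = {};
  for (const [code, entry] of Object.entries(legacy.roles)) {
    roles[code] = { key: entry.key }; // copy without the nested type table
  }

  const columns: Record<number, CodeTable> = {};
  for (const key of Object.keys(legacy.columns)) {
    const column = Number(key);
    const target = column >= MOOD_COLUMN ? column - 1 : column;
    columns[target] = legacy.columns[column];
  }
  return { roles, columns };
}
```

Deriving the SR table from the legacy one keeps both formats in sync when localization keys change; the legacy table itself is left untouched so existing projects keep displaying correctly.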

unfoldingWord / translationCore