unfoldingWord / translationCore

Repository for the desktop application translationCore
https://www.translationcore.com

EPIC: Support changing Greek Original Lang - was: Investigate support for SR Greek version of UGNT #7458

Open PhotoNomad0 opened 1 year ago

PhotoNomad0 commented 1 year ago

Story Explanation

User Story

Changes needed:

Questions

Features / Specifications

Definition of Done

Additional Context

Mockups

PhotoNomad0 commented 1 year ago

Notes:

Testing new UGNT

PhotoNomad0 commented 1 year ago

Notes:

Bruce McLean 2:23 PM I did discover an inconsistency in the exported USFM. For the zaln tag we add the x- prefix to all the attributes, including strong, but for the w tag we export it as strong without the x- prefix. It seems that was by design: the zaln was a custom milestone we created and therefore had no standard attributes, so all of its attributes have the x- prefix. This is confusing to people looking at USFM, but not a problem for tCore, which expects the x- prefix in the zaln tag on import. To summarize:

• for the w tag we use and expect the strong attribute
• for the zaln tag we use and expect the x-strong attribute

@Birch @Larry Sallee @Benjamin Wright
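The naming rule summarized above can be sketched in a few lines. This is only an illustration of the convention described in the chat, not tCore's export code: the helper names `attribute_name` and `format_attributes` are hypothetical, and the Strong's value is a sample.

```python
def attribute_name(marker: str, name: str) -> str:
    """Return the attribute name to write for a given USFM marker.

    Assumption taken from the note above: the custom zaln milestone
    prefixes every attribute with "x-", while the w tag keeps the
    plain name (e.g. strong).
    """
    if marker == "zaln":
        return name if name.startswith("x-") else f"x-{name}"
    return name


def format_attributes(marker: str, attrs: dict) -> str:
    """Render attributes as space-separated name="value" pairs."""
    return " ".join(
        f'{attribute_name(marker, key)}="{value}"' for key, value in attrs.items()
    )


# Sample value only; not taken from any real project file.
print(format_attributes("w", {"strong": "G2316"}))
print(format_attributes("zaln", {"strong": "G2316"}))
```

An importer following the same rule would look up `strong` on w tags and `x-strong` on zaln milestones.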

2:34 PM There is a bug in tCore that persists old Original Language word attributes. The Original Language words (and their attributes) are read in for the book when you first open a project in wA. Unfortunately, those do not get updated in the project alignment data until the verse gets invalidated. That is when the Strong's numbers are cleared in that Titus project. @Elsy L @Birch

Looks like we need to add another data migration step.

@Larry Sallee @Benjamin Wright It is not going to be an easy conversion of the Morphology. I found the descriptions for the SR 8 column format here: https://greekcntr.org/resources/NTGRG.pdf . The morphology for the BHP 9 column format was at https://greekcntr.org/downloads/project.pdf , but has now been taken down. I do have a copy of that table, but there are significant differences that will require rework to our morph parser for the scripture pane. It doesn't look like we should or would want to try and map the morph string to the old format or vice versa.

Alan Bunning 9:10 AM @Birch Yes, that is why I was pointing out that some discussion took place somewhere where it was pointed out that they are not technically Strong's numbers then and thus should not use the strong tag. It apparently breaks someone's software that expects a 4-digit Strong's number. But I don't know any details about that. But that is why I used the x-strongs tag, which would be technically correct and the safer way to go (and make Robert Hunt happy). The version I am making for uW is custom, so I can easily change the tag for you, but I am wondering if you wouldn't also rather want to use the x-strongs tag for the same reason.

9:21 AM @Larry Sallee @Bruce McLean The morphology codes are essentially the same that I have provided, except that I dropped the Type column and moved that information elsewhere in my database because it is NOT morphology but syntactical and semantical information. Because it is not morphology I moved it to a different field. If you wanted that column information, it would probably be best represented with a separate tag like x-category or something like that. So the question is what do you really want? I could try to recreate that old scheme with the old Type column, or leave the format the same with the Type column blank, or put the Type information in a separate tag. Or I suppose you could also just drop the Type column from your software since it really is not morphological information. What is your pleasure?

PhotoNomad0 commented 1 year ago

Summary:

After comparing the old and new morph formats, it looks like the best approach to handle the new format in tC and GWE would be to auto-detect the shorter 8-column morph format when displaying the Morphology data. This would require adding 7 more localization strings, modifying the mapping table, and adding support for '.' as empty fields, and it can be made transparent to the user. The only loss is the type column, which @Larry Sallee suggests is not a big requirement for the user. Looks like a Medium effort.
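A minimal sketch of the auto-detection idea, assuming the two formats can be told apart by overall string length (13 characters for the current format, 12 for the new one, as in the sample strings "Gr,N,,,,,GMS," and "Gr,N,....NMS"). The function names are hypothetical and this is not the word-aligner implementation.

```python
OLD_LENGTH = 13  # current format, e.g. "Gr,N,,,,,GMS,"
NEW_LENGTH = 12  # new format without the type column, e.g. "Gr,N,....NMS"


def detect_morph_format(morph: str) -> str:
    """Classify a Greek morph string as "old", "new", or "unknown".

    Assumption: length alone distinguishes the two formats; real data
    may need stricter checks.
    """
    if not morph.startswith("Gr,"):
        return "unknown"
    if len(morph) == OLD_LENGTH:
        return "old"
    if len(morph) == NEW_LENGTH:
        return "new"
    return "unknown"


def is_empty_field(char: str) -> bool:
    """Treat both ',' (current format) and '.' (new format) as empty."""
    return char in (",", ".")


print(detect_morph_format("Gr,N,,,,,GMS,"))
print(detect_morph_format("Gr,N,....NMS"))
```

Detecting the format at display time keeps existing projects readable without rewriting their stored morph strings.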

PhotoNomad0 commented 1 year ago

Notes:

  1. conversion of Greek morph strings to human-readable text is handled in MorphUtils.getMorphLocalizationKeysGreek() in word-aligner, which uses a tree (morphCodeLocalizationMapGrk) to interpret the fields
    • current morph strings are in format: "morph": "Gr,N,,,,,GMS," (13 characters)
    • new morph strings are like: "morph": "Gr,N,....NMS" (12 characters dropping the type)
    • in morphCodeLocalizationMapGrk it looks like we need to:
        • remove the type nested in role (2)
        • decrement all the fields from mood on by 1 (e.g. mood goes from 4 to 3)
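The index shift described in the note can be sketched as follows. Only role at 2 and mood moving from 4 to 3 come from the note; placing type at 3 is inferred from that, the tense entry is purely illustrative, and `drop_field` is a hypothetical helper rather than anything in word-aligner.

```python
def drop_field(positions: dict, removed: str) -> dict:
    """Remove one field and shift every later field down by one."""
    cut = positions[removed]
    return {
        name: (index - 1 if index > cut else index)
        for name, index in positions.items()
        if name != removed
    }


# Illustrative positions: role=2 and mood=4 are from the note,
# type=3 is inferred, tense=5 is an assumption for the example.
old_positions = {"role": 2, "type": 3, "mood": 4, "tense": 5}
new_positions = drop_field(old_positions, "type")
print(new_positions)
```

Fields before the removed column keep their positions, so only the entries from mood on need to change in the map.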