pishoyg / coptic

This is a project that aims to make the Coptic language more learnable.
https://remnqymi.com/
GNU General Public License v3.0

[Crum] Revisit the Coptic-within-English Parsing #63

Open pishoyg opened 4 months ago

pishoyg commented 4 months ago

English post-processing likely shouldn't apply to Coptic-within-English. Nor should Coptic-within-English be treated as words with spellings.

pishoyg commented 3 months ago

A quick experiment with removing English post-processing from Coptic-within-English showed that the fix won't be entirely straightforward: the replacements of { and } with <b> and </b> are still necessary. Perhaps the solution is to apply the list only partially.

ENGLISH_POSTPROCESSING = [
    ("^+", "✠"),
    ("{", "<b>"),
    ("}", "</b>"),
    ("(", "<i>"),
    (")", "</i>"),
    (" | ", "\n"),
    (" |", "\n"),
    ("/*", "("),
    ("*/", ")"),
    ("/$gk:", "["),
    ("$/", "]"),
    ("$", "―"),
]

Next step: pinpoint exact examples where the current pipeline errs, so we can see how to implement a fix that handles those cases specifically.
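For illustration, here is a minimal sketch of what applying the list partially might look like. It assumes the pairs are applied in order as plain substring replacements, and that only the `{` and `}` rules need to survive for Coptic-within-English. The issue confirms those two are necessary; whether others are needed is still open. The names `COPTIC_WITHIN_ENGLISH_KEYS`, `apply_replacements`, and `coptic_within_english_rules` are hypothetical and not taken from the repository.

```python
# The replacement list quoted in this issue, unchanged.
ENGLISH_POSTPROCESSING = [
    ("^+", "✠"),
    ("{", "<b>"),
    ("}", "</b>"),
    ("(", "<i>"),
    (")", "</i>"),
    (" | ", "\n"),
    (" |", "\n"),
    ("/*", "("),
    ("*/", ")"),
    ("/$gk:", "["),
    ("$/", "]"),
    ("$", "―"),
]

# ASSUMPTION: only the bold markers are required for Coptic-within-English.
# Extend this set if further examples show that other rules are needed too.
COPTIC_WITHIN_ENGLISH_KEYS = {"{", "}"}


def apply_replacements(text, rules):
    """Apply (old, new) pairs in order as plain substring replacements."""
    for old, new in rules:
        text = text.replace(old, new)
    return text


def coptic_within_english_rules(rules=ENGLISH_POSTPROCESSING,
                                keep=COPTIC_WITHIN_ENGLISH_KEYS):
    """Return the subset of rules whose source pattern is in `keep`,
    preserving the original order."""
    return [(old, new) for old, new in rules if old in keep]


# Hypothetical input, made up for this sketch.
sample = "{ⲁⲃⲱⲕ} (sic)"
full = apply_replacements(sample, ENGLISH_POSTPROCESSING)
partial = apply_replacements(sample, coptic_within_english_rules())
```

With the made-up input above, the full list also turns the parentheses into `<i></i>`, while the partial list leaves them alone and only adds the bold tags. Running both variants over real entries and comparing the outputs could be one way to collect the problematic examples.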

pishoyg commented 3 months ago

The current blast radius is unclear, so I'm removing the bug label and leaving only the rigor label until more problematic examples come up.