pishoyg opened 4 months ago
A quick experiment with removing English post-processing from Coptic-within-English showed that the fix won't be entirely straightforward! Replacements of `{}` with `<b></b>` were still necessary.
Perhaps the solution is to apply the list partially.
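```python
# (pattern, replacement) pairs; order matters if they are applied sequentially.
ENGLISH_POSTPROCESSING = [
    ("^+", "✠"),  # cross symbol
    ("{", "<b>"),  # bold
    ("}", "</b>"),
    ("(", "<i>"),  # italics
    (")", "</i>"),
    (" | ", "\n"),  # line break; the spaced form runs before " |"
    (" |", "\n"),
    ("/*", "("),  # literal parentheses; these run after "(" and ")" above
    ("*/", ")"),
    ("/$gk:", "["),  # brackets; these run before the bare "$" rule
    ("$/", "]"),
    ("$", "―"),
]
```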
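One way to read "apply the list partially" is to keep the rules as an ordered list and hand Coptic-within-English only a subset of them. This is a minimal sketch, assuming the rules are applied in order with plain `str.replace`. The subset name `COPTIC_WITHIN_ENGLISH_POSTPROCESSING`, the helper `apply_rules`, and the sample string are all hypothetical; keeping only the bold markers just mirrors the experiment above and is not a decided policy.

```python
# A sketch, assuming the rules are applied sequentially with str.replace.
ENGLISH_POSTPROCESSING = [
    ("^+", "✠"),
    ("{", "<b>"),
    ("}", "</b>"),
    ("(", "<i>"),
    (")", "</i>"),
    (" | ", "\n"),
    (" |", "\n"),
    ("/*", "("),
    ("*/", ")"),
    ("/$gk:", "["),
    ("$/", "]"),
    ("$", "―"),
]

# Hypothetical subset for Coptic-within-English: keep only the bold markers,
# which the experiment showed were still needed.
COPTIC_WITHIN_ENGLISH_POSTPROCESSING = [
    rule for rule in ENGLISH_POSTPROCESSING if rule[0] in ("{", "}")
]


def apply_rules(text: str, rules: list[tuple[str, str]]) -> str:
    """Apply each (pattern, replacement) pair to `text`, in list order."""
    for pattern, replacement in rules:
        text = text.replace(pattern, replacement)
    return text


# Hypothetical input mixing a bold Coptic word and a parenthesized gloss.
print(apply_rules("{ⲛⲟⲩⲧⲉ} (god)", COPTIC_WITHIN_ENGLISH_POSTPROCESSING))
# <b>ⲛⲟⲩⲧⲉ</b> (god)
print(apply_rules("{ⲛⲟⲩⲧⲉ} (god)", ENGLISH_POSTPROCESSING))
# <b>ⲛⲟⲩⲧⲉ</b> <i>god</i>
```

Filtering the existing list, rather than maintaining a second hand-written one, keeps the relative order of the surviving rules intact.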
Pinpoint exact examples of where the current pipeline errs, so we can see how to implement a fix that handles those cases specifically.
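To help pinpoint examples, one option is to report, for each Coptic-within-English string, which rules would actually fire on it. This sketch assumes sequential `str.replace` application; the functions `firing_rules` and `report` and the sample strings are hypothetical, not real data from the pipeline.

```python
# Full rule list, copied from the issue.
ENGLISH_POSTPROCESSING = [
    ("^+", "✠"),
    ("{", "<b>"),
    ("}", "</b>"),
    ("(", "<i>"),
    (")", "</i>"),
    (" | ", "\n"),
    (" |", "\n"),
    ("/*", "("),
    ("*/", ")"),
    ("/$gk:", "["),
    ("$/", "]"),
    ("$", "―"),
]


def firing_rules(text: str, rules: list[tuple[str, str]]) -> list[str]:
    """Return the patterns that change `text` when the rules run in order."""
    fired = []
    for pattern, replacement in rules:
        if pattern in text:
            fired.append(pattern)
            text = text.replace(pattern, replacement)
    return fired


def report(samples: list[str], rules: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each sample to the patterns that fire on it; skip untouched samples."""
    result = {}
    for sample in samples:
        fired = firing_rules(sample, rules)
        if fired:
            result[sample] = fired
    return result


# Hypothetical Coptic-within-English samples.
samples = ["ⲁⲛⲟⲕ (ⲡⲉ)", "ⲛⲟⲩⲧⲉ", "ⲡⲉ $ ⲛⲉ"]
for sample, fired in report(samples, ENGLISH_POSTPROCESSING).items():
    print(repr(sample), "->", fired)
```

Because each rule is checked against the text as already rewritten by earlier rules, the report reflects the pipeline's real ordering rather than a naive scan of the raw input.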
The current blast radius is unclear, so I'm removing the `bug` label and leaving only `rigor` until more problematic examples come up.
English post-processing likely shouldn't apply to Coptic-within-English. Neither should Coptic-within-English be treated as words with spellings.
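If English post-processing shouldn't touch Coptic-within-English, the pipeline needs to route each span to its own rule set. This is a minimal sketch, assuming upstream code already knows where Coptic spans begin and end and can yield `(kind, text)` segments; that segment representation, `postprocess_segments`, and the sample input are hypothetical. Coptic segments get no rules by default, with an optional subset for cases like the bold markers.

```python
from collections.abc import Sequence

# Full rule list, copied from the issue.
ENGLISH_POSTPROCESSING = [
    ("^+", "✠"),
    ("{", "<b>"),
    ("}", "</b>"),
    ("(", "<i>"),
    (")", "</i>"),
    (" | ", "\n"),
    (" |", "\n"),
    ("/*", "("),
    ("*/", ")"),
    ("/$gk:", "["),
    ("$/", "]"),
    ("$", "―"),
]

# Hypothetical segment shape: ("english" or "coptic", text).
Segment = tuple[str, str]


def apply_rules(text: str, rules: Sequence[tuple[str, str]]) -> str:
    """Apply each (pattern, replacement) pair to `text`, in order."""
    for pattern, replacement in rules:
        text = text.replace(pattern, replacement)
    return text


def postprocess_segments(
    segments: Sequence[Segment],
    english_rules: Sequence[tuple[str, str]],
    coptic_rules: Sequence[tuple[str, str]] = (),
) -> str:
    """Run `english_rules` on English segments; any other kind gets `coptic_rules`."""
    parts = []
    for kind, text in segments:
        rules = english_rules if kind == "english" else coptic_rules
        parts.append(apply_rules(text, rules))
    return "".join(parts)


segments = [("english", "(lit.) see "), ("coptic", "{ⲛⲟⲩⲧⲉ} (ⲡⲉ)")]
bold_only = [rule for rule in ENGLISH_POSTPROCESSING if rule[0] in ("{", "}")]
print(postprocess_segments(segments, ENGLISH_POSTPROCESSING))
# <i>lit.</i> see {ⲛⲟⲩⲧⲉ} (ⲡⲉ)
print(postprocess_segments(segments, ENGLISH_POSTPROCESSING, bold_only))
# <i>lit.</i> see <b>ⲛⲟⲩⲧⲉ</b> (ⲡⲉ)
```

Since none of the patterns contain Coptic letters, splitting on Coptic characters alone wouldn't protect anything; the routing only helps if the span boundaries come from the data itself.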