Separate phonological from orthographical rules

Implemented in v0.4.0:

Subpackages for each language
Lark grammar rules kept separate from the terminal lookup tables, which are stored in JSON files
Use %extend directives to modify common grammar for individual romanization schemes
Parser and translator in separate modules within each subpackage
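
As a rough illustration of the grammar/lookup-table split, the sketch below renders terminal definitions in Lark grammar syntax from a JSON table. The table contents, the terminal names INITIAL and FINAL, and the helper terminals_to_lark are hypothetical and not taken from the package; the code only builds the grammar text and does not import Lark.

```python
import json

# Hypothetical lookup table, as it might be stored in a per-scheme JSON file.
# Keys are spellings in the romanization scheme, values are phoneme labels.
TERMINALS_JSON = """
{
  "INITIAL": {"p": "p", "ph": "pʰ", "m": "m"},
  "FINAL": {"a": "a", "i": "i"}
}
"""


def terminals_to_lark(table):
    """Render one Lark terminal definition per table entry.

    Spellings are listed longest first, then alphabetically, so the
    generated text does not depend on the order used in the JSON file.
    """
    lines = []
    for name, mapping in table.items():
        spellings = sorted(mapping, key=lambda s: (-len(s), s))
        alternatives = " | ".join(json.dumps(s) for s in spellings)
        lines.append(f"{name}: {alternatives}")
    return "\n".join(lines)


table = json.loads(TERMINALS_JSON)
grammar_terminals = terminals_to_lark(table)
print(grammar_terminals)
```

The generated text could then be appended to the shared grammar rules before constructing the parser, so that adding a spelling only touches the JSON file.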

Potential future enhancements:

Intermediate translator to convert each parse tree to a common abstract parse tree structure, to deal with the many-to-many conversion issue.

Ideal:

Input text --> scheme-specific parse tree --> common parse tree --> scheme-specific translator --> output
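
A minimal sketch of that pipeline, assuming two made-up schemes (scheme_a, scheme_b) and single-syllable input. All names and tables here are hypothetical; the point is only that each scheme needs one parser and one translator against the common structure, rather than one converter per pair of schemes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Syllable:
    """Common, scheme-independent node (hypothetical shape)."""
    initial: str
    final: str


# Toy tables: spelling in the scheme -> abstract phoneme label.
SCHEMES = {
    "scheme_a": {"initial": {"b": "p", "p": "pʰ"}, "final": {"a": "a"}},
    "scheme_b": {"initial": {"p": "p", "ph": "pʰ"}, "final": {"a": "a"}},
}


def parse_scheme(text, scheme):
    """Step 1: scheme-specific parse tree (here just a dict of spellings)."""
    table = SCHEMES[scheme]
    # Try longer initials first so "ph" is not read as "p" + "h...".
    for spelling in sorted(table["initial"], key=len, reverse=True):
        rest = text[len(spelling):]
        if text.startswith(spelling) and rest in table["final"]:
            return {"initial": spelling, "final": rest}
    raise ValueError(f"cannot parse {text!r} as {scheme}")


def to_common(tree, scheme):
    """Step 2: map scheme spellings to the common structure."""
    table = SCHEMES[scheme]
    return Syllable(table["initial"][tree["initial"]],
                    table["final"][tree["final"]])


def from_common(syllable, scheme):
    """Step 3: scheme-specific translator, using the inverted tables."""
    table = SCHEMES[scheme]
    initials = {v: k for k, v in table["initial"].items()}
    finals = {v: k for k, v in table["final"].items()}
    return initials[syllable.initial] + finals[syllable.final]


def convert(text, source, target):
    return from_common(to_common(parse_scheme(text, source), source), target)


print(convert("pa", "scheme_a", "scheme_b"))
print(convert("pha", "scheme_b", "scheme_a"))
```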

Could we automatically generate parser and translator code from the parser grammar rules (!!)?

Currently, input text is parsed directly to the common parse tree, with hacky preprocessing steps. Translation rules are implicit in the translator code; it would be helpful to have an abstract representation of them that is quicker to comprehend and easier for users to configure.
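
One possible shape for such an abstract representation, sketched under assumptions: translation rules live in a plain dict keyed by node kind, and a small generic function walks the tree. The node layout, the rule keys "join" and "map", and the sample data are all hypothetical and do not reflect the current translator code.

```python
# A tree node is a (kind, payload) tuple; payload is either a list of
# child nodes or a leaf value. This layout is an assumption for the sketch.
TREE = ("word", [
    ("syllable", [("initial", "pʰ"), ("final", "a"), ("tone", "1")]),
    ("syllable", [("initial", "p"), ("final", "i"), ("tone", "2")]),
])

# Translation rules as data: how to join children, how to spell leaves.
DEFAULT_RULES = {
    "word": {"join": "-"},
    "syllable": {"join": ""},
    "initial": {"map": {"p": "p", "pʰ": "ph"}},
    "final": {"map": {"a": "a", "i": "i"}},
    "tone": {"map": {"1": "1", "2": "2"}},
}


def render(node, rules):
    """Walk the tree and apply the rule registered for each node kind."""
    kind, payload = node
    rule = rules[kind]
    if isinstance(payload, list):
        return rule["join"].join(render(child, rules) for child in payload)
    return rule["map"][payload]


# A user override replaces one rule without touching translator code.
user_rules = {**DEFAULT_RULES, "word": {"join": " "}}

print(render(TREE, DEFAULT_RULES))
print(render(TREE, user_rules))
```

Because the rules are data, they could also be stored alongside the JSON lookup tables and inspected without reading the translator module.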
