learn-teochew / parsetc

Parse and convert between different Teochew romanization systems
https://learn-teochew.github.io/parsetc/
MIT License

Separate phonological from orthographical rules #6

Open: kbseah opened this issue 7 months ago

kbseah commented 7 months ago

Three layers:

  1. Basic syllable structure (initial, medial, coda) for Sinitic languages
  2. Sets of actual initials, medials, and codas for a given language, dialect, or accent
  3. Orthography: how the terminals are actually written in a given transcription system
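The three layers can be pictured as three separate pieces of data. What follows is a minimal Python sketch, not code from parsetc. The `Syllable` class, `INVENTORY`, `validate` and `render` are hypothetical. The sketch also adds a separate nucleus slot. The two spelling tables are illustrative subsets loosely modelled on Peng'im and Teochew Pe̍h-ūe-jī (POJ) initials and ignore tones and nasalization.

```python
from __future__ import annotations

from dataclasses import dataclass

# Layer 1: generic syllable structure. An empty string marks an empty slot.
# Assumption: a nucleus slot is added alongside initial, medial and coda.
@dataclass(frozen=True)
class Syllable:
    initial: str
    medial: str
    nucleus: str
    coda: str

# Layer 2: hypothetical, partial sound inventory for one variety (IPA-like labels).
INVENTORY = {
    "initial": {"", "p", "pʰ", "t", "tʰ", "k", "kʰ", "ts", "tsʰ", "s", "h"},
    "medial": {"", "i", "u"},
    "nucleus": {"a", "e", "i", "o", "u"},
    "coda": {"", "ŋ", "m"},
}

# Layer 3: orthographies (illustrative subsets only). A sound missing from a
# slot's table is written unchanged.
PENGIM_LIKE = {
    "initial": {"p": "b", "pʰ": "p", "t": "d", "tʰ": "t", "k": "g", "kʰ": "k",
                "ts": "z", "tsʰ": "c"},
    "coda": {"ŋ": "ng"},
}
POJ_LIKE = {
    "initial": {"pʰ": "ph", "tʰ": "th", "kʰ": "kh", "ts": "ch", "tsʰ": "chh"},
    "coda": {"ŋ": "ng"},
}


def validate(syl: Syllable) -> Syllable:
    """Layer 2 check: reject sounds outside the variety's inventory."""
    for slot, allowed in INVENTORY.items():
        sound = getattr(syl, slot)
        if sound not in allowed:
            raise ValueError(f"{slot} {sound!r} is not in the inventory")
    return syl


def render(syl: Syllable, ortho: dict[str, dict[str, str]]) -> str:
    """Layer 3: spell the syllable slot by slot with one orthography table."""
    slots = ("initial", "medial", "nucleus", "coda")
    return "".join(ortho.get(s, {}).get(getattr(syl, s), getattr(syl, s)) for s in slots)


syl = validate(Syllable("tsʰ", "i", "a", "ŋ"))
print(render(syl, PENGIM_LIKE), render(syl, POJ_LIKE))  # ciang chhiang
```

With this split, a new transcription system only needs a new layer-3 table. A different accent only needs a different layer-2 inventory.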

By separating the logic, we can do the following:

kbseah commented 7 months ago

Implemented in v0.4.0:

Potential future enhancements:

An intermediate translator that converts each scheme-specific parse tree into a common abstract parse tree, to deal with the many-to-many conversion problem between schemes.
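A quick count shows why a common intermediate tree helps with many-to-many conversion. This sketch assumes that the direct approach needs one converter per ordered pair of schemes. It also assumes the common-tree approach needs one parser and one writer per scheme. Both function names are hypothetical.

```python
def direct_converters(n: int) -> int:
    """Every ordered pair of distinct schemes needs its own converter."""
    return n * (n - 1)


def hub_converters(n: int) -> int:
    """Each scheme needs one parser into, and one writer out of, the common tree."""
    return 2 * n


for n in (3, 4, 6):
    print(n, direct_converters(n), hub_converters(n))
# 3 6 6
# 4 12 8
# 6 30 12
```

Under these assumptions the two approaches break even at three schemes. The common tree wins from four schemes onward, and the gap grows quadratically.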

Ideal:

Input text --> scheme-specific parse tree --> common parse tree --> scheme-specific translator --> output
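The ideal pipeline can be sketched end to end for a toy subset. This is a minimal illustration, not parsetc's implementation, and all names are hypothetical. The spellings are simplified Peng'im-like input and POJ-like output. Tones, nasalization and scheme-specific medial spellings are ignored, and the stage-1 regular expression is hand-written.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Syllable:
    """Common, scheme-neutral parse tree for one syllable (IPA-like labels)."""
    initial: str
    medial: str
    nucleus: str
    coda: str


# Stage 1: scheme-specific parser for a toy Peng'im-like subset.
# A medial i/u is recognized only when another vowel follows it.
PENGIM_RE = re.compile(
    r"(?P<initial>[bpdtgkzcsh]?)(?P<medial>[iu](?=[aeo])|)"
    r"(?P<nucleus>[aeiou])(?P<coda>ng|m|)"
)


def parse_pengim(word: str) -> dict[str, str]:
    m = PENGIM_RE.fullmatch(word)
    if m is None:
        raise ValueError(f"cannot parse {word!r}")
    return m.groupdict()


# Stage 2: map the scheme-specific tree onto the common tree.
PENGIM_TO_SOUND = {"b": "p", "p": "pʰ", "d": "t", "t": "tʰ", "g": "k", "k": "kʰ",
                   "z": "ts", "c": "tsʰ", "s": "s", "h": "h", "": ""}
CODA_TO_SOUND = {"ng": "ŋ", "m": "m", "": ""}


def to_common(tree: dict[str, str]) -> Syllable:
    return Syllable(PENGIM_TO_SOUND[tree["initial"]], tree["medial"],
                    tree["nucleus"], CODA_TO_SOUND[tree["coda"]])


# Stage 3: scheme-specific writer for a toy POJ-like subset.
SOUND_TO_POJ = {"p": "p", "pʰ": "ph", "t": "t", "tʰ": "th", "k": "k", "kʰ": "kh",
                "ts": "ch", "tsʰ": "chh", "s": "s", "h": "h", "": ""}
SOUND_TO_CODA = {"ŋ": "ng", "m": "m", "": ""}


def write_poj(syl: Syllable) -> str:
    return SOUND_TO_POJ[syl.initial] + syl.medial + syl.nucleus + SOUND_TO_CODA[syl.coda]


def convert(text: str) -> str:
    """Input text -> scheme-specific tree -> common tree -> output text."""
    return " ".join(write_poj(to_common(parse_pengim(w))) for w in text.split())


print(convert("ciang pang dim"))  # chhiang phang tim
```

Adding another output scheme here means writing one more stage-3 writer. The parser and the common tree stay untouched.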

Could we automatically generate parser and translator code from the parser grammar rules (!!)?
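One part of this question can be handled with data instead of hand-written code. If an orthography is described as a spelling table, a parser can be derived from it at runtime. The sketch below is hypothetical (`build_parser`, the table format and `POJ_LIKE` are not parsetc's). It assumes a purely concatenative spelling in which each slot is written independently, which real schemes only partly satisfy.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable

SLOTS = ("initial", "medial", "nucleus", "coda")


@dataclass(frozen=True)
class Syllable:
    initial: str
    medial: str
    nucleus: str
    coda: str


def build_parser(ortho: dict[str, dict[str, str]]) -> Callable[[str], Syllable]:
    """Generate a syllable parser from a spelling table (slot -> sound -> spelling)."""
    reverse: dict[str, dict[str, str]] = {}
    for slot in SLOTS:
        reverse[slot] = {spelling: sound for sound, spelling in ortho[slot].items()}
        if len(reverse[slot]) != len(ortho[slot]):
            raise ValueError(f"two sounds share a spelling in slot {slot!r}")

    def alternatives(slot: str) -> str:
        # Longer spellings first, so "chh" is tried before "ch".
        spellings = sorted(reverse[slot], key=len, reverse=True)
        return "|".join(re.escape(s) for s in spellings)

    # One named group per slot; regex backtracking resolves medial/nucleus splits.
    pattern = re.compile("".join(f"(?P<{slot}>{alternatives(slot)})" for slot in SLOTS))

    def parse(word: str) -> Syllable:
        m = pattern.fullmatch(word)
        if m is None:
            raise ValueError(f"cannot parse {word!r}")
        return Syllable(**{slot: reverse[slot][m.group(slot)] for slot in SLOTS})

    return parse


POJ_LIKE = {
    "initial": {"": "", "p": "p", "pʰ": "ph", "t": "t", "tʰ": "th", "k": "k",
                "kʰ": "kh", "ts": "ch", "tsʰ": "chh", "s": "s", "h": "h"},
    "medial": {"": "", "i": "i", "u": "u"},
    "nucleus": {v: v for v in "aeiou"},
    "coda": {"": "", "ŋ": "ng", "m": "m"},
}

parse_poj = build_parser(POJ_LIKE)
print(parse_poj("chhiang"))  # Syllable(initial='tsʰ', medial='i', nucleus='a', coda='ŋ')
```

A fuller answer could emit grammar rules for a parser generator such as Lark instead of a regular expression. The idea of deriving the parser from the spelling table would be the same.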

Currently, input text is parsed directly into the common parse tree, with some hacky preprocessing steps. The translation rules are implicit in the translator code. It would help to have an abstract representation of them that is quicker to understand and easier for users to configure.
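One possible shape for such a representation is a plain comparison table that users can read and edit without touching code, with one row per sound and one column per scheme. The JSON format, the `pengim`/`poj` keys and the functions below are hypothetical. The syllable is assumed to have been segmented already by an earlier parsing step.

```python
from __future__ import annotations

import json

# Hypothetical rule format: one row per sound, one column per scheme.
RULES_JSON = """
{
  "initial": {
    "p":   {"pengim": "b", "poj": "p"},
    "pʰ":  {"pengim": "p", "poj": "ph"},
    "ts":  {"pengim": "z", "poj": "ch"},
    "tsʰ": {"pengim": "c", "poj": "chh"},
    "":    {"pengim": "",  "poj": ""}
  },
  "coda": {
    "ŋ": {"pengim": "ng", "poj": "ng"},
    "m": {"pengim": "m",  "poj": "m"},
    "":  {"pengim": "",   "poj": ""}
  }
}
"""


def invert(rules: dict, scheme: str) -> dict[str, dict[str, str]]:
    """Per slot, map each spelling in `scheme` back to its sound."""
    index: dict[str, dict[str, str]] = {}
    for slot, rows in rules.items():
        index[slot] = {}
        for sound, spellings in rows.items():
            spelling = spellings[scheme]
            if spelling in index[slot]:
                raise ValueError(f"{scheme}: spelling {spelling!r} is ambiguous in {slot!r}")
            index[slot][spelling] = sound
    return index


def translate(parts: dict[str, str], rules: dict, src: str, dst: str) -> str:
    """Re-spell an already segmented syllable; slots without rules are copied."""
    index = invert(rules, src)
    out = []
    for slot, spelling in parts.items():
        if slot in rules:
            out.append(rules[slot][index[slot][spelling]][dst])
        else:
            out.append(spelling)
    return "".join(out)


rules = json.loads(RULES_JSON)
parts = {"initial": "c", "medial": "i", "nucleus": "a", "coda": "ng"}
print(translate(parts, rules, "pengim", "poj"))  # chhiang
```

Because the table is keyed by sound, it can be read in either direction, and ambiguous spellings are caught mechanically. Both are harder when the rules live implicitly in translator code.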