oracc / pyoracc

Python tools for working with ORACC
GNU General Public License v3.0

Consolidate line label parsing #83

Open rillian opened 5 years ago


Working on #78, I noticed some inconsistencies in the way line labels (numbers) are recognized by the lexer. For example, the set of characters accepted as a prime marker for relative line numbers differs from one context to another, and none of the patterns accepts labels like `109a.` in Q000040 from the sample corpus.

The pattern for the line label should probably be declared once and concatenated into every lexer pattern that needs to recognize one.
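A rough sketch of what that could look like, using only the standard `re` module rather than the actual lexer. The names `PRIME_CHARS`, `LINE_LABEL`, `LINE_START`, `LABEL_RANGE` and `line_label` are made up for illustration, and the accepted prime characters and label shape are assumptions, not what pyoracc currently does:

```python
import re

# Hypothetical shared fragment: digits, an optional lowercase letter suffix
# (as in "109a"), then zero or more prime markers. The set of prime
# characters is an assumption for this sketch.
PRIME_CHARS = "'′’"
LINE_LABEL = r"[0-9]+[a-z]?[" + PRIME_CHARS + r"]*"

# Two illustrative contexts built by concatenating the same fragment,
# so both always agree on what counts as a label.
LINE_START = re.compile(r"^(?P<label>" + LINE_LABEL + r")\.(?=\s)")
LABEL_RANGE = re.compile(
    r"^(?P<first>" + LINE_LABEL + r")\s*-\s*(?P<last>" + LINE_LABEL + r")$"
)


def line_label(line):
    """Return the label at the start of a text line, or None if absent."""
    match = LINE_START.match(line)
    return match.group("label") if match else None
```

With a single definition, widening the label syntax (say, adding another prime character) only needs a change to `LINE_LABEL` or `PRIME_CHARS`, and every context picks it up.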