letuananh / bela

👸 BELA - A pathway for creating and analysing multi-lingual transcripts using BELA convention and ELAN software
MIT License

flag special tokens and ### as errors (e.g., `:si:###` and `:m:###`) #1

Open vicchuayh opened 2 years ago

vicchuayh commented 2 years ago

As of April 12th, 2022, `:si:###` and `:m:###` are accepted by BELA, but they should be flagged as errors.

letuananh commented 2 years ago

Let's compile an exhaustive list of these combinations, then cover them all in one shot. It should be trivial.

vicchuayh commented 1 year ago

Instead of a list, I wonder if it would be more efficient to enforce the rule that ### must be a standalone token. For example:

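def _check_special_hash_token(belan):
    """Collect an error for every token that contains '#' but is not exactly '###'."""
    errors = []  # was undefined in the first draft; returned to the caller instead
    for p in belan.persons:
        for u in p.utterances:
            for c in u.chunks:
                for t in c.value.split():
                    # '###' is only valid on its own, never glued to other characters
                    if "#" in t and t != "###":
                        errors.append(f"{belan.path} : {t} in {c} contains other characters alongside the special standalone ### token!")
    return errors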
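To try the standalone rule without loading an ELAN file, the token test can be pulled out into a small helper that works on plain strings. This is only a sketch: `find_bad_hash_tokens` is a hypothetical name, not part of BELA, and it assumes tokens are separated by whitespace, as in the check above.

```python
def find_bad_hash_tokens(text):
    """Return the tokens in `text` that contain '#' but are not exactly '###'.

    Hypothetical helper for illustration only; not part of the BELA API.
    """
    return [t for t in text.split() if "#" in t and t != "###"]


# A few sample chunk values, including the two cases from this issue.
samples = ["hello ### world", ":si:### okay", "a :m:### b ##"]
for s in samples:
    print(repr(s), "->", find_bad_hash_tokens(s))
```

Note that this rule also rejects shorter or longer runs such as `##` or `####`, since any token containing `#` must be exactly `###`. That seems to be what we want, but it goes a bit further than the two combinations reported here.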