lexibank / hsiuhmongmien

Creative Commons Attribution 4.0 International

check the segmentation with sinopy #8

Closed by LinguList 4 years ago

LinguList commented 4 years ago

When loading the CLDF data with LingPy, you should check for each lexeme whether the structure that sinopy assigns makes sense:

from sinopy.segments import get_structure
from lingpy.basictypes import lists

sequence = lists('w a n ⁵ + b a ³ + d k w au ² k n')

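for strucs, morph in get_structure(sequence, zipped=True):
    # each item in strucs pairs a structure label with the segment(s) it covers
    struc = [x[0] for x in strucs]
    check = [x[1] for x in strucs]
    # fewer labels than segments: the morpheme does not fit the expected template
    if len(struc) != len(morph):
        print('# problem with {0}'.format(morph))
        print('\t'.join(struc)+'\n'+'\t'.join(check))
    # a morpheme without a tone label ('t') is also suspicious
    elif 't' not in struc:
        print('# no tone in {0}'.format(morph))
        print('\t'.join(struc)+'\n'+'\t'.join(check))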

output:

# problem with ['d', 'k', 'w', 'au', '²']
i   m   n   t
d   kw  au  ²
# no tone in ['k', 'n']
i   m
k   n
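
The tone part of this check can also be approximated without sinopy, as a quick pre-filter before running the full structure check. The following is only a minimal sketch under stated assumptions: the helper names (`split_morphemes`, `is_tone`, `check_tones`) are hypothetical and not part of sinopy or lingpy, tones are assumed to be written as superscript digits, and morphemes are assumed to be separated by `+` only. It does not reproduce what `get_structure` does; it just flags morphemes that have no tone, more than one tone, or a tone that is not the final segment.

```python
# Hypothetical helpers for illustration; none of these come from sinopy or lingpy.
# Assumption: tones are written with superscript digits.
TONE_CHARS = set("⁰¹²³⁴⁵⁶⁷⁸⁹")


def split_morphemes(sequence, sep="+"):
    """Split a space-segmented string into morphemes at the separator."""
    morphemes, current = [], []
    for segment in sequence.split():
        if segment == sep:
            morphemes.append(current)
            current = []
        else:
            current.append(segment)
    morphemes.append(current)
    return morphemes


def is_tone(segment):
    """Return True if the segment consists only of superscript digits."""
    return bool(segment) and all(char in TONE_CHARS for char in segment)


def check_tones(sequence):
    """Return (morpheme, problem) pairs for morphemes with suspicious tones."""
    problems = []
    for morpheme in split_morphemes(sequence):
        tone_positions = [i for i, seg in enumerate(morpheme) if is_tone(seg)]
        if not tone_positions:
            problems.append((morpheme, "no tone"))
        elif len(tone_positions) > 1:
            problems.append((morpheme, "more than one tone"))
        elif tone_positions[0] != len(morpheme) - 1:
            problems.append((morpheme, "tone is not final"))
    return problems


if __name__ == "__main__":
    for morpheme, problem in check_tones("w a n ⁵ + b a ³ + d k w au ² k n"):
        print("# {0} in {1}".format(problem, morpheme))
```

Because this sketch splits on `+` only, it treats `d k w au ² k n` as a single morpheme and reports that its tone is not final, whereas the sinopy output above lists `d k w au ²` and `k n` separately. Both point at the same stretch of the orthography profile.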

Wu-Urbanek commented 4 years ago

I am modifying the orthography profile by checking with sinopy :)

Wu-Urbanek commented 4 years ago

Updated structure.md. Looks good so far.