Verifying the difference between sequential labeling and segmentation & labeling

lovit / lattice_based_tagger

Lattice based Korean Morphological analyzer & Part of Speech Tagger

The sequential labeling parts are being moved to collins_average_perceptron. The code below generates the training data for that repository.

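# train_data and text_to_words are defined elsewhere in this repository.
# -1 presumably removes the sentence limit so the whole corpus is used (assumption).
train_data.num_sents = -1
with open('../../../git/collins_average_perceptron/data/word_sequence.txt', 'w', encoding='utf-8') as f:
    for words_text, morphs_text in train_data:
        words = text_to_words(words_text, morphs_text)
        # Keep only the first tag of each word, joined by single spaces.
        tags = ' '.join(word.tag0 for word in words)
        # One sentence per line: 'BOS  <words>  EOS', a tab, then the tags.
        f.write('BOS  {}  EOS\t{}\n'.format(words_text.strip(), tags))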
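
The line format written above can be illustrated with a small round-trip sketch. The helper names `format_line` and `parse_line`, the sample sentence, and the tag names are all hypothetical and not part of either repository. The sketch also assumes one tag per whitespace-separated word, which the original code does not guarantee.

```python
def format_line(words_text, tags):
    """Build one training line: 'BOS  <words>  EOS<TAB><tags>'."""
    return 'BOS  {}  EOS\t{}\n'.format(words_text.strip(), ' '.join(tags))


def parse_line(line):
    """Split a training line back into (words, tags), dropping BOS and EOS."""
    words_part, tags_part = line.rstrip('\n').split('\t')
    # str.split() with no argument collapses the double spaces around the words.
    words = words_part.split()[1:-1]
    tags = tags_part.split()
    return words, tags


# Hypothetical sentence and tags, for illustration only.
line = format_line(' 나는 학교에 간다 ', ['Noun', 'Noun', 'Verb'])
words, tags = parse_line(line)
```

Note that BOS and EOS appear only on the word side of the tab. A consumer of this file therefore either strips the markers, as here, or supplies boundary tags itself.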

Verifying the difference between sequential labeling and segmentation & labeling #7