haven-jeon / PyKoSpacing

Automatic Korean word spacing with Python
GNU General Public License v3.0
397 stars 118 forks

Process a list of words that should not be spaced #21

Closed: haven-jeon closed this issue 3 years ago

haven-jeon commented 3 years ago

Motivation

The spacing engine is not 100% accurate, so it mis-spaces some words. Most of these errors occur in proper nouns, but some common nouns are also mis-spaced depending on the context.

Related issues: #19, #4

from pykospacing import spacing
# Current output contains '구레나 룻'; it should be '구레나룻'
spacing('귀밑에서턱까지잇따라난수염을구레나룻이라고한다.')
'귀 밑에서 턱까지 잇따라 난 수염을 구레나 룻이라고 한다.'
# Current output contains '쏠 편한'; it should be '쏠편한'
spacing('s20통장을쏠편한입출금으로 전환')
's20 통장을 쏠 편한 입출금으로 전환'

Suggestions

from pykospacing import Spacing

# Proposed API: words passed in `rules` should never be split by the engine
spacing = Spacing(rules=['구레나룻', '쏠편한'])
spacing('귀밑에서턱까지잇따라난수염을구레나룻이라고한다. s20 통장을 쏠 편한 입출금으로 전환')
'귀 밑에서 턱까지 잇따라 난 수염을 구레나룻이라고 한다. s20 통장을 쏠편한 입출금으로 전환'
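
One way the proposed `rules` argument could work is as a post-processing pass over the engine's output. The sketch below is only an illustration of that idea, not the library's actual implementation: `apply_rules` is a hypothetical helper name, and the input string simply reuses the mis-spaced outputs shown under Motivation. For each protected word it builds a regular expression that tolerates whitespace between the word's characters, then substitutes the unspaced form back in.

```python
import re


def apply_rules(spaced_text, rules):
    """Re-join protected words that a spacing engine split apart.

    Hypothetical helper for illustration; not part of PyKoSpacing's API.
    """
    for word in rules:
        if not word:
            continue  # an empty rule would match everywhere, so skip it
        # Allow optional whitespace between consecutive characters of the word.
        pattern = re.compile(r"\s*".join(re.escape(ch) for ch in word))
        # Use a function replacement so the word is inserted literally.
        spaced_text = pattern.sub(lambda match, w=word: w, spaced_text)
    return spaced_text


# Stand-in for the raw engine output, copied from the examples above.
engine_output = (
    "귀 밑에서 턱까지 잇따라 난 수염을 구레나 룻이라고 한다. "
    "s20 통장을 쏠 편한 입출금으로 전환"
)
print(apply_rules(engine_output, ["구레나룻", "쏠편한"]))
```

A caveat with this approach is that it re-joins every occurrence of the character sequence, even where a space between those characters would be correct in context, so the rule list is best kept to words that are never spaced.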

haven-jeon commented 3 years ago

99335e9f963d8176ca2de803079bad695ddd7a13