timo-liu / eng-syl

A Seq2Seq Model that syllabifies English words.
MIT License

Incorrect output for COBWEB and UPHILL #3

Closed by paulvonhippel 9 months ago

paulvonhippel commented 1 year ago

Code:

from eng_syl.syllabify import Syllabel

syllabler = Syllabel()  # create the syllabifier before calling it
cobweb = syllabler.syllabify("COBWEB")
print(cobweb)
uphill = syllabler.syllabify("UPHILL")
print(uphill)

Output is:

CO-BWEB
UPHILL

Output should be:

COB-WEB
UP-HILL

timo-liu commented 1 year ago

Hey man, thanks for letting me know. I'll update the package with a fresher model; the newer model correctly syllabifies "COBWEB."

"UPHILL," however, is an orthographically ambiguous word. The new model will syllabify it as "U-PHILL."

The new model evaluates syllable boundaries without the context of true phonetic pronunciation.

paulvonhippel commented 1 year ago

Thanks! Neither of these words is ambiguous, because both are compounds. Definition: a word is a compound if you can split it into two words. Can you use that as a heuristic in your model?

Best, Paul
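
For illustration, the compound heuristic suggested above might be sketched roughly as follows. This is not part of eng_syl: `VOCAB`, `split_compound`, and `hyphenate_compound` are hypothetical names, and the tiny vocabulary stands in for a real English word list.

```python
from typing import Optional, Set, Tuple

# Hypothetical toy vocabulary; a real check would load a full English word list.
VOCAB: Set[str] = {"cob", "web", "up", "hill", "syllable"}


def split_compound(word: str, vocab: Set[str], min_len: int = 2) -> Optional[Tuple[str, str]]:
    """Return (left, right) if the word splits into two vocabulary words, else None."""
    w = word.lower()
    # Try every split point that leaves at least min_len letters on each side.
    for i in range(min_len, len(w) - min_len + 1):
        left, right = w[:i], w[i:]
        if left in vocab and right in vocab:
            return left, right
    return None


def hyphenate_compound(word: str, vocab: Set[str]) -> str:
    """Insert a hyphen at the compound boundary, keeping the original casing."""
    parts = split_compound(word, vocab)
    if parts is None:
        return word  # not a recognizable compound; leave it to the model
    i = len(parts[0])
    return word[:i] + "-" + word[i:]


print(hyphenate_compound("COBWEB", VOCAB))
print(hyphenate_compound("UPHILL", VOCAB))
```

A check like this only locates the boundary between the two parts; each part may still need to be syllabified on its own. It can also misfire on words that merely look like compounds: "notable" splits into "not" + "able" but is syllabified "no-ta-ble", so the result would need to be treated as a hint and not a rule.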

