kakaobrain / g2pm

A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset
Apache License 2.0

Training Data Explanation #2

Closed: dpny518 closed this issue 4 years ago

dpny518 commented 4 years ago

If you open the .lb file, each line contains only one pinyin, while the corresponding line in the .sent file contains a whole string of characters. Shouldn't the .lb file also contain a string of pronunciations?

dpny518 commented 4 years ago

I read the paper, and it seems that each sentence contains a string of characters, one of which is surrounded by "_". That marked character is the one whose pinyin appears on the corresponding line of the .lb file.
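Here is a minimal sketch of reading one .sent/.lb pair under that interpretation. It assumes the target character is wrapped in "_" exactly as described above (the released files may use a different marker character), that the two files are aligned line by line in UTF-8, and that each .lb line holds a single label. The names `parse_example` and `read_pairs` are hypothetical and are not part of the g2pM package.

```python
from typing import Iterator, Tuple

# Assumption: the marker described in the comment above; check the actual files.
MARKER = "_"


def parse_example(sent_line: str, lb_line: str, marker: str = MARKER) -> Tuple[str, int, str]:
    """Return (sentence without markers, index of the target character, label)."""
    sent = sent_line.rstrip("\n")
    start = sent.find(marker)
    if start == -1:
        raise ValueError(f"no marker {marker!r} in: {sent!r}")
    end = sent.find(marker, start + 1)
    # Exactly one character is expected between the two markers.
    if end != start + 2:
        raise ValueError(f"expected one character between markers: {sent!r}")
    # Drop both markers; the target character keeps index `start`.
    plain = sent[:start] + sent[start + 1] + sent[end + 1:]
    return plain, start, lb_line.strip()


def read_pairs(sent_path: str, lb_path: str) -> Iterator[Tuple[str, int, str]]:
    """Yield parsed examples from two line-aligned files (hypothetical layout)."""
    with open(sent_path, encoding="utf-8") as fs, open(lb_path, encoding="utf-8") as fl:
        for sent_line, lb_line in zip(fs, fl):
            yield parse_example(sent_line, lb_line)
```

For example, `parse_example("我们_长_大了", "zhang3")` returns `("我们长大了", 2, "zhang3")`; the sentence and the tone-numbered label here are illustrative, not taken from the dataset. This is why the .lb file needs only one pinyin per line: the label covers only the marked character, not the whole sentence.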
