roedoejet / g2p

Grapheme-to-Phoneme transductions that preserve input and output indices, and support cross-lingual g2p!
https://g2p-studio.herokuapp.com

g2p-studio can't handle more than one word in English #364

Open joanise opened 5 months ago

joanise commented 5 months ago

In the CLI, running g2p on multiple words in English requires tokenization, which is now enabled by default:

$ g2p convert "astonishing, my friend" eng eng-arpabet
AH S T AA N IH SH IH NG , M AY  F R EH N D
$ g2p convert --no-tok "astonishing, my friend" eng eng-arpabet

$ g2p convert --no-tok "astonishing" eng eng-arpabet
AH S T AA N IH SH IH NG
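
To illustrate why the `--no-tok` call on the full phrase comes back empty, here is a minimal sketch. It is not g2p's actual code: it assumes the English conversion behaves like a whole-word dictionary lookup, which is what the output above suggests. `TOY_LEXICON`, `convert_no_tok`, and `convert_tok` are hypothetical names, and the three entries are copied from the CLI output above.

```python
import re

# Hypothetical mini-lexicon (assumption): entries copied from the CLI output above.
TOY_LEXICON = {
    "astonishing": "AH S T AA N IH SH IH NG",
    "my": "M AY",
    "friend": "F R EH N D",
}

WORD = r"[A-Za-z']+"


def convert_no_tok(text):
    """Look up the entire input as a single entry, analogous to --no-tok."""
    return TOY_LEXICON.get(text.lower(), "")


def convert_tok(text):
    """Split the input into word and non-word tokens, convert only the words,
    and pass punctuation and spaces through unchanged."""
    tokens = re.findall(WORD + r"|[^A-Za-z']+", text)
    converted = []
    for tok in tokens:
        if re.fullmatch(WORD, tok):
            converted.append(TOY_LEXICON.get(tok.lower(), ""))
        else:
            converted.append(tok)
    return "".join(converted)


if __name__ == "__main__":
    print(repr(convert_no_tok("astonishing, my friend")))  # whole phrase is not an entry
    print(repr(convert_no_tok("astonishing")))             # single word is found
    print(repr(convert_tok("astonishing, my friend")))     # each word is found
```

The toy version does not reproduce g2p's exact spacing around punctuation. It only shows that a multi-word string never matches a single-word entry unless it is split first.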

In g2p studio, it looks like we are not enabling tokenization: [screenshot] but [screenshot]

To align with our default behaviour on the CLI, I believe it would be best if the g2p studio also tokenized by default.

And wow, AH S T AA N... sounds so B AA S T AH N... Who wrote that pronunciation dict anyway??? ;)