In the CLI, running g2p on multiple words in English requires tokenization, which is now enabled by default:
```
$ g2p convert "astonishing, my friend" eng eng-arpabet
AH S T AA N IH SH IH NG , M AY F R EH N D
$ g2p convert --no-tok "astonishing, my friend" eng eng-arpabet
$ g2p convert --no-tok "astonishing" eng eng-arpabet
AH S T AA N IH SH IH NG
```
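
To illustrate why the multi-word input needs tokenization, here is a toy sketch. It is not g2p's actual implementation: `TOY_LEXICON`, `lookup`, `tokenize` and `convert` are all hypothetical names, and the three entries are just the pronunciations shown in the output above. The assumption is that a dictionary-style mapping only matches whole entries, so a phrase with punctuation and spaces has to be split into words first. Returning an empty string for unmatched input is also an assumption of the sketch.

```python
import re

# Hypothetical three-entry lexicon, copied from the CLI output above.
TOY_LEXICON = {
    "astonishing": "AH S T AA N IH SH IH NG",
    "my": "M AY",
    "friend": "F R EH N D",
}


def lookup(text):
    """Whole-string lookup; returns "" when the text is not an entry."""
    return TOY_LEXICON.get(text.lower(), "")


def tokenize(text):
    """Split text into alternating runs of word and non-word characters."""
    return re.findall(r"\w+|\W+", text)


def convert(text, tok=True):
    """Convert text with the toy lexicon, tokenizing first unless tok=False."""
    if not tok:
        # Without tokenization the whole string must be a single entry.
        return lookup(text)
    pieces = []
    for token in tokenize(text):
        if re.fullmatch(r"\w+", token):
            pieces.append(lookup(token))   # word: look it up
        elif token.strip():
            pieces.append(token.strip())   # punctuation: pass through
        # pure whitespace is dropped; pieces are re-joined with spaces
    return " ".join(pieces)


print(convert("astonishing, my friend"))
print(convert("astonishing", tok=False))
```

In this sketch, `convert("astonishing, my friend", tok=False)` finds no entry for the full phrase, while the single word works either way.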
In g2p studio, it looks like we are not enabling tokenization:
but
To align with our default behaviour on the CLI, I believe it would be best if the g2p studio also tokenized by default.
And wow, AH S T AA N... sounds so B AA S T AH N... Who wrote that pronunciation dict anyway??? ;)