w3c / ruby-t2s-req

Text to Speech of Electronic Documents Containing Ruby: User Requirements
https://w3c.github.io/ruby-t2s-req/

Side effect of using ruby rather than base characters: は and へ #15

Closed by murata2makoto 2 months ago

murata2makoto commented 2 years ago

In the modern Japanese language, there is basically only one way to read each kana character. But は and へ are exceptions. は is usually read aloud as /ha/ but is read aloud as /wa/ when this character is used as a particle. Meanwhile, へ is usually read aloud as /he/ but is read aloud as /e/ when it is used as a particle. Thus, to read these characters correctly, correct morphological analysis is a must.

A side effect of using ruby annotations rather than ruby bases for TTS is that morphological analysis typically fails. In particular, the particles は and へ in ruby annotations are sometimes mistakenly pronounced as /ha/ and /he/. This is demonstrated by the 淀藩背信（はいしん） example used at the last TPAC. Such mistakes are very confusing to users.

Even when we use ruby annotations rather than base characters for TTS, it might be possible to avoid such mistakes by inserting a space character or some other silent Unicode character between the immediately preceding は or へ and the ruby text.
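
To make the separator idea concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Segment` type, the `tts_text` function, and the sample sentence are hypothetical (the sentence is not the TPAC example). Whether a given TTS engine treats the separator as a particle boundary has not been verified here.

```python
from dataclasses import dataclass
from typing import List, Optional

# Kana whose reading depends on whether they are used as particles.
AMBIGUOUS_KANA = ("は", "へ")


@dataclass
class Segment:
    """One run of text; `ruby` holds the annotation if the run has one (hypothetical type)."""
    base: str
    ruby: Optional[str] = None


def tts_text(segments: List[Segment], separator: str = " ") -> str:
    """Build the string handed to a TTS engine, preferring ruby annotations.

    A separator is inserted only where plain text ending in は or へ is
    immediately followed by an annotated run, so the kana is not glued
    to the annotation text.
    """
    parts: List[str] = []
    prev: Optional[Segment] = None
    for seg in segments:
        follows_ambiguous_kana = (
            prev is not None
            and prev.ruby is None
            and prev.base.endswith(AMBIGUOUS_KANA)
        )
        if seg.ruby is not None and follows_ambiguous_kana:
            parts.append(separator)
        parts.append(seg.ruby if seg.ruby is not None else seg.base)
        prev = seg
    return "".join(parts)


# Hypothetical sentence: 彼(かれ)は背信(はいしん)した
sentence = [
    Segment("彼", "かれ"),
    Segment("は"),
    Segment("背信", "はいしん"),
    Segment("した"),
]

print(tts_text(sentence, separator=""))  # かれははいしんした
print(tts_text(sentence))                # かれは はいしんした
```

Without the separator the engine receives an unbroken kana string in which the particle は sits directly before the は of はいしん. With the separator the two are split. This only illustrates the proposal; it does not show that any engine then reads the particle as /wa/.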

Should we describe this problem in this note?

murata2makoto commented 11 months ago

I think that this example demonstrates a real problem of using ruby annotations for TTS. It deserves to be mentioned in this note.

murata2makoto commented 11 months ago

Here are other examples of ruby annotations containing は. If these ruby annotations (rather than the ruby bases) are sent to TTS engines, は is mispronounced.

murata2makoto commented 11 months ago

Here are examples of へ occurring in ruby annotations. Sending them (rather than the ruby bases) to TTS engines is very likely to cause trouble.

macnmm commented 11 months ago

In my opinion, ruby is ill-suited for TTS and should be considered only a reading aid. It is normally added only to the first instance of a word in the text, and in many cases it is not laid out strictly 1:1 with the annotated base text. Modern LLMs will do a better job anyway.