CaptionSpeaker is a Chrome extension that speaks the captions displayed on YouTube aloud so that you can listen to them.
I wish there was an option to skip reading punctuation marks. #11

Closed 1353604736 closed 1 year ago

1353604736 commented 1 year ago

I hope there can be an option to skip punctuation marks when reading aloud, so that the reading speed can keep up with the subtitle display speed as much as possible when the video is played at 2x speed.


limura commented 1 year ago

Well, that is an interesting opinion.

As a test, I added the following to the first line of AddSpeechQueue() and tried it in a Japanese environment.

text = text.replace(/[、。]/g, " ");

I don't think this test gave me the results I was hoping for.

The results may vary depending on the language and the text-to-speech engine used, so I cannot conclude that the change is pointless, but as far as I was able to test locally it did not make much difference.

Also, in Japanese the punctuation marks are "、。", but in English they are ",.". I wonder if ";" and ":" are also relevant in English. I am not familiar with English at all, so I don't know. So it seems to me that we need to think of a way to let the user specify what counts as punctuation.

In that case, we should also consider how to present the setting to the user. The example above uses /[、。]/g to eliminate "、" and "。", but showing this regular expression to the user as it is would not be a good idea. I think an ON/OFF toggle button with a sentence like "Ignore punctuation marks when reading the text aloud" would be more understandable. The sticking point is the content of the regular expression, specifically the "、。" part. If the user has to specify it, what kind of explanatory text would make the operation easy to understand? If possible, I would like to avoid wording that could be misunderstood.
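One way to keep the regular expression out of the user's sight can be sketched as follows. This is only a sketch under assumptions: the settings are assumed to hold an ON/OFF flag plus a plain string of characters to skip, and the code builds the character class itself. The names `buildPunctuationFilter`, `skipPunctuation`, and `punctuationChars` are hypothetical and are not taken from CaptionSpeaker.

```javascript
// Hypothetical settings shape: { skipPunctuation: boolean, punctuationChars: string }.
// Returns a function that maps caption text to the text handed to the TTS engine.
function buildPunctuationFilter(settings) {
  const chars = Array.from(settings.punctuationChars || "");
  if (!settings.skipPunctuation || chars.length === 0) {
    return (text) => text; // toggle OFF or nothing to skip: leave the text untouched
  }
  // Escape the characters that are special inside a character class: \ ] ^ -
  const escaped = chars.map((c) => c.replace(/[\\\]^-]/g, "\\$&")).join("");
  const pattern = new RegExp("[" + escaped + "]", "gu");
  return (text) => text.replace(pattern, " ");
}

// Example: the user typed the characters to skip into a plain text field.
const filter = buildPunctuationFilter({
  skipPunctuation: true,
  punctuationChars: "、。,.;:",
});
console.log(filter("Hello, world. こんにちは、世界。"));
```

With this shape the user only ever sees a toggle and a list of characters, and the escaping step keeps characters such as "-" or "]" from breaking the generated pattern.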

In summary, I have the following concerns. What do you think about them?

  1. Removing punctuation does not shorten the speech time much.
  2. Punctuation has to be specified for various languages, so what kind of user interface should the option for specifying it have?


1353604736 commented 1 year ago

I don’t understand the logic of the code very well, but I wonder if you can add a switch option to control whether to ignore the punctuation pauses when reading the subtitles aloud.

My guess is that the code uses the FetchCaptionData() function to store the fetched subtitle data as JSON.

You could use a regular expression to replace the punctuation marks in the original subtitle content and overwrite the original JSON, so that the TTS reads the replaced subtitle content. That way it would work for the punctuation pause problem across languages:

text = text.replace(/\p{Punctuation}/gu, " ");
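As a rough, stand-alone illustration of that replacement (not CaptionSpeaker's actual code), the sketch below wraps it in a hypothetical helper named `stripUnicodePunctuation` and additionally collapses the whitespace left behind. The whitespace cleanup is an addition for readability, not something proposed in this thread.

```javascript
// Hypothetical helper, independent of CaptionSpeaker's real code.
// \p{Punctuation} (short form \p{P}) only works with the "u" flag.
function stripUnicodePunctuation(text) {
  return text
    .replace(/\p{Punctuation}/gu, " ") // every punctuation character becomes a space
    .replace(/\s+/g, " ")              // collapse the runs of whitespace that result
    .trim();
}

console.log(stripUnicodePunctuation("你好,世界!今天天气很好。"));
// prints "你好 世界 今天天气很好"
console.log(stripUnicodePunctuation("Hello, world; this is a test."));
// prints "Hello world this is a test"
```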

The TTS for some languages, such as Chinese, does pause when reading subtitles with many punctuation marks. For such subtitles, this option would still speed up the reading.

limura commented 1 year ago

I see what you mean about \p{Punctuation} possibly being useful. However, \p{Punctuation} covers a fairly wide range of characters, so uniformly rewriting all of them as whitespace would be quite drastic. I am concerned simply because I don't know \p{Punctuation} well: I cannot tell what happens when a character that falls under \p{Punctuation} is blanked out in a language I don't know. On the other hand, using \p{Punctuation} would make it possible to define the option as a "switch option", i.e., an ON/OFF switch. In that sense, it is promising. So I would like some evidence that replacing \p{Punctuation} with blanks is OK. Can you provide any?
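To make the concern about the breadth of \p{Punctuation} concrete, here is a small sketch showing a few English-language strings where blanking every punctuation character changes the tokens the TTS engine would receive. The helper name `blankPunctuation` is hypothetical, and how a given engine actually pronounces the results is not something this sketch can show.

```javascript
// Hypothetical helper: the replacement from this thread, with no cleanup applied.
function blankPunctuation(text) {
  return text.replace(/\p{Punctuation}/gu, " ");
}

// Apostrophes, decimal points, hyphens and "@" all count as punctuation,
// while "$" and "+" are Unicode symbols and are therefore left alone.
const samples = ["don't", "3.14", "e-mail", "user@example.com", "$5+tax"];
for (const sample of samples) {
  console.log(JSON.stringify(sample), "->", JSON.stringify(blankPunctuation(sample)));
}
// "don't" -> "don t"
// "3.14" -> "3 14"
// "e-mail" -> "e mail"
// "user@example.com" -> "user example com"
// "$5+tax" -> "$5+tax"
```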

Also, I could not tell how much faster it would be in Chinese. Can you show this with numbers? As I mentioned before, I don't think implementing this feature would have much effect. Unless I am convinced on this point, I am not going to act on it.
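One possible way to get such numbers is to time the same sentence with and without punctuation through the browser's Web Speech API. The sketch below assumes it is run in a page where `speechSynthesis` is available and a voice for the chosen language is installed; `measureSpeechMs` and `compareSpeechMs` are hypothetical names, and the defaults `zh-CN` and rate 2 are only illustrative. Results will depend on the engine and voice, so this is a measurement aid, not evidence in itself.

```javascript
// Measures how long one utterance takes to speak, in milliseconds.
// `synth` and `Utterance` default to the browser's Web Speech API objects;
// they are parameters only so the function can be exercised outside a browser.
function measureSpeechMs(text, {
  lang = "zh-CN",
  rate = 2,
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance,
} = {}) {
  return new Promise((resolve, reject) => {
    const utterance = new Utterance(text);
    utterance.lang = lang;
    utterance.rate = rate;
    let startedAt = 0;
    utterance.onstart = () => { startedAt = performance.now(); };
    utterance.onend = () => resolve(performance.now() - startedAt);
    utterance.onerror = (event) => reject(new Error(String(event.error)));
    synth.speak(utterance);
  });
}

// Speaks the text twice, once as-is and once with punctuation blanked,
// and returns both durations so they can be compared.
async function compareSpeechMs(text, options) {
  const withPunctuation = await measureSpeechMs(text, options);
  const stripped = text.replace(/\p{Punctuation}/gu, " ");
  const withoutPunctuation = await measureSpeechMs(stripped, options);
  return { withPunctuation, withoutPunctuation };
}

// In a browser console:
//   compareSpeechMs("你好,世界!今天天气很好。").then(console.log);
```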

Separately, and unrelated to the main topic, about the subtitles being JSON: the subtitle data CaptionSpeaker receives is JSON, so we take that, convert it to an internal format, and use that. The internal format is a JavaScript dictionary or array; you might call it JSON, but it is more accurate to think of it as plain JavaScript data structures.

limura commented 1 year ago

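Incidentally, rather than shortening the reading time by dropping punctuation, I have a feeling it would be more effective to reduce the number of spoken words themselves, for example by summarizing the meaning. In that case it would seem sensible to have something like a large language model do the summarizing, but having a large language model summarize every time a YouTube video is opened would take time, and I don't see how issues such as billing could be solved. In that sense, it looks like this idea is not very promising.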

limura commented 1 year ago
