Implemented speechSynthesis auto-stop on YouTube player pause (FIXED)

RonnieBlade commented 2 years ago

Hi there! I think it would be a nice feature to make speechSynthesis stop speaking immediately when the video gets paused. I've implemented that feature using MutationObserver, waiting for the "paused-mode" class to appear in the player element.

Also, I've added "wasPaused" flag to repeat the last captions after playback resumes if there are no new captions at current time.

I hope my humble contribution will help the project!

P.S. Sorry for reformating your code in commits on the last pull request, this time I made the changes super carefully.

limura commented 2 years ago

pull requestありがとうございます。動画が停止した時にCaptionSpeaker側の発話も停止するというアイデアは良いと思います。

ただ、少し動作を確認してみましたところ、動作上気になる点がいくつかありました。

具体的には、

CaptionSpeakerが発話中にYoutubeの動画を停止し、その後停止したまま再生位置を変更し、その後再生を再開させた時、以前発話していた部分の発話を再開してしまう。(再生開始位置が変わっているのでその部分での字幕を読み上げるべきなのに、以前の字幕を読み上げてしまう)
Youtube以外の id="movie_player" のタグが存在するWebページで不必要なMutationObserverが生成されてしまう可能性がある。
CaptionSpeaker側の発話を speechSynthesis.cancel() で止めるのではなく、speechSynthesis.pause() で一時停止して後から再開(speechSynthesis.resume())できるとさらに良さそうです。

二番目の問題については const screen = document.getElementById("movie_player"); を行っている部分をURL等を確認してから行うように変更することで回避できると思います。三番目については実現できると嬉しそうですが、そこまで頑張らなくても良いと考えています。

一番目の問題はできれば直しておきたいです。私はこの問題の解消には実装の方法を少し考えたほうが良さそうに考えています。

アイデアとしては、wasPaused で停止されているかどうかの管理をするのではなく、prevSpeakTime を初期化して発話が行われていなかった事にする形で実現することができそうな気がしています。具体的には、お寄せいただいた pull request の552行目

https://github.com/limura/CaptionSpeaker/pull/8/commits/2f2e849657a5430f4720073e444ceacea39a345a#diff-22f1e79483087db4b872e00c5e454ed5d5dc28a26b761d05239c0a981951acadR552

の wasPaused = true; の部分を prevSpeakTime = ""; に変更するような形です(恐らくこの修正だけでは正常には動作しないと思います)。

以上のような点を考えてみましたが、改善はできるでしょうか。難しそうであればこちらで作業しても良いです。

RonnieBlade commented 2 years ago

Thank you for the prompt review! Working on that

limura commented 2 years ago

素晴らしい！期待しています。(´ω｀)

RonnieBlade commented 2 years ago

Hi Limura! I've changed the implementation and fixed the bugs. Regarding to your notes:

If pause the video, change the playback position while it is stopped, and then resume playback, it works correctly now.
I tested this new feature not only on YouTube but with embedded videos too. Anyway it's crucial to make the css selector more precise. Which I made in the next commit ("Precise screen css selector").
Now I pause CaptionSpeaker's speech with speechSynthesis.pause() as you offered, it's indeed far better implementation. Also I resumed with speechSynthesis.resume() if playback position was not changed while it was paused.

RonnieBlade commented 2 years ago

I detected that if you pause the video, then play a video in another tab, it continues speaking the remaining part of a speech from the previous video. I have fixed it in the last commit. It seems that everything works as expected now.

limura commented 2 years ago

変更ありがとうございます。 .pause() を実装したのですね。素晴らしい仕事だと思います。

ただ、全体的に変更箇所が多くなっているために問題が出てきてしまっているようです。以下で詳しく述べますが、私が同様の機能を実装し直したほうが早そうな気がしています。

まず、実装上の事について私が気になった点は以下のような所です。

.resume() した後に .speak() している箇所があります。場合によっては発話が重複するかもしれません。安全のため、.resume() した後は return; した方が良さそうです。また、直前の .cancel() が誤動作を誘発しないかも心配なので、SpeechSynthesis の仕様を確認しておいたほうが良さそうな気がしています。(具体的には https://github.com/limura/CaptionSpeaker/pull/8/commits/1b869099a1cc8e707ec3e37851920fcdc5a9956e#diff-22f1e79483087db4b872e00c5e454ed5d5dc28a26b761d05239c0a981951acadR332 の辺りの箇所です)

次に、以下に述べる項目は私が pull request を受け付ける前に提示しておくべき事であったので申し訳ないのですが、プログラムコードの書き方について、注意して頂きたい事項になります。

新しい global変数が多く定義されています。これらは基本的には推奨されません。なくせる物はなくして下さい。正確な事は言えませんが、例えば paused, videoStreamElement, screen, videoUrl 辺りは無くせるような気がしています。
関数を定義する場合、関数の内部で外部の変数を参照しないように実装するよう努力してください。例えば getVideoUrl() は videoStreamElement という外部変数に依存しています。この場合は引数に videoStreamElement に当たるものを定義する事で回避すると良いと考えています。
行末のセミコロンが無い行が多いです。これは推奨されません。
ifステートメントのブラケット({})が無い物があります。これは1行で ifステートメントを記述する場合のみ利用してください。(具体的には https://github.com/limura/CaptionSpeaker/pull/8/commits/1b869099a1cc8e707ec3e37851920fcdc5a9956e#diff-22f1e79483087db4b872e00c5e454ed5d5dc28a26b761d05239c0a981951acadR591-R594 の辺りに問題があります)
globalに定義される関数名の最初の文字が小文字になっているものがあります。これらは大文字で開始して関数名であることを認識したいです。
変数名の命名について。1文字や数文字の意味をなさない文字列は避けて下さい。例えば cls ではなく className などとします。

実装上の問題については対応は簡単かと思いますが、プログラムコードの書き方の問題についてはかなりの修正が必要になりそうです。これらは私が行う場合、この pull request を参考に初めから書く事になりそうです。それでも良いのであればこちらでこの作業を私が引き受けます。貴方はどうしたいですか？

RonnieBlade commented 2 years ago

Hi! Thank you very much for these reasonable recommendations! To me it would be better if I finally manage these problems in this pull request. If you don't mind, I'll try to do that

limura commented 2 years ago

お返事ありがとうございます。こちらでプログラムコードの書き方についてのガイドラインを用意できていればよかったと思っています。ガイドラインを用意できていないため、場合によっては別の指摘をすることになるかもしれませんが、それでもよろしければ引き続き作業をお願いしたいです。お手数をおかけしてしまいますが、よろしくお願いいたします。

あと1点、使っていて気づいた点がありました。以下に示します。

別のタブ等でページ遷移を起こすと SpeechSynthesis での発話が停止するようです。これは別の Extension等で用いている SpeechSynthesis にも影響を与えています(実際の所、別の SpeechSynthesis を使う Extension を利用している時にこの問題に気づきました)。詳しく調べていないので正確な事は言えませんが、恐らくは https://github.com/limura/CaptionSpeaker/pull/8/commits/1b869099a1cc8e707ec3e37851920fcdc5a9956e#diff-22f1e79483087db4b872e00c5e454ed5d5dc28a26b761d05239c0a981951acadR560 でページ表示時に無条件で .cancel() を行っている事が原因のような気がしています。

limura commented 2 years ago

@RonnieBlade 改良は進んでいますでしょうか？少し時間がかかっているようなので心配しています。

もし、実装するのが難しいのであれば、こちらで作業致しますのでお知らせ下さい。

RonnieBlade commented 2 years ago

Hi Limura! Sorry it's been a while, I needed some time to investigate the issue. I've made some changes. I'll commit today

RonnieBlade commented 2 years ago

Here is what I’ve changed in the new commit:

MutationObserver is now observes the subtree filtering class and src attributes, so some global variables and functions are not needed anymore.
Global variables videoUrl, screen, videoStreamElement are removed.
Global functions getVideoElement(), getVideoUrl(), getCurrentTime() and ifScreenClassesContain() are removed. A new function InitializeScreenObserver() is added.
Checking if YouTube player is presenting on the page before performing speechSynthesis.cancel() on page load.
I’ve checked the speechSynthesis documentation as well as developers forums, speechSynthesis.speak() after speechSynthesis.resume() should not be a problem. Utterances should not overlap.
Code is formatted the way you had recommended: semicolon are added at the end of the lines, all if statements have curly braces {}, Global function names start with capital letters.

Hope, everything is fine now.

limura commented 2 years ago

ありがとうございます。素晴らしい仕事です。

いくつか気になった点はあったのですが、概ね問題ないと思いますので取り込みました。

重ねて、ありがとうございます。大変助かりました。

なお、気になった点は以下の点でした。これらはこちらで検証して修正が必要そうであれば修正することにしようと思っています。

undefined の場合がある変数をそのまま使っている箇所がある。例: https://github.com/limura/CaptionSpeaker/pull/8/commits/67c3fb28ad4242ea5e83820d7f581e02978ef78a#diff-22f1e79483087db4b872e00c5e454ed5d5dc28a26b761d05239c0a981951acadR331
paused は speechSynthesis.paused を参照する事では代用できないのかを確認しておきたい。
speechSynthesis.cancel() が多用されている。speechSynthesis.cancel() を発行してしまうと他のタブで動作している他のExtension等での SpeechSynthesis.speak() が停止してしまうため、できるだけ speechSynthesis.cancel() は発行させないようにしたい。具体的には speechSynthesis.speaking でなければ .cancel() は呼ばないような物で wrap することで対処できないかを検討する。

RonnieBlade commented 2 years ago

Hi! Thank you very much! It's very interesting and inspiring to work on your project.

You're absolutely right, I forgot to check if screen isn't null.
Sounds like a good idea, we can try it out.
Yes, you can do it checking if speechSynthesis.speaking is true . I've just tested it, this approach works fine, but it's not suitable for all situations, for example when pausing a video, rewind or fast-forward and hit play, we need to use cancel to prevent speechSynthesis from speaking an utterance from a different time of the video.

limura / CaptionSpeaker

Implemented speechSynthesis auto-stop on YouTube player pause (FIXED) #8