Support for reading subtitles in videos, i.e. continuous and refreshable OCR

nvaccessAuto commented 11 years ago

Reported by nvdakor on 2012-11-15 06:59 Hi, A number of users use a screen reader which reads subtitles on videos (for example, language subtitles for foreign film videos). Currently, NVDA does not read subtitles present in some videos, so would it be possible to allow NVDA to read them? Thanks.

nvaccessAuto commented 11 years ago

Comment 1 by jteh on 2012-11-21 01:32 Please provide an example URL or application. The way subtitles are implemented differs greatly between video players. It'd also be good to know what screen readers do this.

nvaccessAuto commented 11 years ago

Comment 2 by nvdakor on 2012-11-21 11:06 Hi, I just asked some users. They told me that they're using a Korean screen reader called Sense Reader. They are also using GOM Player, as this is the only media player whose subtitles people can listen to using Sense Reader. In addition, users suggested adding subtitle reading support for VLC media player as well. Thanks.

nvaccessAuto commented 11 years ago

Comment 3 by jteh on 2013-02-10 22:48 We can't add support for this unless the player exposes them to accessibility or native APIs.

nvaccessAuto commented 11 years ago

Comment 4 by vortex on 2013-04-12 23:56 I wrote a program which can do this for external subtitle files, and it is compatible with NVDA. You can find more at: my website

surfer0627 commented 7 years ago

A company named CatchPlay uses Brightcove's Video Cloud service. It seems that NVDA can report the captions. Here is a 5-minute demo.

notes:

Captioning is more commonly used as a service to aid deaf and hearing-impaired audiences. Captions are more adaptable to live broadcasts, such as news broadcasts, sports events and television shows broadcast live. Usually, captions (also called closed captions) appear as white text within a black box, a second or two after being spoken.
Brightcove: provides a cloud video platform.
CatchPlay: provides a streaming service in Taiwan, Singapore, and Indonesia.

LeonarddeR commented 7 years ago

cc @JosephSL

ehollig commented 7 years ago

CC @josephsl for further clarification

mirovg commented 5 years ago

GOM Player: F5 - Others - Accessibility - Turn on "Outputing subtitles as window titles for the voice output program". Works with NVDA. Sometimes it doesn't work at first, so I try it 2 or 3 more times, and finally it works correctly.

fernando-jose-silva commented 5 years ago

I've heard VoiceOver demos on iPhone reading subtitles on Netflix. I do not know how NVDA behaves on Netflix; unfortunately, I am not a user of this movie platform. But if VoiceOver can do it, and Netflix provides the same features in its web interface or in an application for Windows 10, NVDA could also offer this functionality.

OzancanKaratas commented 4 years ago

How can we support the Netflix website for this issue?

Adriani90 commented 4 years ago

@florianionascu7 a Romanian guy developed an addon for reading subtitles. Do you know his profile on GitHub? Maybe he can contribute here by raising a pull request.

florianionascu7 commented 4 years ago

Yes, I know his Github profile. It's: @vortex1024

vortex1024 commented 4 years ago

Hello, I am not sure this would be accepted, since it doesn't fit into basic screen reader functionality. Also, the results vary greatly depending on the quality of the video and computer performance. Any comments from NVDA developers about this?

Adriani90 commented 4 years ago

In my view such a feature, if it is optional, fits very well into screen reading basics. In fact it would be something like reading live regions. Obviously the screen reader should read the text that appears on the screen and is visible to a sighted person. So why not read subtitles? cc: @feerrenrut, @michaelDCurran would you accept a corresponding pull request for such a feature?

@vortex1024 could you please post a link to the github repository of the addon?

feerrenrut commented 4 years ago

I think it depends on the implementation, and whether it will make maintenance harder. @vortex1024 maybe you can give us an overview of the approach?

vortex1024 commented 4 years ago

Sure. Basically, what I'm doing is running the Windows 10 OCR in a while loop, in a separate thread, with a configured sleeping interval. The area to scan can be the full screen, the focus, the foreground or the navigator object. In full screen, I added options to crop zones from the screen by percentage, both to make recognition faster and easier, and to exclude certain bits of the screen from recognition, such as the TV logo or the current time of the video. I've recently been asked to make the cropping options available for the foreground object too, as people seem to use it for games as well.
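The percentage cropping described above can be illustrated with a small helper. This is only a sketch under assumptions: the `Rect` type and the `crop_by_percent` function are hypothetical names introduced here, not taken from the addon, whose code has not been published in this thread.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Hypothetical screen rectangle in pixels."""
    left: int
    top: int
    width: int
    height: int


def crop_by_percent(screen: Rect, left: float = 0.0, top: float = 0.0,
                    right: float = 0.0, bottom: float = 0.0) -> Rect:
    """Shrink `screen` by the given percentages (0-100) taken off each edge."""
    if min(left, top, right, bottom) < 0 or left + right >= 100 or top + bottom >= 100:
        raise ValueError("crop percentages must be non-negative and leave some area")
    dx_left = round(screen.width * left / 100)
    dx_right = round(screen.width * right / 100)
    dy_top = round(screen.height * top / 100)
    dy_bottom = round(screen.height * bottom / 100)
    return Rect(
        screen.left + dx_left,
        screen.top + dy_top,
        screen.width - dx_left - dx_right,
        screen.height - dy_top - dy_bottom,
    )


# Keep only the bottom 30% of a 1920x1080 screen, where subtitles usually sit.
subtitle_area = crop_by_percent(Rect(0, 0, 1920, 1080), top=70)
```

A smaller rectangle means fewer pixels for the OCR engine to process on every pass, which is presumably where the speed gain mentioned above comes from.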

In order not to read the same text again and again, I use the difflib.SequenceMatcher class from the Python standard library to determine whether the text has changed. I haven't published the code, as I don't think it is polished enough yet, but if there's interest in a pull request, I can try to reorganise it to the best of my abilities, then improve it based on comments. Thanks
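As a rough illustration of the approach described above (a polling loop in a separate thread plus a similarity check), here is a minimal sketch. It is not the addon's code. The `recognize` and `speak` callables, the class name, the 0.5 second interval and the 0.9 similarity threshold are all assumptions made for the example; only `difflib.SequenceMatcher` and `threading` are real standard library APIs.

```python
import difflib
import threading
from typing import Callable, Optional


class ContinuousReader:
    """Poll a recognizer and speak its text only when it has changed enough."""

    def __init__(self, recognize: Callable[[], str], speak: Callable[[str], None],
                 interval: float = 0.5, threshold: float = 0.9):
        self._recognize = recognize  # hypothetical: returns OCR text for the scanned area
        self._speak = speak          # hypothetical: sends text to the speech output
        self._interval = interval    # seconds to sleep between passes
        self._threshold = threshold  # similarity at or above this counts as "unchanged"
        self._last = ""
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def poll_once(self) -> bool:
        """Run one recognition pass; return True if new text was spoken."""
        text = " ".join(self._recognize().split())  # normalise whitespace
        if not text:
            return False  # nothing on screen; keep the last spoken text
        ratio = difflib.SequenceMatcher(None, self._last, text).ratio()
        if ratio >= self._threshold:
            return False  # same subtitle, possibly with small OCR differences
        self._last = text
        self._speak(text)
        return True

    def _run(self) -> None:
        while not self._stop.is_set():
            self.poll_once()
            self._stop.wait(self._interval)  # sleep, but wake early on stop()

    def start(self) -> None:
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

Comparing with a ratio instead of strict equality means that a frame where OCR misreads one character is not announced again. The cost is that two genuinely different but very similar subtitles could be treated as the same, so the threshold would need tuning.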

OzancanKaratas commented 4 years ago

I think we should write a handler for subtitle files. It would look up the current subtitle using the video's playback time. I think that the OCR operation will damage the content of the subtitles.

We could also consider writing a kernel-mode driver, such as a video intercept driver.
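To make the subtitle file idea concrete, here is a minimal sketch that parses SubRip (.srt) text and looks up the cue for a given playback time. It assumes an external .srt file is available and that the player's current position in seconds can be obtained somehow, which is not addressed here. The names `Cue`, `parse_srt` and `cue_at` are hypothetical, and the lookup assumes cues do not overlap.

```python
import bisect
import re
from typing import List, NamedTuple, Optional


class Cue(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


# Matches a SubRip timing line such as "00:00:01,000 --> 00:00:03,500".
_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(data: str) -> List[Cue]:
    """Parse SubRip text into cues sorted by start time."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", data.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = _TIMING.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is the subtitle text.
                text = " ".join(part.strip() for part in lines[i + 1:] if part.strip())
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return sorted(cues)


def cue_at(cues: List[Cue], position: float) -> Optional[Cue]:
    """Return the cue covering `position` seconds, or None between cues."""
    starts = [cue.start for cue in cues]
    i = bisect.bisect_right(starts, position) - 1
    if i >= 0 and position < cues[i].end:
        return cues[i]
    return None


SAMPLE = """1
00:00:01,000 --> 00:00:03,500
Hello.

2
00:00:04,000 --> 00:00:06,000
How are
you?
"""

cues = parse_srt(SAMPLE)
current = cue_at(cues, 5.0)  # the cue shown five seconds into the video
```

This gives exact text with no recognition errors, but it only helps when a separate subtitle file exists. Subtitles burned into the video image would still need OCR.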

Adriani90 commented 4 years ago

There is a discussion on the lion addon also here: https://forum.audiogames.net/topic/33489/lion-nvda-universal-subtitle-reader-and-more/

@vortex1024 gets a lot of positive user feedback for this addon, and several use cases have come up:

In movies with subtitles on YouTube, Netflix, VLC media player, etc. This is also very useful for deafblind people and for people watching movies and documentaries in multiple languages
In mainstream games (e.g. people manage to play FIFA and lots of other games with this addon)
In online conferences where people share live presentations, and possibly other areas.

Current limitations:

Creating profiles for different websites, games, apps etc is not possible. This problem would be solved if this addon was part of NVDA
Text recognition is not 100% accurate due to OCR limitations. But this could be solved if GDI calls could be applied (i.e. also using display model). For this the help of experienced NVDA developers is needed
Selected text cannot be recognized by the addon, so the speech is not reacting as fast as the cursor moves and stops speaking when you move between text pieces. This is because in most cases in games the text is drawn on the screen and highlighted with a certain color if it is selected, probably related to display model. I think @leonardder has proposed some improvements to display model code in NVDA. The improvements could be applied here as well I think
When subtitles appear under each other, the addon stops working. It seems the addon waits until one piece of text disappears and another one appears. This could also be solved by applying GDI calls or by improving the OCR calls in the code.
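For the last limitation, one possible direction is to compare recognised text line by line, so that a new line appearing under an existing one is still announced. The sketch below is only an assumption about how that could work; it is not based on the addon's code, and the function name `new_lines` is hypothetical.

```python
from typing import List


def new_lines(previous: str, current: str) -> List[str]:
    """Return the lines of `current` that were not already in `previous`, in order."""
    # Compare on a normalised form so spacing and letter case do not matter.
    seen = {" ".join(line.split()).casefold() for line in previous.splitlines()}
    fresh = []
    for line in current.splitlines():
        key = " ".join(line.split()).casefold()
        if key and key not in seen:
            fresh.append(line.strip())
            seen.add(key)
    return fresh


# The first subtitle stays on screen while a second one appears below it.
to_speak = new_lines("Where are you going?", "Where are you going?\nTo the station.")
```

This uses exact matching after normalisation, so a single misread character would make an old line look new. In practice it would probably need to be combined with a fuzzy comparison.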

Maybe @jcsteh and @michaelDCurran are also interested in contributing to this. I think this feature would open a large new area for blind and visually impaired users.

Adriani90 commented 4 years ago

The first beta version of this addon is here: http://vortex.go.ro/api/download/get/1

@vortex1024 if you could upload it to your GitHub profile along with your source code files, we could get a better impression of how the addon works. Thanks for your great work so far.

Qchristensen commented 2 years ago

Just a quick note that I had a call from a user today requesting this functionality - they cited the two examples of video games with text, where stopping to run OCR and reading the results is impractical, and also videos with subtitles embedded into the video (and thus not readable regularly).

cary-rowen commented 2 years ago

Hi,

Can NV Access take a look at this OCR project? It has been used in some screen readers in China (including mobile screen readers), and according to my tests, it works well.

https://github.com/PaddlePaddle/PaddleOCR

jcsteh commented 2 years ago

I'd suggest filing a separate enhancement request issue for that. Continuous OCR is independent of the OCR engine.

cary-rowen commented 2 years ago

OK, I will do it.

Adriani90 commented 1 year ago

The new whisper.cpp, a C/C++ port of OpenAI's Whisper model, is quite successful at creating subtitles in many languages from video and audio. It might be really interesting to look into its potential for NVDA, which could become more independent of external sources such as OCR or automatically generated subtitles from YouTube. Here is a GUI for using whisper.cpp from Python: https://www.reddit.com/r/Python/comments/12kyfl4/i_made_a_simple_gui_to_use_whispercpp_in_python/

cc: @jcsteh, @leonardder

Adriani90 commented 1 year ago

The GUI seems to work offline without any API, so I wonder if this would even work without an internet connection?

nvaccess / nvda

Support for reading subtitles in videos, i.e. continuous and refreshable OCR #2797