kibaffo33 / aws_transcribe_to_docx

Produce Word Document, CSV or SQLite transcriptions using the automatic speech recognition from AWS Transcribe.
MIT License

Converting to vtt? #10

Closed: jsauvageau closed this issue 4 years ago

jsauvageau commented 4 years ago

Hi, of the various solutions I've explored to convert AWS JSON files to more "human readable" results, this is the best by far, so thanks a lot! However, I also need a way to convert either the JSON file itself or the resulting converted files (e.g. CSV) into a VTT file. I found other solutions on GitHub, but most are Java or C#, and I'm not a programmer, so the learning curve seems quite steep to get the results I'm expecting. I found another, older (2016?) Python project that claims to do just that (richwine/watson-caption-converter), but although I'm replacing the name of the "watson.json" file in the .py, I'm getting an error about the JSON file not being in the right format... I was wondering whether adding a convert-to-VTT option is something you'd consider, as the rest of your tool is so convenient and easy to use for a newbie like me?

Thx for considering and keep up the good work.

kibaffo33 commented 4 years ago

Hi there. Thanks for the kind words.

I can look into this. Do you have a sample file I could examine?

Thanks

jsauvageau commented 4 years ago

Hi,

I have attached this .vtt file (don't be misled by the .txt extension: I had to rename it to be able to upload it, as .vtt is not a supported file type on GH). asset-3433-en-US.txt

You will see that it is pretty much a plain text file, but its formatting is what makes it possible to display voice-to-text on screen (i.e. closed captioning in YouTube or other players) while a video is playing.

The format is also known as WebVTT; the specs can be found here: https://en.wikipedia.org/wiki/WebVTT.
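To make the file layout concrete, here is a minimal Python sketch of how a WebVTT file is put together: a `WEBVTT` header line, then cue blocks separated by blank lines, each with a `start --> end` timing line followed by the caption text. The function names `format_timestamp` and `build_vtt` are made up for illustration and are not part of tscribe.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def build_vtt(cues) -> str:
    """Build WebVTT text from (start_seconds, end_seconds, text) tuples."""
    blocks = ["WEBVTT"]  # the file must begin with this header
    for start, end, text in cues:
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{timing}\n{text}")
    # Blocks are separated from each other by one blank line.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(build_vtt([(11.0, 15.0, "hotel people on a regular basis.")]))
```

A player reads each cue's timing line to decide when the text below it is on screen, which is why the length of a cue matters for readability.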

Let me know if you need anything else.

Thanks and have a nice day!

DJJacobs commented 4 years ago

Agreed VTT support would be very helpful.

kibaffo33 commented 4 years ago

Support for VTTs is now added in 1.3.0 (b7d551eaafd967e8ba3a0c21ba6ba0010ea47914). Does this work for you?

If you are using this package as part of a monetized product, please consider supporting the package by becoming a patron. Thank you.

jsauvageau commented 4 years ago

Hi Bob, it works great, as easy as the other formats! The only issue I see, something that didn't catch my attention previously, is that several of the phrases in the output are very long, so long that they would occupy a lot of the screen real estate when viewing a video, and for several seconds. For example, I have a 37-second-long segment which contains 612 characters. I thought it was in the Wikipedia specs that I sent you, but I can't find it there: I know that I read somewhere that the recommended maximum length of phrases in VTT is 80 characters, with 70-ish likely the best. Is this (the length of phrases) something you have control over? Thx!

kibaffo33 commented 4 years ago

Ah, I see, yes. In 1.3.1, I’ve set a threshold of 80 characters to start utilising ‘lines’. This should be the fix... I hope?
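As a rough illustration of what an 80-character line threshold does, the sketch below wraps one segment's text into lines with Python's standard `textwrap` module. This is an assumption about the general approach, not tscribe's actual code, and `wrap_caption` is a hypothetical helper name.

```python
import textwrap
from typing import List


def wrap_caption(text: str, width: int = 80) -> List[str]:
    """Break caption text into lines no longer than `width` characters."""
    # textwrap.wrap breaks on whitespace and never splits inside a word
    # unless a single word is longer than `width`.
    return textwrap.wrap(text, width=width)


if __name__ == "__main__":
    segment = (
        "There is still a land of opportunity in America. It's called Texas. "
        "The problem is these departures from the Constitution."
    )
    for line in wrap_caption(segment):
        print(line)
```

Wrapping only inserts line breaks inside a single cue. The whole cue still shares one start and end time, so a long segment stays on screen in full for its entire duration.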

DJJacobs commented 4 years ago

Hi again. Thanks so much for implementing VTT support. I believe I'm using 1.3.1 now (pip freeze shows 'tscribe==1.3.1') but still seeing very long lines in captions. Attaching an example here: transcript11111w21w1.vtt.txt

Appreciate any help you can provide. I work at a university and we're experimenting with AWS Transcribe for accessibility purposes.

kibaffo33 commented 4 years ago

Hi. My pleasure.

Is your output the same when using docx or csv format?

I have given the intro to the podcast a listen.

It looks to me like an accurate conversion from json to vtt... I think the issue might be with the quality of output from AWS Transcribe. This package merely converts the json...

DJJacobs commented 4 years ago

Hi again - maybe I misunderstood what the 'lines' function does. I thought it would break up a section of text into multiple captions if it extended beyond the 80 character limit. For instance, in the example file I provided, there's a line like this:

00:00:11.000 --> 00:00:31.000 hotel people on a regular basis. There is still a land of opportunity in America. It's called Texas. The problem is these departures from the Constitution. They have become the norm. At what point must a female senator raised her hand or her voice to be recognised over the male colleagues in the

Ideally, it would break this caption up into multiple captions (while maintaining accurate timestamps), something like this:

00:00:11.000 --> 00:00:15.000 hotel people on a regular basis. There is still a land of opportunity in

00:00:16.100 --> 00:00:20.000 America. It's called Texas. The problem is these departures from the

00:00:20.100 --> 00:00:25.000 Constitution. They have become the norm. At what point must a female senator

00:00:26.100 --> 00:00:31.000 raised her hand or her voice to be recognised over the male colleagues in the

(I'm just guessing on the time stamps by the way to provide an example)
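The kind of splitting described in this comment could be sketched as follows, assuming only the segment's start time, end time and text are known. Each piece gets a share of the segment's duration proportional to its character count, so the timestamps are estimates rather than measured word timings. `split_cue` is a hypothetical helper, not something tscribe provides.

```python
import textwrap


def split_cue(start: float, end: float, text: str, width: int = 80):
    """Split one long cue into several (start, end, text) cues.

    Timing is interpolated by character count, which is only an
    approximation of when each piece is actually spoken.
    """
    chunks = textwrap.wrap(text, width=width)
    total_chars = sum(len(chunk) for chunk in chunks)
    cues = []
    cursor = start
    consumed = 0
    for chunk in chunks:
        consumed += len(chunk)
        # End time grows with the fraction of characters emitted so far.
        chunk_end = start + (end - start) * consumed / total_chars
        cues.append((cursor, chunk_end, chunk))
        cursor = chunk_end
    return cues


if __name__ == "__main__":
    long_text = (
        "hotel people on a regular basis. There is still a land of "
        "opportunity in America. It's called Texas. The problem is these "
        "departures from the Constitution. They have become the norm."
    )
    for cue_start, cue_end, cue_text in split_cue(11.0, 31.0, long_text):
        print(f"{cue_start:6.2f} -> {cue_end:6.2f}  {cue_text}")
```

Because the pieces are contiguous, the first cue starts at the original start time and the last one ends at the original end time. Pauses in speech are not accounted for.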

kibaffo33 commented 4 years ago

Hi @djjacobs1979

I understand your comments.

The AWS Transcribe service dictates the size of these segments (or chunks of text), which are then written as a bunch of lines by tscribe.
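For context, the AWS Transcribe JSON also carries word-level entries under `results.items`, each with `start_time`, `end_time` and `alternatives[0].content` (punctuation entries have no times). A fork could group those words into shorter cues instead of relying on the service's segments. The sketch below is one hedged way to do that; `cues_from_items` is a hypothetical name and this is not how tscribe works.

```python
def cues_from_items(items, max_chars: int = 80):
    """Group word-level items into (start, end, text) cues.

    `items` is assumed to be the list found at data["results"]["items"].
    Cue text is kept at or under max_chars, except that trailing
    punctuation may push it over by a character.
    """
    cues = []
    words, start, end = [], None, None
    for item in items:
        content = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Punctuation has no timing; attach it to the previous word.
            if words:
                words[-1] += content
            continue
        current_len = len(" ".join(words))
        if words and current_len + 1 + len(content) > max_chars:
            cues.append((start, end, " ".join(words)))
            words, start = [], None
        if start is None:
            start = float(item["start_time"])
        end = float(item["end_time"])
        words.append(content)
    if words:
        cues.append((start, end, " ".join(words)))
    return cues


if __name__ == "__main__":
    sample = [
        {"type": "pronunciation", "start_time": "0.0", "end_time": "0.5",
         "alternatives": [{"content": "Hello"}]},
        {"type": "punctuation", "alternatives": [{"content": ","}]},
        {"type": "pronunciation", "start_time": "0.6", "end_time": "1.0",
         "alternatives": [{"content": "world"}]},
    ]
    print(cues_from_items(sample, max_chars=11))
```

Here each cue's start and end come from real word timings, so no interpolation is needed, at the cost of regrouping the service's own segments.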

Does the application displaying your subtitles iterate individually over the lines during playback, so as not to overlap or fill the screen?

I am reluctant to separate these into further segments as I see this as changing the source material and may result in unforeseeable consequences.

One would hope that AWS Transcribe, being an ML product, will develop and improve over time, and perhaps the segments may eventually provide a more accurate likeness or a more suitable breakdown.

I’m afraid that, for now, my advice may be to consider another tool like aws-transcribe-to-srt, or to fork this repo and develop it further to suit your use case.

Sorry I can’t be of much more help, I hope you understand my rationale.

Regards

Bob

DJJacobs commented 4 years ago

Hi Bob,

No problem at all. I think we may end up forking your project to create a more customized VTT caption output, as there are other features which would get us closer to how we normally format caption files.

Again, thanks, your work is impressive and incredibly helpful.

kibaffo33 commented 4 years ago

Brilliant. Thanks for getting involved and good luck!