jsauvageau closed this 4 years ago
Hi there. Thanks for the kind words.
I can look into this. Do you have a sample file I could examine?
Thanks
Hi,
I have attached this .vtt file (don't be misled by the .txt extension: I had to rename it to be able to upload it, as .vtt is not a file type GH supports). asset-3433-en-US.txt
You will see that it is pretty much a plain text file, but its formatting is what makes it possible to display voice-to-text on screen (i.e. closed captioning in YouTube or other players) while a video is playing.
AKA WebVTT; the specs can be found here: https://en.wikipedia.org/wiki/WebVTT.
Let me know if you need anything else.
Thanks and have a nice day!
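For readers unfamiliar with the format, here is a minimal Python sketch of how a WebVTT file is laid out: a `WEBVTT` header, then cues made of a timestamp line and the caption text. The helper names `vtt_timestamp` and `render_vtt` are hypothetical and only illustrate the format; they are not part of tscribe.

```python
def vtt_timestamp(seconds):
    """Format a number of seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def render_vtt(cues):
    """Render (start_seconds, end_seconds, text) tuples as WebVTT text."""
    blocks = ["WEBVTT"]  # the file must start with this header
    for start, end, text in cues:
        timing = f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}"
        blocks.append(f"{timing}\n{text}")
    # Cues are separated from each other (and from the header) by a blank line.
    return "\n\n".join(blocks) + "\n"
```

Calling `render_vtt([(11, 15, "hotel people on a regular basis.")])` would produce a one-cue file that most players should accept.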
Agreed VTT support would be very helpful.
Hi Bob, it works great, as easy as the other formats! The only issue I see, something that didn't catch my attention previously, is that several of the phrases in the output are very long, so long that they would occupy a lot of the screen real estate when viewing a video, and for several seconds. For example, I have a 37-second segment which contains 612 characters. I thought it was in the Wikipedia specs that I sent you, but I can't find it there; I know that I read somewhere that the recommended maximum length of phrases in VTT is 80 characters, with 70-ish likely the best. Is this (the length of phrases) something you have control over? Thx!
Ah, I see, yes. In 1.3.1, I’ve set a threshold of 80 characters to start utilising ‘lines’. This should be the fix... I hope?
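To make the idea of an 80-character threshold concrete, here is a minimal sketch using Python's standard `textwrap` module. It is an assumption about what line wrapping could look like, not tscribe's actual implementation, and `wrap_caption` is a hypothetical name. Note that it only wraps the text onto several lines inside the same caption; it does not create new captions or timestamps.

```python
import textwrap


def wrap_caption(text, width=80):
    """Wrap one caption's text into lines of at most `width` characters."""
    # textwrap.wrap breaks on whitespace, so words are kept whole
    # unless a single word is longer than `width`.
    return textwrap.wrap(text, width=width)


caption = (
    "hotel people on a regular basis. There is still a land of opportunity "
    "in America. It's called Texas. The problem is these departures from "
    "the Constitution."
)
for line in wrap_caption(caption):
    print(line)
```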
Hi again. Thanks so much for implementing VTT support. I believe I'm using 1.3.1 now (pip freeze shows 'tscribe==1.3.1') but I'm still seeing very long lines in captions. Attaching an example here: transcript11111w21w1.vtt.txt
Appreciate any help you can provide. I work at a university and we're experimenting with AWS Transcribe for accessibility purposes.
Hi. My pleasure.
Is your output the same when using docx or csv format?
I have given the intro to the podcast a listen.
It looks to me like an accurate conversion from json to vtt ... I think the issue might be with the quality of output from AWS Transcribe. This package merely converts the json ...
Hi again - maybe I misunderstood what the 'lines' function does. I thought it would break up a section of text into multiple captions if it extended beyond the 80 character limit. For instance, in the example file I provided, there's a line like this:
00:00:11.000 --> 00:00:31.000 hotel people on a regular basis. There is still a land of opportunity in America. It's called Texas. The problem is these departures from the Constitution. They have become the norm. At what point must a female senator raised her hand or her voice to be recognised over the male colleagues in the
Ideally, it would break up this caption into multiple captions (while maintaining accurate timestamps), something like this:
00:00:11.000 --> 00:00:15.000 hotel people on a regular basis. There is still a land of opportunity in
00:00:16.100 --> 00:00:20.000 America. It's called Texas. The problem is these departures from the
00:00:20.100 --> 00:00:25.000 Constitution. They have become the norm. At what point must a female senator
00:00:26.100 --> 00:00:31.000 raised her hand or her voice to be recognised over the male colleagues in the
(I'm just guessing on the time stamps by the way to provide an example)
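A minimal sketch of the kind of splitting described above, for anyone who wants to experiment in a fork. It assumes the simplest possible approach: wrap the text at 80 characters and share the original time span between the pieces in proportion to their length. The function name `split_cue` is hypothetical, and the resulting timestamps are estimates rather than true word timings; the AWS Transcribe JSON also carries per-word start and end times, which would likely give more accurate boundaries.

```python
import textwrap


def split_cue(start, end, text, max_chars=80):
    """Split one (start, end, text) cue into several shorter cues.

    Times are in seconds. Each piece gets a share of the original
    duration proportional to its character count (an approximation).
    """
    chunks = textwrap.wrap(text, width=max_chars)
    if len(chunks) <= 1:
        return [(start, end, text)]

    total_chars = sum(len(chunk) for chunk in chunks)
    cues = []
    cursor = start
    consumed = 0
    for index, chunk in enumerate(chunks):
        consumed += len(chunk)
        if index == len(chunks) - 1:
            chunk_end = end  # pin the last cue to the original end time
        else:
            chunk_end = start + (end - start) * consumed / total_chars
        cues.append((cursor, chunk_end, chunk))
        cursor = chunk_end  # cues are contiguous, with no gaps or overlaps
    return cues
```

For the 20-second caption quoted above, `split_cue(11, 31, text)` would return several cues of at most 80 characters each, covering 11 to 31 seconds back to back.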
Hi @djjacobs1979
I understand your comments.
The AWS Transcribe service dictates the size of these segments (or chunks of text), which are then written as a bunch of lines by tscribe.
Does the application displaying your subtitles iterate individually over the lines during playback, so as not to overlap or fill the screen?
I am reluctant to separate these into further segments as I see this as changing the source material and may result in unforeseeable consequences.
One would hope that AWS Transcribe, being an ML product, will develop and improve over time, and that the segments may eventually provide a more accurate likeness or a more suitable breakdown.
I’m afraid that, for now, my advice is to consider another tool like aws-transcribe-to-srt, or to fork this repo and develop it further to suit your use case.
Sorry I can’t be of much more help, I hope you understand my rationale.
Regards
Bob
Hi Bob,
No problem at all. I think we may end up forking your project to create a more customized VTT caption output, as there are other features which would get us closer to how we normally format caption files.
Again, thanks, your work is impressive and incredibly helpful.
Brilliant. Thanks for getting involved and good luck!
Hi, of the various solutions I've explored to convert AWS json files to more "human readable" results, this is the best by far - so thanks a lot! However, I also need a solution to convert either the json file itself or the resulting converted files (i.e. csv) into a vtt file. I found other solutions on GitHub, but most are Java or C#, and I'm not a programmer, so the learning curve seems quite steep to get to the results I'm expecting. I found another, older (2016?) Python project that claims to do just that (richwine/watson-caption-converter), but although I'm replacing the name of the "watson.json" file in the .py, I'm getting an error about the json file not being in the right format... I was wondering if adding a convert-to-vtt option is something you'd consider doing, as the rest of your tool is so convenient and easy to use for a newbie like me?
Thx for considering and keep up the good work.