hyperaudio / ha-converter

Hyperaudio Converter: converts JSON/SRT to HTML-based interactive transcripts
https://hyperaud.io/converter/converter.html

Google STT converter #8

Closed: overZellis133 closed this issue 4 years ago

overZellis133 commented 4 years ago

www.theirstory.io is a web-based interviewing and storytelling platform that helps users facilitate, record, transcribe, index, digitally archive, and share meaningful stories and conversations. Today, TheirStory lets users generate a transcript of an audiovisual recording using Google's Speech-to-Text (STT) service. That transcript is added to a Word document, which the user can download.

We would like to convert the data generated by Google STT and, in addition to generating a Word document, pipe it into Hyperaudio Lite. We would then aim to integrate Hyperaudio Lite into TheirStory so that users can generate their own interactive transcripts for the audiovisual recordings they keep in TheirStory.

@maboa is this something you can help with?

maboa commented 4 years ago

@overZellis133 Yes - I can :) Do you have a couple of examples of the JSON that Google's STT generates?

overZellis133 commented 4 years ago

@maboa Yes, I will get this to you soon.

overZellis133 commented 4 years ago

@maboa is there a way to upload a JSON file here? I've attached the JSON in a Word doc, but I feel there has to be a better way to share it with you... BethEl_TheirStory_Interview_JSON.docx

overZellis133 commented 4 years ago

@maboa, happy to provide a JSON file for other transcripts if that would be helpful. Just let me know.

maboa commented 4 years ago

@overZellis133 I downloaded the JSON, but there seems to be an issue with the formatting: the keys are not wrapped in double quotes, as JSON requires (see https://stackoverflow.com/questions/949449/do-the-json-keys-have-to-be-surrounded-by-quotes).

Not sure if they got stripped out somewhere in the process?
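To illustrate the quoting problem: a strict JSON parser such as Python's standard `json` module rejects an object whose keys are not double-quoted. The fragments below are hypothetical, shaped loosely like a Google STT word entry, and are not taken from the attached file; `is_strict_json` is an illustrative helper name.

```python
import json


def is_strict_json(text: str) -> bool:
    """Return True if `text` parses as standard JSON (keys must be double-quoted)."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        # e.g. "Expecting property name enclosed in double quotes"
        return False
    return True


# Hypothetical fragments: the first mimics what a JSON viewer may display,
# the second is valid JSON.
unquoted = '{word: "hello", startTime: "0s"}'
quoted = '{"word": "hello", "startTime": "0s"}'

print(is_strict_json(unquoted))  # False
print(is_strict_json(quoted))    # True
```

Copying from a viewer that hides the quotes therefore yields text that a converter expecting standard JSON cannot read.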

overZellis133 commented 4 years ago

@maboa I uploaded a new file, theirstory_transcript.docx, in which the keys are in quotes.

I was originally viewing the file with JSONview, a Chrome extension, which stripped out the quotes.

maboa commented 4 years ago

OK @overZellis133, I have now added an option for Google STT JSON conversion; it appears to work with the test data you supplied, at least. It does not yet split the transcript into paragraphs based on the time elapsed between words, but it would make sense to open that request as a separate issue.
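A minimal sketch of this kind of conversion, including the time-gap paragraph splitting mentioned as a follow-up. This is not ha-converter's actual code. It assumes the REST v1 response shape `results[].alternatives[].words[]`, with `startTime`/`endTime` serialized as duration strings such as `"1.300s"` (word timings are only present when the request enables `enableWordTimeOffsets`). It also assumes Hyperaudio Lite-style `<span>` elements carrying `data-m` (start, ms) and `data-d` (duration, ms), and a hypothetical 2-second gap threshold. `PARAGRAPH_GAP_SECONDS`, `stt_words`, and `to_hyperaudio_html` are illustrative names.

```python
import html
import json

# Hypothetical threshold: start a new paragraph when the silence between
# two consecutive words is at least this many seconds.
PARAGRAPH_GAP_SECONDS = 2.0


def parse_duration(value: str) -> float:
    """Convert a protobuf-JSON duration string such as "1.300s" to seconds."""
    return float(value.rstrip("s"))


def stt_words(response: dict) -> list:
    """Collect word entries from the top alternative of each result (assumed shape)."""
    words = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            words.extend(alternatives[0].get("words", []))
    return words


def to_hyperaudio_html(response: dict, gap: float = PARAGRAPH_GAP_SECONDS) -> str:
    """Render words as <span data-m data-d> elements, splitting <p> on long gaps."""
    paragraphs = [[]]
    previous_end = None
    for entry in stt_words(response):
        start = parse_duration(entry["startTime"])
        end = parse_duration(entry["endTime"])
        if previous_end is not None and start - previous_end >= gap:
            paragraphs.append([])  # silence long enough: begin a new paragraph
        start_ms = round(start * 1000)
        duration_ms = round((end - start) * 1000)  # round() absorbs float noise
        paragraphs[-1].append(
            f'<span data-m="{start_ms}" data-d="{duration_ms}">'
            f"{html.escape(entry['word'])} </span>"
        )
        previous_end = end
    body = "\n".join(f"<p>{''.join(p)}</p>" for p in paragraphs if p)
    return f"<article>\n<section>\n{body}\n</section>\n</article>"


if __name__ == "__main__":
    # Hypothetical sample: 2.6 s of silence separates "there." and "Next".
    sample = json.loads("""
    {"results": [
      {"alternatives": [{"words": [
        {"startTime": "0s", "endTime": "0.400s", "word": "Hello"},
        {"startTime": "0.400s", "endTime": "0.900s", "word": "there."}]}]},
      {"alternatives": [{"words": [
        {"startTime": "3.500s", "endTime": "3.900s", "word": "Next"}]}]}
    ]}
    """)
    print(to_hyperaudio_html(sample))
```

Splitting on the gap between the previous word's end and the next word's start keeps the rule independent of how Google groups words into `results`; a fixed threshold is the simplest choice, and a real converter might instead make it configurable.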

overZellis133 commented 4 years ago

Awesome. Thanks @maboa. I just filed the new issue.