pietrop / digital-paper-edit-electron

Work in progress - digital paper edit project - Electron, Cross Platform Desktop app - Mac, Windows, Linux
https://pietropassarelli.net/autoedit

'Import a transcript' feature #81

Closed siddie closed 10 months ago

siddie commented 2 years ago

Hey @pietrop. You are doing great work!

Maybe I am missing something, but is it possible to import a transcript file as an alternative to importing media and triggering transcription on the server?

My use case is that I have media and raw transcripts done somewhere else and I want to import them as if bypassing transcription.

pietrop commented 2 years ago

Hi @siddie, not currently. However, there is an advanced feature in the transcript editor that lets you replace the STT transcription with transcript text you already have, and it will transpose/re-align the timecodes. In this version it won't preserve speaker labels, but it will preserve paragraph breaks.

[Screenshot: Screen Shot 2021-08-05 at 5 37 43 PM]
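
To make the re-alignment idea concrete, here is a minimal Python sketch. It is not the editor's actual implementation: the `realign` and `normalise` names, the word dictionary shape and the use of `difflib` are all assumptions for illustration. It matches words between the STT output and the replacement text, copies timings for matched words, and gives unmatched words a zero-length slot after the previous word. Paragraph breaks are ignored here.

```python
from difflib import SequenceMatcher


def normalise(token):
    # Compare words case-insensitively and without surrounding punctuation.
    return token.lower().strip(".,!?;:\"'")


def realign(stt_words, corrected_text):
    """Carry time-codes from STT words over to the words of corrected_text."""
    new_tokens = corrected_text.split()
    matcher = SequenceMatcher(
        a=[normalise(w["text"]) for w in stt_words],
        b=[normalise(t) for t in new_tokens],
        autojunk=False,
    )
    aligned = [{"text": t, "start": None, "end": None} for t in new_tokens]
    # Words found in both sequences inherit the STT timings directly.
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            source = stt_words[block.a + offset]
            target = aligned[block.b + offset]
            target["start"], target["end"] = source["start"], source["end"]
    # Unmatched words get a zero-length slot at the end of the previous word.
    previous_end = 0.0
    for word in aligned:
        if word["start"] is None:
            word["start"] = word["end"] = previous_end
        previous_end = word["end"]
    return aligned


# Hypothetical STT output with one misrecognised word ("word" for "world").
stt_words = [
    {"text": "hello", "start": 0.0, "end": 0.4},
    {"text": "word", "start": 0.5, "end": 0.9},
    {"text": "today", "start": 1.0, "end": 1.5},
]
aligned = realign(stt_words, "Hello, world today.")
```

A real aligner would likely interpolate timings for unmatched words rather than collapse them to a single point, but the matching step is the core of the idea.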

One advantage of using STT services is that they provide word-level time-codes, which give higher accuracy when making selections for paper-editing.
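
To illustrate why this matters: when only cue-level timings are available (as in a subtitle file), per-word times have to be estimated. The sketch below is one assumed approach, not something autoEdit is known to do: it spreads a cue's duration across its words in proportion to word length. The function name and data shape are hypothetical.

```python
def estimate_word_times(text, start, end):
    """Estimate per-word timings for one cue by splitting its duration
    across the words in proportion to their character length."""
    words = text.split()
    if not words:
        return []
    total_chars = sum(len(w) for w in words)
    duration = end - start
    estimated = []
    cursor = start
    for word in words:
        share = duration * len(word) / total_chars
        estimated.append({"text": word, "start": cursor, "end": cursor + share})
        cursor += share
    return estimated


# A 4-second cue: "one" (3 chars) gets 1.5 s, "three" (5 chars) gets 2.5 s.
words = estimate_word_times("one three", 0.0, 4.0)
```

These are guesses rather than measurements, so a selection that starts mid-cue can be off by a noticeable fraction of a second, which is the accuracy gap word-level STT output avoids.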

What format do your transcriptions come in, and do they have time-codes associated with them?

siddie commented 2 years ago

Hi @pietrop. Sorry, I forgot to mention: the transcripts are time-coded, mostly SRT and VTT.

I want to use the tool to paper-edit material that has already been transcribed with time codes (and to refine those transcriptions as well). So I need to import the audio/video material together with the time-coded transcripts and then use the standard flow the app provides.
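
For reference, the kind of input I mean can be read with very little code. This is a rough Python sketch of an SRT reader, not code from autoEdit; `parse_srt` and the cue dictionary shape are made up for illustration. It only handles `hh:mm:ss,mmm` style timestamps, so VTT files with short `mm:ss.mmm` timestamps would need extra handling.

```python
import re

TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_seconds(stamp):
    # Convert "00:01:02,500" (or with a dot) into seconds as a float.
    hours, minutes, seconds, millis = (
        int(part) for part in TIME_RE.match(stamp.strip()).groups()
    )
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(srt_text):
    """Return a list of cues: {"start": float, "end": float, "text": str}."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        timing_index = next(
            (i for i, line in enumerate(lines) if "-->" in line), None
        )
        if timing_index is None:
            continue  # Not a cue (for example a header block).
        start_raw, end_raw = lines[timing_index].split("-->")
        cues.append({
            "start": to_seconds(start_raw),
            # Ignore anything after the end timestamp on the timing line.
            "end": to_seconds(end_raw.split()[0]),
            "text": " ".join(lines[timing_index + 1:]),
        })
    return cues


sample = """1
00:00:01,500 --> 00:00:03,000
Hello there.

2
00:00:03,500 --> 00:00:05,000
Second line
continues here.
"""
cues = parse_srt(sample)
```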

pietrop commented 2 years ago

Hi @siddie We had this feature in autoEdit2.

In your srt/vtt how do you handle speaker names/labels?

siddie commented 2 years ago

Hi @pietrop!

In my experiments, speakers are only introduced at the Paper Edit level; the SRT/VTT is just raw input. Actually, your mention of speakers got me thinking that it may be worth allowing import of other STT formats as well, such as Amazon, Google or even Kaldi.

I do not follow every project in the Paper Edit family, but the reason I am asking about this feature is that I saw an import option (and thus the code for it) in this one: https://bbc.github.io/react-transcript-editor/iframe.html?id=demo--default

It seems that an import option could save some time and money (when using paid STT services), but this is only a hypothesis to be proven. From my side, having an import option gives some needed flexibility.

pietrop commented 2 years ago

Hi @siddie

Ideally, when integrating with other STT services in an app like autoEdit, the integration is abstracted away from the user: e.g. you set up some credentials, and then autoEdit handles getting a transcription from those services. Unfortunately AWS and Google don't offer a straightforward API like AssemblyAI does, or like Speechmatics used to. You have to integrate much more closely with their services (S3 buckets, cloud storage, etc.), and that can get pretty complicated pretty quickly for non-developers if it's not abstracted away. It's also easier to integrate with them in a web app than in a desktop app.

As you might know, I worked on the BBC react transcript editor, and yes, we did have all the adapters there to encourage developer community contribution. autoEdit used to use that one, but in the latest transcript editor used by autoEdit, the one based on Slate, the adapters are still present but available as individual modules, so developers can choose which ones they need to integrate with their system; see this list

In theory, if one had the time, all the building blocks would be there to support optional import of the transcript at the creation/ingestion stage in autoEdit.

For SRT/VTT, where speakers might not be provided, the challenge is how often to introduce paragraph breaks: e.g. whether to just respect the "natural" two-line break of an SRT file, or whether to use punctuation to make slightly bigger (and perhaps meaningful) paragraphs. In the Slate transcript editor in autoEdit, these paragraphs can then be adjusted (split, joined, etc.) and given a speaker label, so it's just a matter of which option creates less work when adjusting/correcting the transcript, if that makes sense?
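
The two options can be sketched roughly as follows, in Python for brevity. This is only an illustration of the trade-off, not code from autoEdit or the Slate editor; the function names and the cue shape (`start`, `end`, `text`) are assumptions.

```python
def merge_cues(cues):
    # Combine consecutive cues into one paragraph spanning all of them.
    return {
        "start": cues[0]["start"],
        "end": cues[-1]["end"],
        "text": " ".join(cue["text"] for cue in cues),
    }


def paragraphs_per_cue(cues):
    """Option 1: respect the file's own breaks, one paragraph per cue."""
    return [merge_cues([cue]) for cue in cues]


def paragraphs_by_punctuation(cues, enders=".?!"):
    """Option 2: keep merging cues until one ends a sentence."""
    paragraphs, current = [], []
    for cue in cues:
        current.append(cue)
        if cue["text"].rstrip().endswith(tuple(enders)):
            paragraphs.append(merge_cues(current))
            current = []
    if current:
        # Trailing cues without closing punctuation form a last paragraph.
        paragraphs.append(merge_cues(current))
    return paragraphs


cues = [
    {"start": 0.0, "end": 2.0, "text": "So we started"},
    {"start": 2.0, "end": 4.0, "text": "in the morning."},
    {"start": 4.0, "end": 6.0, "text": "Any questions?"},
]
short_paragraphs = paragraphs_per_cue(cues)
sentence_paragraphs = paragraphs_by_punctuation(cues)
```

With this sample, the first option yields three short paragraphs and the second yields two sentence-sized ones, so the second would mean fewer joins to do by hand but possibly more splits when a speaker changes mid-paragraph.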