octimot / StoryToolkitAI

An editing tool that uses AI to transcribe, understand content and search for anything in your footage, integrated with ChatGPT and other AI models
GNU General Public License v3.0

Customize Transcription Output #161

Open clpStephen opened 9 months ago

clpStephen commented 9 months ago

Is your feature request related to a problem? Please describe.
We submit transcriptions of interviews and clips to studio producers. They have a very specific format that they want. I could potentially achieve this by taking the SRT file and having another AI reformat it for me, but that's a lot of tedious extra steps. Is it possible to reconfigure the transcription output provided by StoryToolkit?

Describe the solution you'd like
I would like a config file or settings within the app. Here's an example of what I mean: in the app, the transcription looks like this:

Speaker 2 It's fine.
Speaker 3 Pick a roll, baby girl.

I need to be able to output that to something like this:

[011824_JOE INTV] 13:51:51 WOMAN 2: It's fine.

[011824_JOE INTV] 13:51:55 INTERVIEWER: Pick a roll, baby girl.

Every segment has the name of the file, the timecode, and the transcript.

If I output to text for Avid, it doesn't include timecodes or anything, just the transcript with line breaks, not even speakers. Also, I would like to be able to rename identified speakers for use in this output.

Describe alternatives you've considered
The workaround would be to feed the file to an AI text tool and have it reformat the transcript into what I need.
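That per-segment reformatting is simple enough to script directly. Here's a minimal Python sketch of the idea; the `segments` data and field names are made up for illustration and are not StoryToolkitAI's actual schema:

```python
# Hypothetical transcription segments; field names are illustrative only.
segments = [
    {"speaker": "WOMAN 2", "tc": "13:51:51", "text": "It's fine."},
    {"speaker": "INTERVIEWER", "tc": "13:51:55", "text": "Pick a roll, baby girl."},
]

clip_name = "011824_JOE INTV"

# Format every segment as: [clip name] timecode SPEAKER: text
lines = [f"[{clip_name}] {s['tc']} {s['speaker']}: {s['text']}" for s in segments]
print("\n\n".join(lines))
```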

Additional context
Sorry if this is addressed elsewhere already. Also maybe the assistant could do this. I can't get it to work, it just says it is having trouble connecting. I do have an API key.

Thanks!

octimot commented 9 months ago

Hey there!

First, since this is pretty much standard reformatting, I think a custom transcription output is the way to go. I'll look into the best approach, since, as you also pointed out, a lot of studios / post houses have unique preferences when it comes to this, and coding one format universally for everybody wouldn't make much sense. But letting anyone define their own custom export templates might make more sense...

To address some of the issues that you also mentioned:

If I output to text for Avid, it doesn't include timecodes or anything, just the transcript with line breaks, not even speakers.

If you're referring to AVID DS exports, I think this is due to the standard format. Apart from adding the speaker in front of every line, there's not much we can change, I think.

Also, I would like to be able to rename identified speakers for use in this output.

You can OPTION/ALT + click on the transcript or right-click -> Edit and rename the detected speakers. You can also use CMD/CTRL+F to find and replace speaker names in bulk. Or, is there something not working correctly on your end?

The workaround would be to feed the file to an ai text tool and have it reformat to what I need.

The Assistant should be able to do this, especially when using GPT-4, but since this is solvable by a simple formatting algorithm, I think using AI for the task is overkill (and costly, depending on the number of transcripts you deal with)...

Also maybe the assistant could do this. I can't get it to work, it just says it is having trouble connecting. I do have an API key.

Just to make sure: if you have an OpenAI key, it needs to be entered in Preferences -> Assistant -> OpenAI API Key. More details here.

Cheers!

clpStephen commented 9 months ago

Thanks Octimot! I'll look for a solution to get it quickly reformatted in the meantime. GPT definitely can do it, but the character limit impedes me. I'll re-enter my API key to see if that corrects the issue. I'm beginning with the transcription features first, as that can cure a lot of pain. We are primarily an Avid house, although we do have some Resolve and one show that uses Premiere. I look forward to seeing how else we can benefit from your tool.

octimot commented 9 months ago

I'll push an update on Github that allows the creation of custom transcription exports sometime next week or maybe sooner...

For the particular use case that you mentioned, the template you'd need to create would probably look like this:

name: Custom Export Template
extension: txt
segment_template: |
  [{transcription_name}]
  {segment_start_tc}
  {segment_speaker_name}: {segment_text}
segment_separator: "\n\n"

Once this is saved as a .yaml file in templates/transcription_export, you'll be able to export exactly in the format you need.
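For what it's worth, those `{...}` placeholders behave like standard Python format fields. Here's a minimal sketch of how a template like the one above could be rendered; the variable names mirror the placeholders, but this is not StoryToolkitAI's actual export code:

```python
# Template and separator as they might appear in the .yaml file above.
segment_template = (
    "[{transcription_name}]\n"
    "{segment_start_tc}\n"
    "{segment_speaker_name}: {segment_text}"
)
segment_separator = "\n\n"

# Hypothetical segment data keyed to match the template's placeholders.
segments = [
    {"segment_start_tc": "13:51:51", "segment_speaker_name": "WOMAN 2",
     "segment_text": "It's fine."},
    {"segment_start_tc": "13:51:55", "segment_speaker_name": "INTERVIEWER",
     "segment_text": "Pick a roll, baby girl."},
]

# Render each segment and join them with the separator.
rendered = segment_separator.join(
    segment_template.format(transcription_name="011824_JOE INTV", **seg)
    for seg in segments
)
print(rendered)
```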

Question: are you using the git version of the tool, or the standalone?

clpStephen commented 9 months ago

Thanks so much for that! I am using the standalone on a windows 10 workstation. The version that plugs into Resovle may be beneficial down the line. I'll answer that question once I start working with my AEs on this to see what they think. I want to get it functional for us first though. I'll try your template!

octimot commented 9 months ago

Both the git and the standalone versions should connect to Resolve Studio.

It would be great if you could attempt a git installation because you'll be able to access the update I was mentioning faster (as soon as I push it to Github)!

I'll come back on this issue when it's up and ready.

Cheers

clpStephen commented 8 months ago

I'm working on the git version, but I'm having some issues. I guess I should be on Python 3.10 rather than 3.12?

octimot commented 8 months ago

Yes, some of the packages that the tool is using are not tested or not compatible with anything newer than 3.10.
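For anyone else hitting this with the git version, a tiny interpreter check makes the mismatch obvious before package installation fails. This is purely illustrative and not part of the tool:

```python
import sys

def check_python(version_info=sys.version_info):
    """Return True if the interpreter is in the recommended 3.10 series."""
    return tuple(version_info[:2]) == (3, 10)

# Warn early instead of failing later during package installation.
if not check_python():
    print("Warning: Python 3.10 is recommended for the git version of StoryToolkitAI.")
```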

clpStephen commented 8 months ago

Ok, I have StoryToolkitAI GIT launchable.

octimot commented 8 months ago

I just pushed version 0.24.1 which includes custom transcription export templates (commit https://github.com/octimot/StoryToolkitAI/commit/bb2011ae0ad4e5669d723faf9dddfe59e805f7bf)

Just update the tool and try to add the custom template that I recommended above in the templates/transcription_export folder in your StoryToolkitAI configuration folder. To find the configuration folder on your machine, open StoryToolkitAI and go to File -> Open Configuration Folder.

Full instructions for how to work with custom export templates here.

Please let me know if the templates work on your end.

Cheers!

clpStephen commented 8 months ago

Will do, Thanks!

clpStephen commented 8 months ago

Thanks so much for your attention, Octimot! I have it working pretty well now. It is still struggling with speaker detection, so I'm playing with models and settings to try to fine-tune that.

An issue I've come across with this process: if I generate a transcription and detect speakers via the Ingest, I get a valid transcription from my custom template YAML. But if I change some settings and run Detect Speakers on the JSON file that was already built, I lose the content when exporting a new transcription; it just gives me the header. I'll attach those files here, but please let me know if this should be an entirely new issue or if this is a good thread for it. I do have the workaround of it working the first time it's generated, although it does occasionally fail the speaker ID for some reason.

012124.txt CLP_trans.txt

Note: it wouldn't allow me to attach a .yaml file, so I changed the extension to .txt; the YAML is the CLP_trans file.

octimot commented 8 months ago

As far as I can tell, you should remove the conditional with the speaker, see below:

segment_condition: |
  not {segment_meta}
  not {segment_meta_speaker}
  not {segment_meta_other}
  '{segment_speaker_name}' == 'Speaker 1'   <------- REMOVE THIS ENTIRE LINE

That line tells the export function to export only segments that have "Speaker 1" as the speaker name, so every other segment gets dropped.
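For context on why that one line emptied the export: a `segment_condition` block is effectively a list of per-segment tests that must all pass before a segment is exported. Here's a hypothetical sketch of that idea, not the tool's actual evaluation code:

```python
def keep_segment(segment, conditions):
    """Return True only if every condition string evaluates truthy.

    Hypothetical re-implementation for illustration: placeholders are
    filled in from the segment, then the expression is evaluated.
    """
    for cond in conditions:
        filled = cond.format(**segment)
        # sketch only; eval is not safe for untrusted input
        if not eval(filled, {"__builtins__": {}}):
            return False
    return True

segment = {"segment_speaker_name": "Speaker 2"}

# With the speaker condition present, any non-"Speaker 1" segment is dropped:
print(keep_segment(segment, ["'{segment_speaker_name}' == 'Speaker 1'"]))  # False

# Without that condition, the segment is exported:
print(keep_segment(segment, []))  # True
```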

Cheers!

BristolBEAT commented 7 months ago

I've created a WAV in Avid with timecode, then imported it into Avid and Resolve to check that the timecode exists, which it does.

However, the timecode doesn't seem to carry across to StoryToolkitAI. H.264s are fine.