add central logging text file with timestamps

hwc0x01 commented 9 months ago

Is your feature request related to a problem? Please describe.

A central transcription log file for all transcripts would allow searching across outputs and quick identification of relevant video/audio segments for reference.

Describe the solution you'd like

Change the following code in mots_whisper.py to add a simple write to a txt file:

            # append each transcribed segment, with its timestamps, to a central log file;
            # the with-block closes the file even if an exception is raised
            with open("../../TranscriptionLog.txt", "a", encoding="utf-8") as transcription_log:

                if verbose:
                    for segment in current_segments:
                        start, end, text = segment["start"], segment["end"], segment["text"]
                        line = f"[{format_timestamp(start)} --> {format_timestamp(end)}] {text}"
                        print(make_safe(line))
                        transcription_log.write(f"{line}\n")

                        # add the text to the queue item variable (to make it available in the UI)
                        if queue_id is not None:
                            toolkit_ops_obj.processing_queue.update_output(queue_id=queue_id,
                                                                           output=make_safe(segment["text"]))

Change the following code in toolkit_ops.py to add a simple write to the same txt file:

        # let the user know the transcription process has started
        if isinstance(time_intervals, list):
            time_intervals_str = ", ".join([f"{start}-{end}" for start, end in time_intervals])
            debug_message = "Transcribing {} between: {}.".format(name, time_intervals_str)
        else:
            debug_message = "Transcribing {}.".format(name)

        # append the message to the central log so the segments that follow can be traced
        # back to their file; written in both branches, and the with-block closes the file
        with open("../../TranscriptionLog.txt", "a", encoding="utf-8") as transcription_log:
            transcription_log.write(f"{debug_message}\n")
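
The logging idea in the two snippets above can also be sketched as a self-contained helper, together with a simple search over the resulting file. This is only a rough illustration, not code from StoryToolkitAI: `append_segments_to_log` and `search_log` are hypothetical names, and the `format_timestamp` here is a stand-in I wrote for the example, assuming segments are dicts with `start`, `end` (in seconds) and `text` keys as in the snippet above.

```python
from pathlib import Path


def format_timestamp(seconds: float) -> str:
    """Stand-in formatter (an assumption, not the project's own): HH:MM:SS.mmm."""
    milliseconds = round(seconds * 1000)
    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{milliseconds:03d}"


def append_segments_to_log(log_path, name, segments):
    """Append a header line for the file, then one timestamped line per segment."""
    # "a" mode creates the file if needed and never truncates earlier transcripts
    with Path(log_path).open("a", encoding="utf-8") as log:
        log.write(f"Transcribing {name}.\n")
        for segment in segments:
            start, end, text = segment["start"], segment["end"], segment["text"]
            log.write(
                f"[{format_timestamp(start)} --> {format_timestamp(end)}] {text.strip()}\n"
            )


def search_log(log_path, term):
    """Return every log line that contains term, ignoring case."""
    needle = term.lower()
    with Path(log_path).open(encoding="utf-8") as log:
        return [line.rstrip("\n") for line in log if needle in line.lower()]
```

Calling `append_segments_to_log("TranscriptionLog.txt", "audio.wav", segments)` after each transcription and then `search_log("TranscriptionLog.txt", "floyd")` would return the matching lines with their timestamps. Note that the matches do not carry the file name themselves; it only appears on the preceding "Transcribing ..." header line.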

octimot commented 9 months ago

Hey there!

There are already solutions to search through multiple transcriptions at once.

Have you tried Search -> cmd/ctrl click to select multiple transcriptions?

Or, simply Shift + Search to select entire folders (this will also include sub-folders recursively).

Cheers!

hwc0x01 commented 9 months ago

I tried this but couldn't find anything with timestamps, which was the critical feature.

octimot commented 8 months ago

Could you send a screenshot of the Search window?

You should get something like:

SEARCH > test

Top 5 closest phrases:

Test:
Transcript Group - Transcription of audio.wav 

test Floyd, I've heard a lot about you.
00:00:19.140 - Transcription of audio.wav 

Heywood How do you do?
00:00:13.519 - Transcription of audio.wav 

Are you quite sure?
00:00:36.740 - Transcription of audio.wav 

Would you sit down?
00:00:21.339 - Transcription of audio.wav 

--------------------------------------

Apart from the group result at the top, all results contain timestamps. Plus, if you use timecoded transcriptions, you will see the timecodes instead.

octimot / StoryToolkitAI

add central logging text file with timestamps #148