sul-dlss speech-to-text issues

sul-dlss / speech-to-text

Tools for generating transcript and caption files from media files (e.g. a Docker container for running Whisper on video files in AWS ECS? 🤷🏽)

issues

File specific language specification

#51 edsu opened 1 day ago
0
Check media and log attributes

#50 edsu closed 17 hours ago
1
speech_to_text.py should check to see whether a file contains audio before processing it with whisper

#48 jmartin-sul closed 17 hours ago
0
Whisper output quality: test transcription of video where most of speech is German, but opening 30ish seconds of speech is English

#47 jmartin-sul closed 1 week ago
2
Build and deploy speech-to-text

#46 edsu opened 1 week ago
0
When using Whisper's auto-detected language, insert that language into the Cocina

#45 andrewjbtw closed 5 days ago
2
[HOLD] only need vtt and txt files in output

#42 peetucket closed 1 week ago
3
Whisper should only produce .txt and .vtt files

#41 peetucket closed 1 week ago
1
Add type checking

#40 edsu closed 3 weeks ago
0
Mocked AWS & Github Action

#39 edsu closed 4 weeks ago
0
Stanza API docs review

#38 laurensorensen opened 1 month ago
5
API docs overview -- how does the file return to us via the Amara API?

#37 laurensorensen closed 1 month ago
1
finish up CI configuration

#36 jmartin-sul closed 3 weeks ago
0
Investigate Whisper.writer parameters

#35 alundgard opened 1 month ago
1
Simplify bucket and job message

#34 edsu closed 1 month ago
0
Adjust locations for AWS Whisper Container

#33 peetucket closed 1 month ago
1
write DevOpsDocs for speech-to-text infrastructure

#32 jmartin-sul opened 1 month ago
0
Log media size and duration

#31 edsu closed 17 hours ago
0
Added logging and removed caching

#30 edsu closed 1 month ago
0
Improve logging

#29 edsu closed 1 month ago
0
ExpiredToken when calling the ReceiveMessage

#28 edsu closed 1 month ago
0
Add Honeybadger

#27 edsu opened 1 month ago
0
Add new Turbo model

#26 edsu closed 1 month ago
0
minor enhancements: return technical metadata, allow job ID specification for testing

#25 jmartin-sul closed 1 month ago
4
speech-to-text worker sends back some basic technical metadata in the body of the done message it queues

#24 jmartin-sul closed 1 month ago
0
should we automatically update the model files that whisper uses? if so, at what frequency and with what mechanism?

#23 jmartin-sul opened 1 month ago
2
small readme and comment touchups

#22 jmartin-sul closed 1 month ago
1
DONE message should include output file

#21 edsu closed 3 weeks ago
1
TODO job should just include ID

#20 edsu closed 1 month ago
1
Run tests as Github Action

#12 edsu closed 4 weeks ago
0
Add initial Docker container

#9 edsu closed 1 month ago
0
Finish skeleton common-accessioning robot and workflow def for... `captionWF`? `speechToTextWF`? [final name TBD]

#8 jmartin-sul closed 2 months ago
1
Productionize speech_to_text_generation_service, or get it as close as possible and ticket the remaining work

#7 jmartin-sul opened 2 months ago
0
provision S3 buckets and credentials for temp space to hold 1) the incoming content to run through speech-to-text generation, and 2) the generated transcript/caption files.

#6 jmartin-sul closed 2 months ago
3
Questions surrounding speech_to_text_generation_service (e.g., do we need a speech_to_text_request_service REST API?)

#5 jmartin-sul closed 2 months ago
1
[investigate/prototype] speech_to_text_generation_service approach 2: Explore AWS SageMaker

#4 jmartin-sul opened 2 months ago
2
[investigate/prototype] speech_to_text_generation_service approach 1: Define a Docker container for running open source Whisper in a container that we define and for which we manage deployment (lives in this repo?)

#3 jmartin-sul closed 1 month ago
2
Choose an approach for producing speech-to-text output, given media file input (let's call this speech_to_text_generation_service for now?)

#2 jmartin-sul opened 2 months ago
0
[EPIC] Prototype workflow for generating and accessioning speech-to-text extraction

#1 jmartin-sul opened 2 months ago
0