Closed eliteprox closed 2 months ago
Added error handlers to respond with "400 bad request" when duration cannot be calculated due to unsupported file format or file corruption. This prevents invalid jobs from being sent to the network.
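A minimal sketch of that behavior, not the actual go-livepeer handler: the names `probeFunc`, `audioHandler`, `statusFor`, and `errNoDuration` are hypothetical, and the duration probe is injected as a function so the example runs without any media files.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// errNoDuration is a hypothetical sentinel error for this sketch.
var errNoDuration = errors.New("duration could not be calculated")

// probeFunc is a hypothetical stand-in for the real duration probe.
type probeFunc func(path string) (float64, error)

// audioHandler rejects the request with 400 when the probe fails or
// reports a non-positive duration, so the job is never forwarded.
func audioHandler(probe probeFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		d, err := probe(r.URL.Query().Get("file"))
		if err != nil || d <= 0 {
			http.Error(w, "unable to calculate duration: unsupported or corrupt file", http.StatusBadRequest)
			return
		}
		fmt.Fprintf(w, "duration=%.2f\n", d)
	}
}

// statusFor runs the handler against an in-memory recorder and
// returns the resulting HTTP status code.
func statusFor(probe probeFunc) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/audio-to-text?file=x", nil)
	audioHandler(probe)(rec, req)
	return rec.Code
}

func main() {
	bad := func(string) (float64, error) { return 0, errNoDuration }
	good := func(string) (float64, error) { return 12.5, nil }
	fmt.Println(statusFor(bad), statusFor(good)) // prints "400 200"
}
```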
Overall LGTM. I would recommend using ffmpeg for audio track processing (like calculating durations) and only accepting audio input instead of videos -- ffmpeg can help with that as well, and we could potentially commonize the Probe package I linked in catalyst-api in the comments. Also, we need to understand what the file length limits are.
@eliteprox looks like our pipeline fails to detect the duration of the following file:
Do you maybe know why 🤔?
This one appears to be a concatenated file and ffmpeg has issues calculating the duration in this case. The recommended solution is to re-encode the input to a consistent output format like flac. This can be combined with the effort to send audio-only to the ai-worker to optimize the pipeline.
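The re-encoding step suggested above could look roughly like the following sketch. It assumes an `ffmpeg` binary on the PATH and shells out to it rather than using lpms; the helper names `flacArgs` and `reencodeToFLAC` are hypothetical. The flags are standard ffmpeg options: `-vn` drops any video stream and `-c:a flac` re-encodes the audio as FLAC.

```go
package main

import (
	"fmt"
	"os/exec"
)

// flacArgs builds the ffmpeg argument list for an audio-only FLAC re-encode.
// -y overwrites the output, -vn discards video, -c:a flac selects the FLAC encoder.
func flacArgs(in, out string) []string {
	return []string{"-y", "-i", in, "-vn", "-c:a", "flac", out}
}

// reencodeToFLAC runs ffmpeg and returns its combined output on failure.
// It is not called in main so the example runs without ffmpeg installed.
func reencodeToFLAC(in, out string) error {
	cmd := exec.Command("ffmpeg", flacArgs(in, out)...)
	if b, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("ffmpeg failed: %w: %s", err, b)
	}
	return nil
}

func main() {
	// Show the command that would be executed.
	fmt.Println("ffmpeg", flacArgs("input.mp4", "output.flac"))
}
```

Re-encoding writes a fresh container with consistent timestamps, which is why the duration of the output is typically easier to calculate than that of a concatenated input.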
What does this pull request do? Explain your changes. (required)
Adds the new /audio-to-text pipeline to go-livepeer, supporting the openai/whisper-large-v3 model. File formats supported are mp3, m4a, mp4, webm, and flac.
This change requires https://github.com/livepeer/ai-worker/pull/103 and https://github.com/livepeer/lpms/pull/407
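The supported formats listed above could be checked up front with a simple extension filter. This is only a sketch under that assumption: the PR itself relies on duration probing to reject bad input, and the names `supportedExts` and `isSupported` are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// supportedExts mirrors the formats named in the PR description.
var supportedExts = map[string]bool{
	".mp3":  true,
	".m4a":  true,
	".mp4":  true,
	".webm": true,
	".flac": true,
}

// isSupported reports whether the file name has one of the listed
// extensions, compared case-insensitively.
func isSupported(name string) bool {
	return supportedExts[strings.ToLower(filepath.Ext(name))]
}

func main() {
	fmt.Println(isSupported("talk.MP3"), isSupported("talk.wav")) // prints "true false"
}
```

An extension check says nothing about the actual contents of the file, so it would complement the duration probe rather than replace it.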
Specific updates (required)
- Updated handleAIRequest and processAIRequest to support new response types like TextResponse
- Added the /audio-to-text endpoint to ai_mediaserver.go
- Uses ffmpeg.GetCodecInfo to calculate duration, which requires the lpms pull request above
How did you test each of these updates (required)
curl request example:
Does this pull request close any open issues?
LIV-429 LIV-289
Checklist:
- make runs successfully
- ./test.sh passes