untitled-pit-group / foxhound

PIFS standard backend
BSD Zero Clause License

Indexing: multimedia #8

Open paulsnar opened 2 years ago

paulsnar commented 2 years ago

This is a two-part background job.

First, as with plaintext, download the file from GCS. Then use ffmpeg to split out the audio track and convert it to a format that GCSTT can accept (presumably Opus, with a compression ratio that isn't so high that it wastes CPU). Upload the result back to GCS and submit a recognition request.
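A rough sketch of what the ffmpeg step could look like, assuming ffmpeg is on `PATH` and built with libopus. The function names, the mono 16 kHz downmix, the 32k bitrate and the `-compression_level 5` setting are all guesses for illustration, not decisions; the right values depend on what GCSTT actually wants.

```python
import subprocess
from typing import List


def build_extract_audio_command(src: str, dst: str, bitrate: str = "32k") -> List[str]:
    """Build an ffmpeg command that keeps only the first audio track as Ogg/Opus.

    All encoder settings here are assumptions, not tested against GCSTT.
    """
    return [
        "ffmpeg",
        "-nostdin",                 # never wait for interactive input in a worker
        "-y",                       # overwrite dst if a previous attempt left it behind
        "-i", src,
        "-vn",                      # drop video
        "-map", "0:a:0",            # first audio stream only; fails if there is none
        "-ac", "1",                 # downmix to mono (assumption)
        "-ar", "16000",             # 16 kHz sample rate (assumption)
        "-c:a", "libopus",
        "-b:a", bitrate,
        "-compression_level", "5",  # mid encoder effort to save CPU (assumption)
        dst,
    ]


def extract_audio(src: str, dst: str) -> None:
    """Run ffmpeg; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(build_extract_audio_command(src, dst), check=True)
```

Keeping the command builder separate from the `subprocess.run` call makes it possible to test the arguments without having ffmpeg installed.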

Afterwards, have a background task that re-enqueues itself at a period of, say, 1 minute and checks for progress. Once the transcription is available, it downloads it and ingests it into the index, much like with plaintext.
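The polling task could be shaped roughly like this. Everything here is hypothetical: `fetch_transcript`, `ingest` and `enqueue_later` stand in for whatever the GCSTT client, the indexer and the job queue end up looking like, and the two-hour cap on total waiting is just a guard so the task cannot requeue itself forever.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

POLL_INTERVAL = timedelta(minutes=1)  # "say, 1 minute"
MAX_WAIT = timedelta(hours=2)         # assumed upper bound on total waiting


def poll_transcription(
    job_id: str,
    submitted_at: datetime,
    fetch_transcript: Callable[[str], Optional[str]],  # returns None while pending
    ingest: Callable[[str, str], None],
    enqueue_later: Callable[[str, timedelta], None],
    now: Optional[datetime] = None,
) -> str:
    """Run one polling step and report what happened."""
    now = now or datetime.now(timezone.utc)
    transcript = fetch_transcript(job_id)
    if transcript is not None:
        ingest(job_id, transcript)
        return "ingested"
    if now - submitted_at >= MAX_WAIT:
        # Stop requeueing; the caller should mark the job as failed.
        return "timed_out"
    enqueue_later(job_id, POLL_INTERVAL)
    return "requeued"
```

Passing the collaborators in as callables keeps the step itself free of any queue or network code, so each outcome can be exercised with fakes.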

This can fail in more ways than plaintext; primarily, we need to check that the file actually has an audio track that ffmpeg can recognize. Aside from that, double-upload troubles, transcription failure, and a bunch of other things can happen too, so this might turn out to be quite hairy. (I also have no idea what happens if GCSTT decides to time out. I'm not sure that can happen, but a sanity check that gives up after a couple of hours of queueing should also be implemented...)
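For the audio track check, one option is to ask ffprobe (shipped alongside ffmpeg) before doing any conversion. A minimal sketch, assuming ffprobe is on `PATH`; the function names are made up, and treating any ffprobe failure as "no usable audio" is a simplification that may hide other errors.

```python
import json
import subprocess
from typing import List


def build_probe_command(path: str) -> List[str]:
    """ffprobe invocation that lists only audio streams, as JSON."""
    return [
        "ffprobe",
        "-v", "error",
        "-select_streams", "a",
        "-show_entries", "stream=codec_type",
        "-of", "json",
        path,
    ]


def has_audio_stream(ffprobe_json: str) -> bool:
    """True if the ffprobe JSON output lists at least one audio stream."""
    data = json.loads(ffprobe_json)
    return any(s.get("codec_type") == "audio" for s in data.get("streams", []))


def file_has_audio(path: str) -> bool:
    """Probe a local file; a failed probe counts as no usable audio."""
    result = subprocess.run(build_probe_command(path), capture_output=True, text=True)
    if result.returncode != 0:
        return False
    return has_audio_stream(result.stdout)
```

Running this right after the download would let the job fail early with a clear reason instead of a cryptic ffmpeg error halfway through.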