if speech-to-text generation succeeds, fetch speech-to-text output

jmartin-sul commented 1 month ago

blocked by #1358; possibly also blocked by https://github.com/sul-dlss/speech-to-text/issues/21

question: is this a separate operation of its own, or is this part of https://github.com/sul-dlss/common-accessioning/issues/1358? or maybe part of #1360? marking this blocked till we have enough progress on #1358 to decide whether this needs to be a separate ticket.

either way, is it its own workflow step?

part of https://github.com/sul-dlss/common-accessioning/issues/1363

jmartin-sul commented 1 month ago

the max SQS message size might be 256 KB, which probably means we don't want to put the STT output itself in the success message
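To make the size concern concrete, here is a minimal Python sketch that measures the JSON-encoded size of a hypothetical "done" message and checks it against the limit mentioned above. The 256 KB constant reflects the comment's hedged figure, so confirm it against current AWS SQS quotas. The message field names (`druid`, `status`, `output_files`, `transcript`) and the druid/key values are hypothetical, not the actual whisper message format.

```python
import json

# Assumed limit, per the comment above ("might be 256KB"); verify against current SQS quotas.
MAX_SQS_MESSAGE_BYTES = 256 * 1024


def message_size_bytes(payload: dict) -> int:
    """Size of the JSON-encoded message body, in UTF-8 bytes."""
    return len(json.dumps(payload).encode("utf-8"))


def fits_in_sqs(payload: dict, limit: int = MAX_SQS_MESSAGE_BYTES) -> bool:
    """True if the encoded body is within the (assumed) size limit."""
    return message_size_bytes(payload) <= limit


# Hypothetical "done" message that carries only the list of output files.
done_with_list = {
    "druid": "bc123df4567",
    "status": "success",
    "output_files": ["bc123df4567-2/out/a.vtt", "bc123df4567-2/out/a.txt"],
}

# Hypothetical message that embeds a long transcript directly in the body.
done_with_text = {"druid": "bc123df4567", "transcript": "word " * 60_000}

print(fits_in_sqs(done_with_list))  # True
print(fits_in_sqs(done_with_text))  # False
```

A file list stays small no matter how long the recordings are, while embedded transcript text grows with the media. That asymmetry is why passing references rather than content seems safer.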

peetucket commented 1 month ago

is this the same thing as #1360 or am I misunderstanding?

jmartin-sul commented 1 month ago

> is this the same thing as #1360 or am I misunderstanding?

hmm, looking at this again for the first time in a few weeks, i think the description was implicitly asking that question (since it was unclear whether this was part of #1358, and #1360 seems like the other ticket it could belong to if it's not a standalone issue).

i updated the description to be a little clearer about that question. i'd be fine rolling this into #1360, since of course the output needs to be fetched before it's staged. i'm also fine keeping this as a separate ticket that blocks #1360. i updated #1360 accordingly as well.

thinking about the other question in the description above (whether this is its own workflow step), which also came up in planning today... it seems like we have 3 options:

1. make this part of stt-create. pros: that step gets the SQS notification, and if we add the output file list to the SQS "done" queue message that whisper creates on success/failure, we can just use that file list to fetch the files, without sticking the list in a workflow value or putting bucket-listing logic (with assumptions about file layout) in another workflow step. cons: if it's the file retrieval that fails, retrying the workflow step will re-run whisper on all input files, unless we build into the whisper worker code some intelligence about looking for existing output and skipping processing for the corresponding input. that might be beneficial anyway, but once we account for inevitable network flakiness, this approach implies either somewhat more complex code or somewhat more resource usage.
2. make this part of stage-files. pros: fetching is a natural first operation for the step, since it's an obvious prerequisite for staging the files. it doesn't introduce inefficiency if retrieval fails, because retrieval is the first part of the workflow step, so it's what would be getting retried. staging is unlikely to fail in a way that's retriable without manual intervention (i think?), and the STT output files are small, so little is wasted by the fact that retrying this step always means re-fetching the output: that's a pretty cheap operation, and a later part of the step is unlikely to fail in an easily retriable way. consistent with the way we do things in ocrWF. cons: we either need an implicit heuristic for which files to retrieve from the bucket, based on the druid-version of the workflow step instance, or we need to pass the file list returned by whisper along in workflow metadata.
3. make this its own workflow step. pros: makes it clearer what's happening, and better separates operations that are independently retriable. cons: a new workflow step, which would also nudge us toward renaming the existing upstream step from fetch-files to e.g. fetch-files-from-preservation. differs from ocrWF, which can lead to slightly more confusion and maintenance difficulty. same con as option 2 about figuring out which files to retrieve.
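The mitigation mentioned for the first option (having the whisper worker skip inputs whose output already exists, so a retry doesn't redo everything) could look roughly like this Python sketch. It assumes a hypothetical naming convention in which input `media/foo.mp4` produces output files `foo.vtt` and `foo.txt` under the output prefix. The suffixes, the `media/` location, and the function name are illustrative assumptions, not the speech-to-text worker's actual behavior.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Hypothetical: the output formats whisper is asked to produce for each input.
EXPECTED_SUFFIXES = (".vtt", ".txt")


def inputs_needing_processing(
    input_keys: Iterable[str],
    existing_output_keys: Iterable[str],
    suffixes: Iterable[str] = EXPECTED_SUFFIXES,
) -> List[str]:
    """Return the input keys that are missing at least one expected output file.

    Assumes (hypothetically) that input 'media/foo.mp4' yields outputs named
    'foo.vtt', 'foo.txt', etc. somewhere under the output prefix.
    """
    existing_names = {PurePosixPath(key).name for key in existing_output_keys}
    suffixes = tuple(suffixes)  # materialize so it can be reused for every input
    pending = []
    for key in input_keys:
        stem = PurePosixPath(key).stem
        if not all(stem + suffix in existing_names for suffix in suffixes):
            pending.append(key)
    return pending


print(inputs_needing_processing(
    ["bc123df4567-2/media/a.mp4", "bc123df4567-2/media/b.m4a"],
    ["bc123df4567-2/out/a.vtt", "bc123df4567-2/out/a.txt", "bc123df4567-2/out/b.vtt"],
))
# → ['bc123df4567-2/media/b.m4a']
```

This illustrates the "somewhat more complex code" tradeoff from option 1: the worker now depends on a file-naming convention in order to decide what it can skip.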

having written all that out explicitly now... i think i'd vote for doing the file fetch in stage-files.

but i'd also vote for keeping this a separate ticket, still blocked for now, whose purview is:

1. once https://github.com/sul-dlss/speech-to-text/issues/21 is finished, modify the watcher daemon from #1388 to stick the output file list (parsed from the "done" SQS message) in workflow metadata. we could instead just list a known bucket prefix (e.g. druid-version/out/). but since whisper knows what it generated, i sort of like the idea of using that file list as extra confirmation that everything we expect is there for retrieval, unless implementing that turns out to be noticeably more onerous than some implicit file-listing logic.
2. use the file list in workflow metadata to retrieve the expected STT output, as the first operation in stage-files.
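Using whisper's reported file list as "extra confirmation" amounts to a set difference between what was reported and what a listing of the druid-version prefix returns. The Python sketch below assumes the `druid-version/out/` layout from the example above, literally formatted as `<druid>-<version>/out/`. That format, the function names, and the sample keys are hypothetical.

```python
from typing import Iterable, List


def output_prefix(druid: str, version: int) -> str:
    """Hypothetical bucket layout following the 'druid-version/out/' example."""
    return f"{druid}-{version}/out/"


def missing_outputs(
    druid: str, version: int, reported_keys: Iterable[str], listed_keys: Iterable[str]
) -> List[str]:
    """Return keys whisper reported generating that the bucket listing lacks.

    Only listed keys under this druid-version's output prefix count, so a file
    left over from another version can't satisfy the check.
    """
    prefix = output_prefix(druid, version)
    present = {key for key in listed_keys if key.startswith(prefix)}
    return sorted(set(reported_keys) - present)


reported = ["bc123df4567-2/out/a.txt", "bc123df4567-2/out/a.vtt"]
listed = ["bc123df4567-2/out/a.vtt", "bc123df4567-1/out/a.txt"]
print(missing_outputs("bc123df4567", 2, reported, listed))
# → ['bc123df4567-2/out/a.txt']
```

The trailing slash in the prefix keeps version 1 from matching keys for version 12. An empty result means everything whisper reported is present and ready to retrieve.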

and then once that's done, #1360 is unblocked.

if i'm outvoted and the consensus is that we should make STT output fetching part of the watcher daemon that marks stt-create as completed, i'd vote for getting #1388 fixed up and merged with its current set of functionality, and then doing the work for this ticket and/or #1360 as a follow-up PR.

peetucket commented 4 weeks ago

Thanks for the comprehensive explanation. I would vote for doing it in the existing (but currently unimplemented) stage-files step, for the reasons you note above. I was assuming we would just fetch every file in the "output" folder for that druid, which would obviate the need to pass any information into the step. This would (sort of) match what is done for the same step in ocrWF. We do use some logic there to decide which files to copy over, so it's not necessarily everything that came out of ABBYY, and maybe we could use similar logic for speech-to-text if needed. It is nice to be able to redo the steps more atomically if needed.
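"Fetch every file in the output folder" can be sketched in Python against an S3-style API. The function takes any client that behaves like a boto3 S3 client (e.g. `boto3.client("s3")`) and uses its `list_objects_v2` paginator and `download_file`. The optional `keep` predicate is a hypothetical hook for ocrWF-style selection logic. This is an illustration only, not the stage-files code in common-accessioning, and the function name and parameters are assumptions.

```python
from pathlib import Path
from typing import Callable, List, Optional


def fetch_output_dir(
    s3_client,
    bucket: str,
    prefix: str,
    dest_dir: Path,
    keep: Optional[Callable[[str], bool]] = None,
) -> List[Path]:
    """Download every object under `prefix` into dest_dir, optionally filtered.

    `s3_client` is expected to behave like a boto3 S3 client.
    `keep` is a hypothetical hook for deciding which files to stage.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    downloaded: List[Path] = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):  # skip "folder" placeholder objects
                continue
            if keep is not None and not keep(key):
                continue
            # Preserve any sub-path below the prefix inside dest_dir.
            target = dest_dir / key[len(prefix):]
            target.parent.mkdir(parents=True, exist_ok=True)
            s3_client.download_file(Bucket=bucket, Key=key, Filename=str(target))
            downloaded.append(target)
    return downloaded
```

For example, `fetch_output_dir(boto3.client("s3"), bucket, "bc123df4567-2/out/", Path("workspace"), keep=lambda k: k.endswith(".vtt"))` would stage only the (hypothetical) `.vtt` files. Because the step needs nothing beyond the druid and version, re-running it simply re-fetches the small output files.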

jmartin-sul commented 3 weeks ago

closed by https://github.com/sul-dlss/common-accessioning/issues/1360 (implemented in stage-files; it just grabs everything in the output folder)

if speech-to-text generation succeeds, fetch speech-to-text output #1359