nextcloud / spreed

🗨️ Nextcloud Talk – chat, video & audio calls for Nextcloud
https://nextcloud.com/talk
GNU Affero General Public License v3.0

Migrate to the task processing API #13115

Open julien-nc opened 2 weeks ago

julien-nc commented 2 weeks ago

The Translation and SpeechToText backend APIs are deprecated. Those features are now part of the task processing API (since Nextcloud 30).

The old APIs will stay for a few more Nextcloud major versions. The old SpeechToText API can now use the new TaskProcessing providers, so there is no rush to migrate.

The providers for the Translate API and the translation providers for the TaskProcessing API can be installed side by side, so there is no rush to migrate there either.

Translation

You can use the assistant's UI to run translation tasks. If the assistant app is enabled, the OCA.Assistant.openAssistantForm function should be available.

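// Optional chaining avoids a TypeError when the assistant app is disabled
// and OCA.Assistant is undefined.
if (OCA.Assistant?.openAssistantForm) {
    OCA.Assistant.openAssistantForm({
        appId: 'spreed',
        customId: 'message translation',
        taskType: 'core:text2text:translate',
        inputs: {
            input: 'the content of the message',
        },
        // Keep the assistant open so the user can read and copy the result.
        closeOnResult: false,
    }).then(task => {
        if (task.status === 'STATUS_SUCCESSFUL') {
            console.debug('assistant result task output', task.output.output)
        } else {
            // The task failed or was scheduled for later by the user.
            console.debug('assistant result task', task)
        }
    })
}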

The promise resolves when the task succeeds, fails, or is scheduled for later by the user. The promise result is the task object. The closeOnResult parameter of OCA.Assistant.openAssistantForm decides whether the assistant is closed when the task succeeds or fails. Setting it to false stays close to the current behaviour of the translate modal in Talk: the user sees the result in the assistant, where there is a "copy" button, and can then close the assistant modal.
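The three outcomes described above could be told apart with a small helper like the following. This is only a sketch: translationOutcome is a hypothetical name, and of the status strings only 'STATUS_SUCCESSFUL' appears in this thread; 'STATUS_SCHEDULED' and 'STATUS_RUNNING' are assumed to follow the same naming.

```javascript
// Map the task object resolved by OCA.Assistant.openAssistantForm to an outcome.
// Only 'STATUS_SUCCESSFUL' and task.output.output come from the thread; the
// other status names are assumptions.
function translationOutcome(task) {
    switch (task.status) {
    case 'STATUS_SUCCESSFUL':
        return { kind: 'translated', text: task.output.output }
    case 'STATUS_SCHEDULED':
    case 'STATUS_RUNNING':
        // The user chose to run the task later; there is no result yet.
        return { kind: 'pending', text: null }
    default:
        // Anything else is treated as a failure.
        return { kind: 'failed', text: null }
    }
}
```

Talk could pass the resolved task to such a helper inside the .then() callback and only show a translated message for the 'translated' kind.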

SpeechToText

Transcription can be done with the core:audio2text task type of the TaskProcessing API. For more details on how to run such a task in the backend, see the Transcribe section of https://github.com/nextcloud/assistant/issues/114

nickvergessen commented 2 weeks ago

@julien-nc we have a bit of a problem here.

Translating chat messages

We need OCS APIs because our mobile and desktop clients call them, and the APIs should respond with the translation directly instead of delegating it to a background job (no one will wait 5 minutes for the translation of a chat message).

Transcription of call recordings

This can be done in a background job and should be fine (as far as I know, we already do that now).

nickvergessen commented 2 weeks ago

Also the API endpoints in https://github.com/nextcloud/server/blob/bc5c0262af3cd375620d6534353a3842149ad6ab/core/Controller/TranslationApiController.php are not marked as @deprecated

julien-nc commented 2 weeks ago

> No one will wait 5 minutes on the translation of a chat message

If the provider is an ExApp, it will process tasks ASAP, with no delay. If the provider is a PHP app and occ background-job:worker "OC\TaskProcessing\SynchronousBackgroundJob" is running, there is no delay either.

Once the task is scheduled, the clients can poll it with ocs/v2.php/taskprocessing/task/TASK_ID. That's what the assistant does in the frontend. There are no more blocking requests: a blocking request could take too long and be killed, and it also ties up a PHP runner while waiting.
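A client-side polling loop for that endpoint might look like the sketch below. It uses exponential backoff so that slow tasks are polled less and less often. The helper names (backoffDelay, pollTask, makeOcsFetchTask), the delay values, the 'STATUS_FAILED' and 'STATUS_CANCELLED' status strings, the OCS-APIRequest header and the ocs.data.task response shape are all assumptions for illustration, not something confirmed in this thread.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Statuses after which polling stops. 'STATUS_SUCCESSFUL' appears in the
// thread; the other two names are assumptions.
const TERMINAL_STATUSES = new Set(['STATUS_SUCCESSFUL', 'STATUS_FAILED', 'STATUS_CANCELLED'])

// Exponential backoff capped at maxMs.
function backoffDelay(attempt, baseMs = 500, maxMs = 8000) {
    return Math.min(baseMs * 2 ** attempt, maxMs)
}

// fetchTask is a caller-supplied async function returning the task object.
async function pollTask(fetchTask, { maxAttempts = 10, baseMs = 500, maxMs = 8000, sleep = defaultSleep } = {}) {
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
        const task = await fetchTask()
        if (TERMINAL_STATUSES.has(task.status)) {
            return task
        }
        await sleep(backoffDelay(attempt, baseMs, maxMs))
    }
    throw new Error('Task did not finish within the polling budget')
}

// Hypothetical fetchTask built on the endpoint named above. The header and
// the ocs.data.task response shape are assumptions.
function makeOcsFetchTask(baseUrl, taskId) {
    return async () => {
        const response = await fetch(`${baseUrl}/ocs/v2.php/taskprocessing/task/${taskId}`, {
            headers: { 'OCS-APIRequest': 'true', Accept: 'application/json' },
        })
        if (!response.ok) {
            throw new Error(`Unexpected HTTP status ${response.status}`)
        }
        const body = await response.json()
        return body.ocs.data.task
    }
}
```

A caller would run something like pollTask(makeOcsFetchTask(baseUrl, taskId)) and display the output once the returned task is successful. Capping the number of attempts and growing the delay keeps the request volume per client bounded.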

nickvergessen commented 2 weeks ago

So instead of getting a string returned, the clients shall DoS the server. The feature still breaks for existing clients.

julien-nc commented 2 weeks ago

We can also keep the providers for the old APIs in integration_openai and the features in Talk are not broken.

nickvergessen commented 2 weeks ago

I will check with Andy next week what to do.

julien-nc commented 1 week ago

Two things should make it more convenient:

All this is in stable30 already.

The providers for the Translate API and the TaskProcessing API are implemented in different apps so you can keep using the Translate API as long as you want.