k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi
https://k2-fsa.github.io/sherpa
Apache License 2.0

EOS Token #381

Closed: KazBrekker1 closed this issue 11 months ago

KazBrekker1 commented 1 year ago

Could a token be sent to trigger the VAD manually? Currently, if you send "Done" the connection terminates.

So maybe if "EOS" is sent, it will trigger the endpoint detection.

csukuangfj commented 1 year ago

> So maybe if "EOS" is sent, it will trigger the endpoint detection.

Endpoint detection is performed on the server side according to the decoding results.

If you want to learn more about endpointing in sherpa, please read the doc https://k2-fsa.github.io/sherpa/python/streaming_asr/endpointing.html
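
As a rough illustration of what endpointing "according to the decoding results" means, here is a minimal sketch of a rule-based check. The `EndpointRule` class, its field names, and the threshold values are assumptions made up for this example, not sherpa's actual API or defaults; the linked doc describes the real configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EndpointRule:
    """Hypothetical rule for illustration; not sherpa's actual class."""
    must_contain_nonsilence: bool  # rule applies only after something was decoded
    min_trailing_silence: float    # seconds of silence required at the end
    min_utterance_length: float    # seconds of audio required in total


def rule_fires(rule: EndpointRule, decoded_something: bool,
               trailing_silence: float, utterance_length: float) -> bool:
    # A rule that needs decoded speech cannot fire on pure silence.
    if rule.must_contain_nonsilence and not decoded_something:
        return False
    return (trailing_silence >= rule.min_trailing_silence
            and utterance_length >= rule.min_utterance_length)


def endpoint_detected(rules: List[EndpointRule], decoded_something: bool,
                      trailing_silence: float, utterance_length: float) -> bool:
    # An endpoint is declared as soon as any single rule fires.
    return any(
        rule_fires(r, decoded_something, trailing_silence, utterance_length)
        for r in rules
    )


# Placeholder thresholds, chosen only to make the example concrete.
RULES = [
    EndpointRule(False, 2.4, 0.0),   # long silence, even if nothing was decoded
    EndpointRule(True, 1.2, 0.0),    # shorter silence after decoded speech
    EndpointRule(False, 0.0, 20.0),  # utterance has become very long
]
```

With these placeholder rules, 1.5 seconds of trailing silence ends the segment only once something has been decoded.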


> Currently, if you send "Done" the connection terminates.

"Done" means the client has sent all of its data, so the connection is terminated.

KazBrekker1 commented 1 year ago

Sorry if the request wasn't explicit. I'm aware that the endpoint detection is done on the server side automatically according to the arguments passed.

But is it possible to add a manual trigger that activates endpointing when the client sends a token to the server? This would be similar to how the "Done" token is sent to signal that the connection should be terminated.

e.g.

    message = await socket.recv()
    is_final = stream.endpoint_detected(self.online_endpoint_config) or message == "EOS"

csukuangfj commented 1 year ago

What should the server do if an EOS is received?

Is there anything else the server can do in addition to setting is_final?

And when should the client send an EOS?

KazBrekker1 commented 11 months ago

1, 2 (what the server should do when an EOS is received, and any additional behavior): It should force the current segment (e.g. segment 0) to finish and send its result with "is_final" set to true, then return subsequent predictions under the next segment (e.g. segment 1).

3 (when the client should send an EOS): when a client-side action signals a pause in speech or a temporary end of speech within the same session, without closing the websocket connection.
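
To make the requested behavior concrete, here is a minimal sketch of the message handling. It is synchronous and uses a stand-in stream so it can run without sherpa or a websocket; `FakeStream`, `handle_messages`, and the result dictionary layout are hypothetical names for this example, and "EOS" is only the token proposed in this issue, not something the server currently understands.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Union

EOS = "EOS"    # hypothetical control message proposed in this issue
DONE = "Done"  # existing message: the client has sent all of its data


@dataclass
class FakeStream:
    """Stand-in for a decoding stream; real decoding is not modeled."""
    chunks: List[bytes] = field(default_factory=list)

    def accept(self, chunk: bytes) -> None:
        self.chunks.append(chunk)

    def endpoint_detected(self) -> bool:
        # Stand-in for automatic endpointing; it never fires in this sketch.
        return False

    def text(self) -> str:
        return f"<{len(self.chunks)} chunks>"

    def reset(self) -> None:
        self.chunks.clear()


def handle_messages(messages: Iterable[Union[bytes, str]]) -> List[dict]:
    stream = FakeStream()
    segment = 0
    results = []
    for message in messages:
        if message == DONE:
            break  # the session ends; later messages are not processed
        forced = message == EOS
        if not forced:
            stream.accept(message)  # audio bytes go to the decoder
        # Finalize on automatic endpointing or on the manual trigger.
        is_final = forced or stream.endpoint_detected()
        results.append(
            {"segment": segment, "text": stream.text(), "is_final": is_final}
        )
        if is_final:
            stream.reset()  # start a fresh segment in the same session
            segment += 1
    return results


results = handle_messages([b"a", b"b", EOS, b"c", DONE, b"ignored"])
```

Here the "EOS" message closes segment 0 with "is_final" set to true, and the chunk that follows is reported under segment 1 while the session stays open.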

csukuangfj commented 11 months ago

@KazBrekker1

Thanks for the detailed description. Now I understand your requirement.
I think that is doable on the server side.

But I am not sure how general this requirement is. Maybe you can implement the requirement in your own fork.

KazBrekker1 commented 11 months ago

Sure thing, thank you for the reply.

I will close the issue. If I implement it, I will post the link here in case someone else has the same requirement.