Closed: Neurrone closed this 11 months ago
Hi there,
Can you provide use cases for this (such as parameters you need and how this functionality can benefit other developers)? Thanks.
Achieving synchronization of speech with something else, such as sound. A major category of applications that needs this is audio games. For example, during turn-based combat, you would want to synchronize the announcement of actions taken by units with sound playback. There are probably other use cases too.
Actually, the NVDA Japanese team received this kind of request from application developers. The main reason I have heard is that commercial screen readers provide such an API for third-party developers. We did experimental work on an "isSpeaking" API in the NVDA Japanese version. The following document describes our work: https://osdn.jp/projects/nvdajp/wiki/ControllerClientEnhancement
However, I am not satisfied with our work so far.
Firstly, it is difficult to support the API across the various speech synthesizers. We would need methods, properties, or callbacks to meet the demand, and the implementation would have to be done for each synthesis API. I think the audio ducking work could be a good opportunity to find a better way to improve our isSpeaking API.
Secondly, such an API does not work correctly in some cases. For example, our isSpeaking API sometimes returns the wrong value right after speakText is performed. Most TTS engines do not respond to the speakText command immediately, and the state transitions should be 'preparing', 'speaking', and 'idle'. This kind of API should be designed very carefully.
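To make the race concrete, here is a minimal, self-contained sketch. The FakeSynth class and its method names are invented for illustration (real synthesizer APIs differ); it only models the three states mentioned above and shows why a naive isSpeaking check fails immediately after speakText:

```python
from enum import Enum, auto

class SpeechState(Enum):
    IDLE = auto()
    PREPARING = auto()  # speakText accepted, but no audio produced yet
    SPEAKING = auto()

class FakeSynth:
    """Hypothetical synthesizer model, for illustration only."""
    def __init__(self):
        self.state = SpeechState.IDLE

    def speak_text(self, text):
        # The engine acknowledges the request but has not started audio yet.
        self.state = SpeechState.PREPARING

    def is_speaking(self):
        # Naive check: only reports True while audio is playing, so it
        # returns False during PREPARING, right after speak_text().
        return self.state == SpeechState.SPEAKING

def is_busy(synth):
    # Treating PREPARING as busy avoids the race.
    return synth.state != SpeechState.IDLE

synth = FakeSynth()
synth.speak_text("hello")
print(synth.is_speaking())  # False: the wrong answer right after speakText
print(is_busy(synth))       # True
```

The point is that a correct API has to report "busy" for the whole preparing-to-idle lifecycle, not just while audio is audible.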
The last point is that, from my observation, NVDA developers prefer 'universal solutions' over enhancing the NVDA-specific ControllerClient API. Some users of the ControllerClient API neglect to consider other solutions, such as using the relevant accessibility API on the application side, or providing an appModule on the NVDA side.
Anyway, I am happy to discuss the issue here.
This isn't currently possible. NVDA core does not actually have this information at the moment. #4877 might provide the basis for this. Even if it does, doing this would be extremely low priority for us, but that doesn't stop someone else from taking it up.
Is anyone still considering this? I can imagine that it would help a lot in implementing sound- and speech-based support for diagrams.
Probably not. This is non-trivial.
@Neurrone could you please fill in the feature template for this one? Please include use cases, how you imagine this feature working, and which alternatives you have considered. For now, I will close this and hope that you address Joseph's comment above in the feature template. Thanks.
@Adriani90 Have you looked at the entire conversation, or just at the first two comments? The issue is valid, and even if no one has expressed interest in fixing it at the moment, there is no reason to close it in my opinion. It wasn't created according to the feature request template, but the comments above explain why it is needed for some.
Oh sorry, my browser didn't display all the comments for whatever reason. Reopening.
Some guidance on this in https://github.com/nvaccess/nvda/wiki/Speech-Refactor-Use-Case-Notes
It should be pretty trivial to add a function to do this: check speech._manager._pendingSequences; if it's empty, there is no speech in progress.
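The suggested check could be sketched as follows. Note this is written against a stand-in object: the real speech._manager only exists inside a running NVDA process, _pendingSequences is a private attribute that may change between releases, and the helper name is invented here:

```python
class _FakeSpeechManager:
    """Stand-in for NVDA's speech._manager, for illustration only."""
    def __init__(self):
        self._pendingSequences = []

def is_speech_in_progress(manager):
    # The heuristic from the comment above: a non-empty pending-sequence
    # queue means NVDA still has speech queued or in progress.
    return bool(manager._pendingSequences)

mgr = _FakeSpeechManager()
print(is_speech_in_progress(mgr))   # False: nothing queued
mgr._pendingSequences.append(["Hello world"])
print(is_speech_in_progress(mgr))   # True: a sequence is pending
```

Inside NVDA itself, the same check would be `bool(speech._manager._pendingSequences)`, with the caveat that relying on a private attribute is fragile.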
This feature would be very useful in my case. I'm making a service that parses a game screen and outputs a description of where the player is and what's around them (for example: west wall, north town). This is done continuously, so the service needs to know if NVDA has finished speaking what was sent to it before sending the updated game screen.
this feature would be really useful
I'd like feedback from people on how they intend to use this. The simplest mechanism would be adding a function such as bool nvdaController_isSpeaking(), but this would require client software to poll while waiting for speech to finish.
A more complicated addition might be to allow registration of a callback to notify that speech is finished:
void nvdaController_onFinishedSpeaking(void (*onFinishedSpeakingCallback)(void))
Is there a desire to be notified of, or to poll for, some arbitrary "amount" of speech remaining? It isn't clear how we would define an "amount" of speech to external software. I'd prefer not to go down this path without strong justification.
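From the client's side, the two proposed mechanisms would be used roughly like this. Neither nvdaController_isSpeaking nor nvdaController_onFinishedSpeaking exists yet, so this sketch uses a toy MockController that fakes speech with a timer, purely to contrast polling against callback-based waiting:

```python
import threading
import time

class MockController:
    """Toy stand-in for the proposed controller additions; the real
    functions do not exist in the controller client yet."""
    def __init__(self):
        self._speaking = False
        self._callback = None

    def speak(self, text, duration=0.05):
        # Pretend the text takes `duration` seconds to speak.
        self._speaking = True
        def finish():
            time.sleep(duration)
            self._speaking = False
            if self._callback is not None:
                self._callback()
        threading.Thread(target=finish).start()

    def is_speaking(self):               # the polling variant
        return self._speaking

    def on_finished_speaking(self, cb):  # the callback variant
        self._callback = cb

# Polling: simple, but the client spins while waiting.
poller = MockController()
poller.speak("Unit attacks!")
while poller.is_speaking():
    time.sleep(0.01)

# Callback: the client blocks on an event instead of spinning.
notifier = MockController()
done = threading.Event()
notifier.on_finished_speaking(done.set)
notifier.speak("Enemy counters!")
done.wait(timeout=1.0)
```

The callback form maps naturally onto an event or condition variable in the client, which is why it avoids the busy-wait cost of polling.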
I'm currently writing a prototype to provide speech to NVDA via the controller client using SSML. It allows the function to execute in blocking mode, meaning the function blocks until speech is done. Since SSML supports marks in the provided sequence, you can also register a callback that is called for every mark in the speech. @Neurrone would that cover your case?
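The mark mechanism could look roughly like the sketch below. The SSML fragment follows the standard `<mark/>` element, but the callback wiring is only simulated here, since the prototype's actual API is not shown in this thread:

```python
import re

# An SSML fragment with named marks between the phrases a client wants
# to synchronize with sound playback.
ssml = (
    '<speak>'
    'Player attacks.<mark name="attack_done"/>'
    'Enemy counters.<mark name="counter_done"/>'
    '</speak>'
)

def marks_in(ssml_text):
    # Extract mark names in document order, simulating the order in which
    # a mark callback would fire as speech reaches each mark.
    return re.findall(r'<mark name="([^"]+)"/>', ssml_text)

for name in marks_in(ssml):
    # A real client would trigger a sound effect or advance game state here.
    print("mark reached:", name)
```

For the audio-game use case above, a client could start the next sound effect in the callback for each mark, instead of waiting for the whole utterance to finish.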
Yeah, that would be more than sufficient.
See title.