Open · tomsmith8 opened 6 months ago
Hi @tomsmith8, I would like to help out with this!
The first step is to access the media streams from the Jitsi call. This involves tapping into the WebRTC APIs to get at the video, audio, and screen-sharing streams, making sure each one is correctly identified and accessible.
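As a rough illustration, here is a minimal TypeScript sketch of acquiring the three kinds of streams with plain browser APIs (`getUserMedia`, `getDisplayMedia`, and the peer connection's `track` event). In a real Jitsi Meet integration you would more likely obtain these through lib-jitsi-meet's own track objects; the function names below are only illustrative.

```typescript
// Minimal sketch using standard browser APIs (not Jitsi-specific helpers).

interface CapturedStreams {
  camera: MediaStream;
  screen: MediaStream;
}

async function acquireStreams(): Promise<CapturedStreams> {
  // Local camera + microphone.
  const camera = await navigator.mediaDevices.getUserMedia({
    video: true,
    audio: true,
  });

  // Screen-share stream (the user picks a window/tab/monitor).
  const screen = await navigator.mediaDevices.getDisplayMedia({
    video: true,
  });

  return { camera, screen };
}

// Remote participants' media arrives on the peer connection's `track` event.
function watchRemoteTracks(pc: RTCPeerConnection): void {
  pc.addEventListener("track", (event: RTCTrackEvent) => {
    const [stream] = event.streams;
    console.log(`remote ${event.track.kind} track added`, stream?.id);
  });
}
```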
Video frames can be captured via a canvas context: draw the current video frame onto an HTML canvas element, then call getImageData periodically to extract frames for real-time processing. A small buffering step keeps this running smoothly without interfering with the active call.
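A minimal sketch of that loop, assuming a 500 ms capture interval and a caller-supplied `onFrame` callback (both illustrative choices; `requestVideoFrameCallback` or `canvas.toBlob` are reasonable alternatives depending on what the downstream API expects):

```typescript
// Play the MediaStream into a detached <video> element, then periodically
// copy the current frame onto a canvas and read back the pixels.
function captureFrames(
  stream: MediaStream,
  onFrame: (frame: ImageData) => void,
  intervalMs = 500,
): () => void {
  const video = document.createElement("video");
  video.srcObject = stream;
  video.muted = true;
  void video.play();

  const canvas = document.createElement("canvas");
  const ctx = canvas.getContext("2d", { willReadFrequently: true });
  if (!ctx) throw new Error("2D canvas context unavailable");

  const timer = window.setInterval(() => {
    if (video.videoWidth === 0) return; // no frame decoded yet
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    ctx.drawImage(video, 0, 0);
    onFrame(ctx.getImageData(0, 0, canvas.width, canvas.height));
  }, intervalMs);

  // Caller invokes the returned function to stop capturing.
  return () => window.clearInterval(timer);
}
```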
We'll leverage the Web Audio API to capture audio. An AudioWorklet (or the older, now-deprecated ScriptProcessorNode) lets us process audio samples in real time; the captured sample blocks can then be stored in a buffer ready for speech recognition.
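A minimal sketch of the AudioWorklet route. The worklet source is inlined as a Blob URL purely to keep the example self-contained (in a real build it would live in its own module file), and the node name `pcm-tap` and the `onSamples` callback are illustrative:

```typescript
// Worklet processor: forwards each 128-sample block to the main thread.
const workletSource = `
  class PcmTapProcessor extends AudioWorkletProcessor {
    process(inputs) {
      const channel = inputs[0]?.[0];
      if (channel) {
        // Copy, because the underlying buffer is reused by the audio engine.
        this.port.postMessage(channel.slice(0));
      }
      return true; // keep the processor alive
    }
  }
  registerProcessor("pcm-tap", PcmTapProcessor);
`;

async function captureAudio(
  stream: MediaStream,
  onSamples: (samples: Float32Array, sampleRate: number) => void,
): Promise<AudioContext> {
  const ctx = new AudioContext();
  const moduleUrl = URL.createObjectURL(
    new Blob([workletSource], { type: "application/javascript" }),
  );
  await ctx.audioWorklet.addModule(moduleUrl);

  const source = ctx.createMediaStreamSource(stream);
  const tap = new AudioWorkletNode(ctx, "pcm-tap");
  tap.port.onmessage = (e: MessageEvent<Float32Array>) =>
    onSamples(e.data, ctx.sampleRate);

  source.connect(tap);
  // No connection to ctx.destination is needed; we only want the samples.
  return ctx;
}
```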
@JZ1999 Any update on providing a documented solution with Jitsi/Jibri/WebRTC for real-time streaming?
@tomsmith8 Can I work on this? My Sphinx username is asterisk32 https://community.sphinx.chat/p/cmv6tnqtu2rk819pr5mg/assigned
@hkarani sure - we're looking for a proposed solution for this bounty first. Once we have a solution we're happy with, we'll look to break it out into further bounties (implementation).
Description
Provide us with a how-to solution for extracting periodic screen frames and audio for real-time speech recognition from a Jitsi WebRTC call. The extracted frames and audio will be processed for further analysis via another API.
Objectives
Suggested tasks, to be reviewed for anything missing:
Access Media Streams
Capture Video Frames
Capture and Process Audio
Ensure the audio data is correctly buffered and ready for speech recognition (see the buffering sketch after this list).
Provide a detailed explanation of whether the implementation process described above is the correct approach, plus alternative or additional notes on how to process audio, video, and screen recording in real time.
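To make the buffering task concrete, here is a sketch of a stage that accumulates the Float32 blocks from the audio-capture sketch above and emits fixed-size 16 kHz, 16-bit PCM chunks. The 16 kHz rate, the 1-second chunk size, and the naive decimation resampler are all assumptions; the right values and a proper low-pass/resample step depend on which speech-recognition API is eventually chosen.

```typescript
// Accumulates Float32 audio blocks and emits fixed-duration PCM16 chunks.
// Target rate and chunk length are assumed values for a typical STT API.
class SpeechBuffer {
  private samples: number[] = [];

  constructor(
    private inputRate: number,
    private onChunk: (pcm16: Int16Array) => void,
    private targetRate = 16000,
    private chunkSeconds = 1,
  ) {}

  push(block: Float32Array): void {
    // Naive decimation resampler; a real pipeline should low-pass filter first.
    const ratio = this.inputRate / this.targetRate;
    for (let i = 0; i < block.length; i += ratio) {
      this.samples.push(block[Math.floor(i)]);
    }
    const chunkSize = this.targetRate * this.chunkSeconds;
    while (this.samples.length >= chunkSize) {
      const chunk = this.samples.splice(0, chunkSize);
      // Convert float [-1, 1] samples to clamped 16-bit signed integers.
      this.onChunk(
        Int16Array.from(chunk, (s) =>
          Math.max(-32768, Math.min(32767, Math.round(s * 32767))),
        ),
      );
    }
  }
}
```

Wiring it up would look like `captureAudio(stream, (samples, rate) => speechBuffer.push(samples))`, reusing the audio-capture sketch above.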
Acceptance Criteria