Open darioalessandro opened 1 year ago
@darioalessandro hey! thank you for releasing this project. I really think it's a great idea. I did spend some time looking at this issue and it essentially boils down to Safari not currently supporting the MediaStreamTrackGenerator/Processor Web APIs.
https://developer.mozilla.org/en-US/docs/Web/API/MediaStreamTrackGenerator#browser_compatibility
They actually just added support for the WebCodecs API, which enables VideoEncoder, in version 16.4, so progress seems slow. I had found some examples that used VideoStreamReader/Writer, but those have already been deprecated and aren't included in web_sys anymore.
It seems like we might be able to use a TransformStream, but my knowledge of Web APIs is pretty limited...
Thanks for looking into this issue!!
I think it is worth trying. @alcolmenar, would you like to take on this task?
@darioalessandro Sure I can give it a shot. I might need some assistance though at some point if I get stuck.
@alcolmenar how's it going?
@darioalessandro hey! I've been a bit busy with other things, so I haven't had a chance to start on this yet. I should get working on it later today or tomorrow.
sgtm
@darioalessandro I've been looking into adding Safari support and I think I've determined a way to do this, but I'm not sure if it is something we want to implement. The main issue here is that Safari doesn't currently support the MediaStreamTrackProcessor/Generator APIs, which enable extracting a ReadableStream from a MediaStreamTrack. One way I've seen this worked around is:
- draw each frame of the video element onto a canvas element
- read the ImageData back from the canvas and queue it onto a ReadableStream
Here is an example of this done in JavaScript: https://github.com/GoogleChromeLabs/webm-wasm/blob/master/demo/live.html
We might then be able to use the same VideoEncoder created in camera_encoder.rs. I think one issue is that the canvas doesn't render unless it is in focus.
Also, I've been struggling a bit with how to implement the JsObject containing the methods necessary for the ReadableStream, and I can't find any examples of this. I was thinking maybe we could pair on this if you're up for it?
Yes, this will work 😄 you can read the offscreen image straight into a video_frame:
```rust
if let Err(e) = context.draw_image_with_html_video_element_and_dw_and_dh(
    &video_element,
    0.0,
    0.0,
    VIDEO_WIDTH.into(),
    VIDEO_HEIGHT.into(),
) {
    log!("error", e);
}

// create a JsDict with a timestamp property
let mut video_frame_init = VideoFrameInit::new();
video_frame_init.timestamp(0.0); // TODO: use an actual timestamp
video_frame_init.duration(1.0 / 30.0);

let video_frame = VideoFrame::new_with_html_image_element_and_video_frame_init(
    &html_image_element,
    &video_frame_init,
);
```
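For what it's worth, a possible continuation as a rough sketch (not code from the repo): `video_encoder` below stands in for the VideoEncoder that camera_encoder.rs already configures, and the frame constructor returns a Result that needs handling:

```rust
// Hedged sketch: `video_encoder` is assumed to be the encoder already
// configured in camera_encoder.rs.
match video_frame {
    Ok(frame) => {
        // Submit the frame for encoding, then release its resources.
        video_encoder.encode(&frame);
        frame.close();
    }
    Err(e) => log!("failed to create VideoFrame", e),
}
```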
you can modify this method directly: https://github.com/security-union/videocall-rs/blob/a629581b6f3abbb7221ad7c6cdd6b0cd1adce054/yew-ui/src/model/encode/camera_encoder.rs#L168-L202
You will need to enable the following features in the web-sys crate:
"OffscreenCanvas",
"OffscreenCanvasRenderingContext2d",
"ImageData",
"VideoFrameInit"
@darioalessandro thank you for the response, it definitely helped me out. I was able to get camera encoding working on Safari and tested it locally, but I had to remove the use of the MediaStreamTrackGenerator for audio decoding. I put up a draft PR with the changes for you to check out. The video on the receive side seemed to render at its full height and width though, and I couldn't figure out how to adjust that without changing the drawImage height and width. I thought maybe it was just a CSS change, but that wasn't the case.
Also, while looking into the audio side of the equation, it looks like Safari doesn't support the AudioEncoder either, so we'll need a third-party package to do that encoding. On the decode side, Safari doesn't support MediaStreamTrackGenerator or AudioDecoder, so we'll have to rethink how to implement this using the Web Audio APIs, specifically the stream generation and decoding (most likely a third-party package again). I tried to find some examples of doing the stream generation; all I could find are examples using ArrayBuffer, but with my limited JavaScript/WASM knowledge I'm not sure that will work. Do you have any suggestions on how to go about implementing this?
This is awesome! We use Opus for audio encoding, so I am sure that we can port it to wasm 🏆
Seems like iOS Chrome has the same issue!!
Adding ChatGPT response regarding audio processing, let me know what you think:
> If I am receiving the audio chunks via a WebSocket, how can I then play them using the AudioContext?

ChatGPT: Certainly! You can play audio chunks received via a WebSocket using the AudioContext. Here's a general outline of how you could do it:
1. Create the AudioContext and ScriptProcessorNode: you'll need to create an AudioContext and a ScriptProcessorNode (or AudioWorklet if available) to handle the incoming chunks.
2. Set up the WebSocket: you'll need to set up the WebSocket connection to receive the audio chunks. Make sure you know the audio format and sample rate.
3. Decode and play audio chunks: as chunks are received, you'll need to decode them (if necessary) and then add them to a buffer that the ScriptProcessorNode can play.
Here's some sample code that outlines this process:
```javascript
var audioContext = new (window.AudioContext || window.webkitAudioContext)();
var scriptNode = audioContext.createScriptProcessor(4096, 1, 1); // choose an appropriate buffer size
var audioQueue = [];

scriptNode.onaudioprocess = function(event) {
  var output = event.outputBuffer.getChannelData(0);
  var offset = 0;
  // Copy whole chunks while they still fit in the remaining output space
  while (audioQueue.length && audioQueue[0].length <= output.length - offset) {
    var chunk = audioQueue.shift();
    output.set(chunk, offset);
    offset += chunk.length;
  }
  // Partially consume the next chunk to fill whatever space is left
  // (subarray is a view on the same buffer, so the leftover samples are kept)
  if (audioQueue.length > 0 && offset < output.length) {
    var remaining = output.length - offset;
    output.set(audioQueue[0].subarray(0, remaining), offset);
    audioQueue[0] = audioQueue[0].subarray(remaining);
  }
};

scriptNode.connect(audioContext.destination);

var socket = new WebSocket('ws://your-websocket-server');
socket.binaryType = 'arraybuffer'; // so event.data arrives as an ArrayBuffer

socket.onmessage = function(event) {
  var audioData = new Float32Array(event.data); // assuming raw Float32 samples
  // You might need to decode the data first if it's compressed
  audioQueue.push(audioData);
};

socket.onopen = function() {
  // Send any necessary commands to start receiving audio
};
```
Note that this example assumes the audio is being sent in a compatible format (e.g., Float32Array) and doesn't require additional decoding. If the audio is being sent in a compressed format, you'll need to decode it before adding it to the queue.
Additionally, precise synchronization and buffer handling might require careful tuning and consideration of the specific requirements of your application.
This is a basic and illustrative example. For a production environment, more refined and error-tolerant code would be necessary. Make sure to handle different WebSocket events properly and control the audio buffer efficiently to avoid glitches or interruptions in the playback.
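For the Rust side of the same idea, here is a hedged sketch using the web-sys Web Audio bindings rather than raw JavaScript. It assumes mono f32 PCM samples at the AudioContext's native sample rate (in the real app they would come from an Opus decoder), `play_chunk` is just an illustrative helper name, and it needs the "AudioContext", "BaseAudioContext", "AudioBuffer", "AudioBufferSourceNode", and "AudioDestinationNode" web-sys features:

```rust
use wasm_bindgen::JsValue;
use web_sys::AudioContext;

// Hedged sketch: play one chunk of decoded mono f32 samples immediately.
// A real player would schedule chunks back to back (e.g. with
// start_with_when) to avoid gaps between buffers.
fn play_chunk(ctx: &AudioContext, samples: &mut [f32]) -> Result<(), JsValue> {
    let buffer = ctx.create_buffer(1, samples.len() as u32, ctx.sample_rate())?;
    buffer.copy_to_channel(samples, 0)?;

    let source = ctx.create_buffer_source()?;
    source.set_buffer(Some(&buffer));
    source.connect_with_audio_node(&ctx.destination())?;
    source.start()?;
    Ok(())
}
```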
> Seems like iOS Chrome has the same issue!!
Ah, that makes sense. From my understanding, iOS Chrome uses the same engine as Safari.
Damn! So Chrome just uses the iOS WebView, that is bad :(
@darioalessandro it appears that using a processor or a worklet in the Audio graph is the way to go here outside of using WebRTC. In the worklet, we'll need to use a 3rd party opus encoder / decoder. I was thinking about using this: https://github.com/mmig/opus-encdec
There don't appear to be any Rust implementations of Opus.
Also, this'll probably take me a bit of time to figure out. I'm pretty far out of my wheelhouse here.
This helps a lot! Do you think we should fall back to WebRTC for old browsers? We could terminate the WebRTC connections on the backend and transform the streams into the existing protobuf protocol.
I think there'd be better support for it, since many of the video conferencing platforms use it. However, I do think that if we use WebRTC for part of the app, it'd make sense to use it for all media transfers. That'd be a large change, though, and possibly not in the spirit of this app. There is the idea of an SFU (Selective Forwarding Unit) in the WebRTC world, which is similar to what this app is doing, but using WebSockets and protobufs instead.
It could work using the solution above though, and imo that's probably easier?
Go with using a processor or a worklet in the audio graph, then, outside of using WebRTC; in the worklet we'll use a 3rd-party Opus encoder/decoder, e.g. https://github.com/mmig/opus-encdec.
@alcolmenar are you pursuing this?
Hey, yeah, still working on it. I should have something for the encode side soon.
According to MDN and caniuse.com, MediaStreamTrackProcessor should be available in Safari 18, which was released a couple of weeks ago. However, I am still getting the same error when I try to call the MediaStreamTrackProcessor constructor.
Seems like Safari is missing some classes that our app relies on; we need to find a way around it, or at least show an error:
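One way to at least detect the gap up front and surface an error, as a hedged sketch (the helper name is made up; it just checks whether a class exists on the JS global object via js_sys):

```rust
use js_sys::Reflect;
use wasm_bindgen::JsValue;

/// Illustrative helper: true if the browser exposes a global constructor
/// with the given name, e.g. "MediaStreamTrackProcessor" or "AudioEncoder".
fn browser_has_class(name: &str) -> bool {
    Reflect::get(&js_sys::global(), &JsValue::from_str(name))
        .map(|value| !value.is_undefined())
        .unwrap_or(false)
}

// e.g. before building the decoder pipeline:
// if !browser_has_class("MediaStreamTrackProcessor") {
//     // fall back, or render an "unsupported browser" error in the UI
// }
```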