security-union / videocall-rs

teleconference system written in rust
https://www.videocall.rs
MIT License

Support Safari 16.5 and iOS Chrome #58

Open · darioalessandro opened this issue 1 year ago

darioalessandro commented 1 year ago

Seems like Safari is missing some classes that our app relies on; we need to find a way around it or at least show an error:

[Screenshot, 2023-06-11: the error shown in Safari]
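
A minimal feature-detection sketch in plain JavaScript, assuming we just want to surface an error instead of crashing (the function name and the alert are illustrative, not from the app):

// Detect the missing classes up front and show an error instead of failing later.
function checkBrowserSupport() {
  var missing = ['MediaStreamTrackProcessor', 'MediaStreamTrackGenerator', 'VideoEncoder']
    .filter(function(name) { return !(name in window); });
  if (missing.length > 0) {
    alert('This browser is missing required APIs: ' + missing.join(', '));
    return false;
  }
  return true;
}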

alcolmenar commented 1 year ago

@darioalessandro hey! Thank you for releasing this project; I really think it's a great idea. I spent some time looking at this issue, and it essentially boils down to Safari not currently supporting the MediaStreamTrackGenerator/MediaStreamTrackProcessor Web APIs.

https://developer.mozilla.org/en-US/docs/Web/API/MediaStreamTrackGenerator#browser_compatibility

They only just added support for the WebCodecs API (which enables VideoEncoder) in version 16.4, so progress seems slow. I had found some examples that used VideoStreamReader/Writer, but those have already been deprecated and aren't included in web_sys anymore.

It seems like we might be able to use a TransformStream, but my knowledge of Web APIs is pretty limited...

darioalessandro commented 1 year ago

Thanks for looking into this issue!!

I think it is worth trying. @alcolmenar, would you like to take on this task?

alcolmenar commented 1 year ago

> Thanks for looking into this issue!!
>
> I think it is worth trying. @alcolmenar, would you like to take on this task?

@darioalessandro Sure I can give it a shot. I might need some assistance though at some point if I get stuck.

darioalessandro commented 1 year ago

@alcolmenar how's it going?

alcolmenar commented 1 year ago

@darioalessandro hey! I've been a bit busy with other things, so I haven't had a chance to start on this yet. I should get working on it later today or tomorrow.

darioalessandro commented 1 year ago

sgtm

alcolmenar commented 1 year ago

@darioalessandro I've been looking into adding Safari support and I think I've determined a way to do this, but I'm not sure if it is something we want to implement. The main issue is that Safari doesn't currently support the MediaStreamTrackProcessor/Generator APIs, which enable extracting a ReadableStream from a MediaStreamTrack. One way I've seen this worked around is:

1. Play the MediaStream on a video element
2. Draw each video frame onto a canvas element
3. Get the canvas ImageData and queue it to a ReadableStream

Here is an example of this done in JavaScript: https://github.com/GoogleChromeLabs/webm-wasm/blob/master/demo/live.html

We might then be able to use the same VideoEncoder created in camera_encoder.rs. One issue, I think, is that the canvas doesn't render unless it is in focus.

Also, I've been struggling a bit with how to implement the JS object containing the methods the ReadableStream constructor needs, and I can't find any examples of this. I was thinking maybe we can pair on this if you're up for it?
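
For reference, a sketch of steps 1-3 as a ReadableStream in plain JavaScript (the dimensions and function name are illustrative, and a real implementation would need backpressure handling rather than enqueueing on every animation frame):

var WIDTH = 640, HEIGHT = 480; // illustrative dimensions

async function frameStreamFromCamera() {
  // Step 1: play the MediaStream on a (detached) video element
  var mediaStream = await navigator.mediaDevices.getUserMedia({ video: true });
  var video = document.createElement('video');
  video.srcObject = mediaStream;
  video.muted = true;
  video.playsInline = true;
  await video.play();

  var canvas = document.createElement('canvas');
  canvas.width = WIDTH;
  canvas.height = HEIGHT;
  var ctx = canvas.getContext('2d');

  return new ReadableStream({
    start(controller) {
      var grab = function() {
        // Step 2: draw the current video frame onto the canvas
        ctx.drawImage(video, 0, 0, WIDTH, HEIGHT);
        // Step 3: queue the pixels to the stream
        controller.enqueue(ctx.getImageData(0, 0, WIDTH, HEIGHT));
        requestAnimationFrame(grab);
      };
      requestAnimationFrame(grab);
    }
  });
}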

darioalessandro commented 1 year ago

Yes, this will work 😄 You can read the offscreen image straight into a VideoFrame:

if let Err(e) = context.draw_image_with_html_video_element_and_dw_and_dh(
    &video_element,
    0.0,
    0.0,
    VIDEO_WIDTH.into(),
    VIDEO_HEIGHT.into(),
) {
    log!("error", e);
}

// Create a JS dictionary with timestamp and duration properties.
let mut video_frame_init = VideoFrameInit::new();
video_frame_init.timestamp(0.0); // TODO: use an actual timestamp
video_frame_init.duration(1.0 / 30.0);

let video_frame = VideoFrame::new_with_html_image_element_and_video_frame_init(
    &html_image_element,
    &video_frame_init,
);

you can modify this method directly: https://github.com/security-union/videocall-rs/blob/a629581b6f3abbb7221ad7c6cdd6b0cd1adce054/yew-ui/src/model/encode/camera_encoder.rs#L168-L202

You will need to enable the following features in the web-sys crate:

   "OffscreenCanvas",
    "OffscreenCanvasRenderingContext2d",
    "ImageData",
    "VideoFrameInit"
alcolmenar commented 1 year ago

@darioalessandro thank you for the response; it definitely helped me out. I was able to get camera encoding working on Safari and tested it locally, but I had to remove the use of the MediaStreamTrackGenerator for audio decoding. I put up a draft PR with the changes for you to check out. The video on the receive side seemed to render at full height and width, though, and I couldn't figure out how to adjust that without changing the drawImage height and width. I thought maybe it was just a CSS change, but that wasn't the case.

Also, while looking into the audio side of the equation, it looks like Safari doesn't support AudioEncoder either, so we'll need to use a third-party package to do the encoding. On the decode side, Safari doesn't support MediaStreamTrackGenerator or AudioDecoder, so we'll have to rethink how to implement that using the Web Audio APIs, specifically the stream generation and decoding (most likely a third-party package again). I tried to find some examples of the stream generation; all I could find are examples using ArrayBuffer, and with my limited JavaScript/WASM knowledge I'm not sure that will work. Do you have any suggestions on how to go about implementing this?

darioalessandro commented 1 year ago

This is awesome! We use Opus for audio encoding, so I am sure that we can port it to wasm 🏆

darioalessandro commented 1 year ago

Seems like iOS Chrome has the same issue!!

[Screenshot: the same error on iOS Chrome]

darioalessandro commented 1 year ago

Adding a ChatGPT response regarding audio processing, let me know what you think:

> if I am receiving the audio chunks via a websocket, how can I then play it using the audio context?

ChatGPT: Certainly! You can play audio chunks received via a WebSocket using the AudioContext. Here's a general outline of how you could do it:

1. Create the AudioContext and ScriptProcessorNode: You'll need to create an AudioContext and a ScriptProcessorNode (or an AudioWorklet if available) to handle the incoming chunks.
2. Set up the WebSocket: You'll need to set up the WebSocket connection to receive the audio chunks. Make sure you know the audio format and sample rate.
3. Decode and play audio chunks: As chunks are received, you'll need to decode them (if necessary) and then add them to a buffer that the ScriptProcessorNode can play.

Here's some sample code that outlines this process:

var audioContext = new (window.AudioContext || window.webkitAudioContext)();
var scriptNode = audioContext.createScriptProcessor(4096, 1, 1); // choose an appropriate buffer size

var audioQueue = [];

scriptNode.onaudioprocess = function(event) {
  // Note: subarray() returns a view into the real output buffer; slice()
  // would return a copy, and samples written to it would be lost.
  var output = event.outputBuffer.getChannelData(0);
  while (audioQueue.length && output.length >= audioQueue[0].length) {
    var chunk = audioQueue.shift();
    output.set(chunk, 0); // copy chunk to output
    output = output.subarray(chunk.length);
  }
  if (audioQueue.length > 0 && output.length > 0) {
    // Fill the remainder of the output with the front of the next chunk.
    output.set(audioQueue[0].subarray(0, output.length), 0);
    audioQueue[0] = audioQueue[0].subarray(output.length);
  }
};

scriptNode.connect(audioContext.destination);

var socket = new WebSocket('ws://your-websocket-server');
socket.binaryType = 'arraybuffer'; // receive binary frames as ArrayBuffer rather than Blob

socket.onmessage = function(event) {
  var audioData = new Float32Array(event.data); // assuming audio data is sent as raw Float32 samples
  // You might need to decode the data if it's compressed
  audioQueue.push(audioData);
};

socket.onopen = function() {
  // Send any necessary commands to start receiving audio
};

Note that this example assumes the audio is being sent in a compatible format (e.g., Float32Array) and doesn't require additional decoding. If the audio is being sent in a compressed format, you'll need to decode it before adding it to the queue.

Additionally, precise synchronization and buffer handling might require careful tuning and consideration of the specific requirements of your application.

This is a basic and illustrative example. For a production environment, more refined and error-tolerant code would be necessary. Make sure to handle different WebSocket events properly and control the audio buffer efficiently to avoid glitches or interruptions in the playback.

alcolmenar commented 1 year ago

> Seems like iOS Chrome has the same issue!!

Ah, that makes sense. From my understanding, iOS Chrome uses the same engine as Safari.

darioalessandro commented 1 year ago

Damn! So Chrome just uses the iOS WebView, that is bad :(

alcolmenar commented 1 year ago

@darioalessandro it appears that using a processor or a worklet in the audio graph is the way to go here, outside of using WebRTC. In the worklet, we'll need to use a 3rd-party Opus encoder/decoder. I was thinking about using this: https://github.com/mmig/opus-encdec

There don't appear to be any Rust implementations of Opus.

Also, this'll probably take me a bit of time to figure out. I'm pretty far out of my wheelhouse here.
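
For reference, a minimal AudioWorklet playback sketch in plain JavaScript, assuming decoded Float32 PCM chunks arrive from an Opus decoder such as opus-encdec (all names here are illustrative):

// pcm-player-processor.js
class PcmPlayerProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    this.queue = [];
    // Decoded PCM chunks (Float32Array) are posted in via the node's port.
    this.port.onmessage = (e) => this.queue.push(e.data);
  }
  process(inputs, outputs) {
    var out = outputs[0][0]; // mono output, 128 frames per render quantum
    var written = 0;
    while (written < out.length && this.queue.length) {
      var chunk = this.queue[0];
      var n = Math.min(chunk.length, out.length - written);
      out.set(chunk.subarray(0, n), written);
      written += n;
      if (n === chunk.length) this.queue.shift();
      else this.queue[0] = chunk.subarray(n);
    }
    return true; // keep the processor alive
  }
}
registerProcessor('pcm-player', PcmPlayerProcessor);

And the main-thread wiring would look roughly like:

var ctx = new AudioContext();
await ctx.audioWorklet.addModule('pcm-player-processor.js');
var node = new AudioWorkletNode(ctx, 'pcm-player', { outputChannelCount: [1] });
node.connect(ctx.destination);
// for each chunk decoded from the websocket: node.port.postMessage(float32Samples);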

darioalessandro commented 1 year ago

This helps a lot! Do you think we should fall back to WebRTC for old browsers? We could terminate the WebRTC connections on the backend and transform the streams to the other protobuf protocol.

alcolmenar commented 1 year ago

I think there'd be better support for it, since many of the video conferencing platforms use it. However, I do think that if we use WebRTC for part of the app, it'd make sense to use it for all of the media transfers. But that'd be a large change, and possibly not in the spirit of this app. There is the idea of an SFU (Selective Forwarding Unit) in the WebRTC world, which is similar to what this app is doing, but using WebSockets and protobufs instead.

It can work using the solution above though, and imo it's probably easier?

darioalessandro commented 1 year ago

> I think there'd be better support for it, since many of the video conferencing platforms use it. However, I do think that if we use WebRTC for part of the app, it'd make sense to use it for all of the media transfers. But that'd be a large change, and possibly not in the spirit of this app. There is the idea of an SFU (Selective Forwarding Unit) in the WebRTC world, which is similar to what this app is doing, but using WebSockets and protobufs instead.
>
> It can work using the solution above though, and imo it's probably easier?

Go with using a processor or a worklet in the audio graph, plus the 3rd-party Opus encoder/decoder (https://github.com/mmig/opus-encdec).

darioalessandro commented 1 year ago

@alcolmenar are you pursuing this?

alcolmenar commented 1 year ago

hey, yeah, still working on it. I should have something for the encode side soon.

allsey87 commented 1 month ago

According to MDN and caniuse.com, MediaStreamTrackProcessor should be available in Safari 18, which was released a couple of weeks ago. However, I am still getting the same error when I try to call the MediaStreamTrackProcessor constructor.
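
One thing that may be worth checking: the current mediacapture-transform spec marks MediaStreamTrackProcessor as [Exposed=DedicatedWorker], and Chrome's window-scope exposure predates that change, so Safari may only provide the constructor inside a dedicated worker. A quick illustrative check:

// Compare exposure on the window vs. inside a dedicated worker.
console.log('on window:', 'MediaStreamTrackProcessor' in self);

var workerSrc = "postMessage('in worker: ' + ('MediaStreamTrackProcessor' in self));";
var worker = new Worker(URL.createObjectURL(new Blob([workerSrc], { type: 'text/javascript' })));
worker.onmessage = function(e) { console.log(e.data); };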