react-native-webrtc / react-native-webrtc

The WebRTC module for React Native
https://react-native-webrtc.discourse.group
MIT License

Get Raw Audio from react-native-webrtc tracks. #1552

Closed sharjeelBokhari1 closed 5 months ago

sharjeelBokhari1 commented 5 months ago

Expected Behavior:

I am building an audio-only meeting app using Firebase as my signalling server. Before the audio reaches the other peer, I want to send the acquired raw audio data (the audio sample arrays) to a Python Flask API, apply certain ML models to it (e.g. censoring), and then send the processed audio over the WebRTC peer connection. In other words, I want the other peer to hear the edited audio.

Observed Behavior:

Whenever I try to get the tracks from the MediaStream object or from getUserMedia, whether through the pc.ontrack event handler or pc.getAudioTracks(), all I get is metadata such as {remote: false/true, id: "ksdjvnlvsdj3rfsvf32fsdv", kind: "audio", ...} (not the exact data I get, just an example). I just cannot get my hands on the audio data itself. I also found out that the onaddtrack event handler does not work anymore; I tried using it anyway, as you can see in the code below, but it didn't work.

I also tried WebRTC Encoded Transform, but it is reported as not supported when I run it in my mobile application.
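
For reference, here is a minimal sketch of the kind of Encoded Transform code I tried; it assumes an existing RTCPeerConnection pc and uses Chromium's older insertable-streams variant rather than the worker-based RTCRtpScriptTransform, neither of which react-native-webrtc implements:

const sender = pc.getSenders().find((s) => s.track && s.track.kind === "audio");

// Chromium-only: createEncodedStreams() is available when the RTCPeerConnection
// was created with { encodedInsertableStreams: true }; react-native-webrtc has
// no such method, which is where this attempt fails.
const { readable, writable } = sender.createEncodedStreams();

readable
  .pipeThrough(
    new TransformStream({
      transform(encodedFrame, controller) {
        // encodedFrame.data is an ArrayBuffer holding the *encoded* (Opus)
        // payload, not raw PCM samples.
        controller.enqueue(encodedFrame);
      },
    })
  )
  .pipeTo(writable);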

What I want from you guys:

A way to get the raw audio data in my React Native (JavaScript) Android application and send it to the other peer in the connection after editing it with a Python ML model.

Code for my CallScreen.js:


import React, { useEffect, useState } from "react";
import { View, Text } from "react-native";
import { SafeAreaView } from "react-native-safe-area-context";
import { 
    RTCPeerConnection, 
    RTCView,
    mediaDevices, 
    RTCIceCandidate,
    RTCSessionDescription,
    MediaStream,
    MediaStreamTrack
} from "react-native-webrtc";

import { 
    db, 
    addDoc, 
    collection, 
    doc, 
    setDoc, 
    getDoc, 
    updateDoc, 
    onSnapshot, 
    deleteField 
} from "../firebase/index";
import CallActionBox from "./CallActionBox";

const configuration = {
    iceServers: [
      {
        urls: ["stun:stun1.l.google.com:19302", "stun:stun2.l.google.com:19302"],
      },
    ],
    iceCandidatePoolSize: 10,
  };

const CallScreen = ({ roomId, screens, setScreen }) => {
    const [audioArray, setAudioArray] = useState([]);
    const [localStream, setLocalStream] = useState();
    const [remoteStream, setRemoteStream] = useState();
    const [cachedLocalPC, setCachedLocalPC] = useState();

    const [isMuted, setIsMuted] = useState(false);

    useEffect(() => {
        startLocalStream();
    },[])

    useEffect(() => {
        if (localStream && roomId) {
          startCall(roomId);
        }
    }, [localStream, roomId]);

    async function endCall() {
        if (cachedLocalPC) {
          const senders = cachedLocalPC.getSenders();
          senders.forEach((sender) => {
            cachedLocalPC.removeTrack(sender);
          });
          cachedLocalPC.close();
        }

        const roomRef = doc(db, "room", roomId);
        await updateDoc(roomRef, { answer: deleteField() });

        setLocalStream();
        setRemoteStream(); // set remoteStream to null or empty when callee leaves the call
        setCachedLocalPC();
        // cleanup
        setScreen(screens.ROOM); //go back to room screen
    }

  const startLocalStream = async () => {    
      const constraints = {
        audio: {
          echoCancellation: true,
          sampleSize: 16,
          channelCount: 2,
        },
        video: false,
      };
      const newStream = await mediaDevices.getUserMedia(constraints);
      setLocalStream(newStream);
  };

  const startCall = async (id) => {
      const localPC = new RTCPeerConnection(configuration);
      localStream.getTracks().forEach((track) => {
        // Attempt to read raw samples straight from the track object.
        // This cannot work: a MediaStreamTrack is only a JS handle carrying
        // metadata, not a buffer of samples, so the typed array comes out empty.
        const arr = new Uint16Array(track);
        console.log("\n\n\nARRAY TEST\n\n\n", arr);
        localPC.addTrack(track, localStream);
      });

      // The onaddtrack attempt mentioned above; RTCPeerConnection has no such
      // event, so this handler never fires.
      localPC.onaddtrack = (e) => {
        console.log("\n\n\nStream on Add stream:\n", e.stream, "\n\n\n");
      };

      const dataChannel = localPC.createDataChannel("audio");
      dataChannel.onopen = (e) => {
        // dataChannel.send(e);
        console.log("\n\n\nSenders:",localPC.getSenders(), "\n\n\n");
      }

      dataChannel.onmessage = (e) => {
        console.log("\n\n\n",e.data,"\n\n\n");
      }

      console.log("\n\n\n LocalStream GEtAUdioTRracks \n\n\n", localStream.getAudioTracks());

      localPC.onaddtrack = (e) => {
        console.log("\n\n\nStream on Add stream:\n",e.stream,"\n\n\n");
      }

      const roomRef = doc(db, "room", id);
      const callerCandidatesCollection = collection(roomRef, "callerCandidates");
      const calleeCandidatesCollection = collection(roomRef, "calleeCandidates");

      localPC.addEventListener("icecandidate", (e) => {
          if (!e.candidate) {
              console.log("Got final Candidate");
              return;
          }
          addDoc(callerCandidatesCollection, e.candidate.toJSON());
      });

      localPC.ontrack = (e) => {
          const newStream = new MediaStream();
          console.log("\n\n\n newStream._tracks \n\n\n",e.streams[0], "\n\n\n");
          e.streams[0].getTracks().forEach((track) => {
            newStream.addTrack(track);
            console.log("\n\n\nSenders:",localPC.getSenders(), "\n\n\n");
            console.log("\n\n\n Track \n\n\n",track);
          });
          console.log("\n\n\nStream\n\n\n", newStream);
          console.log("\n\n\n CODECS \n\n\n", e.streams[0].getAudioTracks()[0]);
          setRemoteStream(newStream);
      }

      const offer = await localPC.createOffer();
      await localPC.setLocalDescription(offer);
      console.log("\n\n\nOffer\n\n\n", offer);
      await setDoc(roomRef, {offer, connected: false}, {merge: true});
      // listen for remote answer
      onSnapshot(roomRef, (doc) => {
        const data = doc.data();
        if (!localPC.currentRemoteDescription && data.answer) {
            const rtcSessionDescription = new RTCSessionDescription(data.answer);
            localPC.setRemoteDescription(rtcSessionDescription);
        } else {
          setRemoteStream();
        }
      });

      onSnapshot(calleeCandidatesCollection, (snapshot) => {
        snapshot.docChanges().forEach((change) => {
          if (change.type === "added") {
              let data = change.doc.data();
              localPC.addIceCandidate(new RTCIceCandidate(data));
          }
        });
      });
      setCachedLocalPC(localPC);
  };

  // Mutes the local's outgoing audio
  const toggleMute = () => {
      if (!remoteStream) {
        return;
      }
      localStream.getAudioTracks().forEach((track) => {
        track.enabled = !track.enabled;
        setIsMuted(!track.enabled);
      });
  };

    return (
        <SafeAreaView>
            <View style={{ flex: 1, backgroundColor: "#dc2626" }}>
                {/* Empty placeholder while waiting for the remote peer. */}
                {!remoteStream && <View style={{ flex: 1 }} />}
                {remoteStream && (
                    <>
                        <View style={{ flex: 1 }} />
                        {/* Small overlay placeholder; the call is audio-only, so no
                            RTCView is rendered. RN styles take numbers, not rem units. */}
                        <View
                            style={{
                                width: 128,
                                height: 192,
                                position: "absolute",
                                right: 24,
                                top: 32,
                            }}
                        />
                    </>
                )}
                <View style={{ position: "absolute", bottom: 0, width: "100%" }}>
                    <CallActionBox toggleMute={toggleMute} endCall={endCall} />
                </View>
            </View>
        </SafeAreaView>
    );
}

export default CallScreen;

Platform Information

saghul commented 5 months ago

It's not currently possible to provide that.

Given how the RN JS bridge works, it would have abysmal performance even if we could extract the raw audio frames and send them up to userspace.

Something like Reanimated's worklets might work, but the complexity of that system would make it hard to integrate.

sharjeelBokhari1 commented 5 months ago

Oh okay.

Question 1:

Could you please tell me what would happen if I tried to apply Opus decoding to the tracks? They give me back something like this when I console.log them:

Track 

 {"_constraints": {}, "_enabled": true, "_muted": false, "_peerConnectionId": 0, "_readyState": "live", "_settings": {}, "id": "aafa8615-c9f9-4aca-8d3e-5f73f6330773", "kind": "audio", "label": "", "remote": true}

What would happen if I applied Opus decoding to this particular track?

Question 2:

Would it be possible to achieve what I am trying to do if I used React JS instead of React Native?

I really appreciate your help man @saghul.

saghul commented 5 months ago

You can't opus decode a track. A track is just a JS object. Audio data is a few layers down and not accessible.

Using React it would be possible, not because of React itself, but because when running in an actual browser you have the APIs you need to do what you want, such as stream transforms.
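
For example, here is a minimal browser-only sketch of one of those APIs (Chromium's MediaStreamTrackProcessor / MediaStreamTrackGenerator, assuming an existing RTCPeerConnection; none of this exists in react-native-webrtc):

// Browser-only sketch (Chromium "breakout box" API); pc is an existing
// RTCPeerConnection. None of these classes exist in react-native-webrtc.
async function sendProcessedAudio(pc) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const [track] = stream.getAudioTracks();

  const processor = new MediaStreamTrackProcessor({ track });
  const generator = new MediaStreamTrackGenerator({ kind: "audio" });

  const transform = new TransformStream({
    transform(audioData, controller) {
      // audioData is an AudioData object holding raw PCM: it could be copied
      // out, sent to a server for ML processing, and a modified frame enqueued.
      controller.enqueue(audioData);
    },
  });

  processor.readable.pipeThrough(transform).pipeTo(generator.writable);

  // The generator is itself a MediaStreamTrack, so it is sent like any track.
  pc.addTrack(generator, new MediaStream([generator]));
}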

8BallBomBom commented 5 months ago

Totally looking at the other end of things here, and into the future: conversion to the new RN architecture might make implementing things like this more viable, without as many of the bumpy performance issues 🤔 It would need some looking into, though.

sharjeelBokhari1 commented 5 months ago

@saghul Alright, but is there a way I can turn audio that is being streamed/recorded by a different library, such as "react-native-audio", into a MediaStream track? I mean, I don't want to use the getUserMedia tracks; I want to send audio over WebRTC that doesn't come from getUserMedia, since getUserMedia is what accesses the microphone and the other media devices, if I'm not wrong. I want to take audio from another API and have the remote peer hear it through react-native-webrtc.

Thank you so much for cooperating! It's very helpful!

saghul commented 5 months ago

Unfortunately that is not possible.

You might try sending audio frames via a data channel, but due to the way the RN bridge works, binary data needs to be base64 encoded, so I think that overhead will add up.
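
Very roughly, and assuming the raw frames come from some other recording library (the recorder callback below is a hypothetical stand-in, since react-native-webrtc cannot supply them), it would look something like this:

import { Buffer } from "buffer"; // base64 helper commonly polyfilled in RN apps

// Hypothetical sketch: forward externally captured PCM chunks over a data
// channel. `recorder.onAudioChunk` stands in for whatever recording library
// is used; react-native-webrtc does not expose raw frames itself.
function forwardAudioOverDataChannel(pc, recorder) {
  const channel = pc.createDataChannel("audio-frames");

  recorder.onAudioChunk = (pcmBytes /* Uint8Array */) => {
    if (channel.readyState !== "open") return;
    // Base64-encode so the payload crosses the RN bridge as a string; this is
    // the per-frame overhead mentioned above.
    channel.send(Buffer.from(pcmBytes).toString("base64"));
  };

  channel.onmessage = (e) => {
    // Receiving side: decode back to bytes. Playing them out still needs a
    // separate audio playback library; react-native-webrtc won't do it.
    const bytes = Buffer.from(e.data, "base64");
    console.log("received", bytes.length, "bytes of audio");
  };

  return channel;
}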

sharjeelBokhari1 commented 5 months ago

Thank you for your help @saghul