marek-simonik / record3d-wifi-streaming-and-rgbd-mp4-3d-video-demo


Streaming full 6-DOF Pose over wifi #8

Open bill-healey opened 1 month ago

bill-healey commented 1 month ago

Is it possible to add wifi streaming support for the full camera pose? The full pose would be very useful for localization and as input to things like NeRFs/Gaussian splats.

I believe ARKit already provides the full pose.

marek-simonik commented 1 month ago

I am not sure what would be the correct way of adding camera pose data to the stream. The primary issue is synchronization — if I were to send the pose data over a WebRTC data channel, then I do not know how I would implement correct pairing of a camera pose to a specific video frame.

I know you asked specifically about Wi-Fi streaming, but the USB streaming feature does already provide camera poses, so it might be worth considering USB streaming as an alternative to Wi-Fi streaming if that is possible in your workflow.

bill-healey commented 4 weeks ago

I did see that USB streaming supports it, but I was trying to get this working for a wireless robot, and yours is one of the few apps that streams 3D video over wifi. Sending the pose would also make it much easier to build things like NeRFs and Gaussian splats in real time without a super long USB cord or trying to carry a laptop and a phone. The NeRF Capture app, for example, is somewhat popular for those use cases because it does send the pose and RGBD frame wirelessly, but it does not support WebRTC or streaming.

As far as synchronization goes, two potential ideas: 1) It looks like ARFrame already includes a timestamp; it's probably enough to just send that same timestamp in both the video stream and the dataChannel, and then the client can match/synchronize if desired (rough sketch after idea 2 below).

2) It looks like WebRTC also supports sending metadata along with the video stream. This doesn't seem to be supported by all clients, but it might be an easy way to include the pose in a synchronized way.
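
For idea 1, the client side of the pairing could look roughly like the sketch below. It is only illustrative (dataChannel, pendingPoses, and matchPoseToFrame are made-up names), and it assumes pose messages of the form {timestamp, pose} arrive over the data channel and that the ARKit timestamp is somehow recoverable for each decoded frame, which is the open question:

// Sketch of idea 1's client side: buffer poses arriving over the data channel
// keyed by their ARKit timestamp, then pair each decoded video frame with the
// closest buffered pose. All names here are illustrative.
const pendingPoses = new Map(); // ARKit timestamp (seconds) -> pose

dataChannel.onmessage = (event) => {
  const {timestamp, pose} = JSON.parse(event.data);
  pendingPoses.set(timestamp, pose);
};

function matchPoseToFrame(frameTimestamp, toleranceSec = 1 / 120) {
  let bestTimestamp = null;
  let bestDelta = Infinity;
  for (const timestamp of pendingPoses.keys()) {
    const delta = Math.abs(timestamp - frameTimestamp);
    if (delta < bestDelta) {
      bestDelta = delta;
      bestTimestamp = timestamp;
    }
  }
  if (bestTimestamp === null || bestDelta > toleranceSec) {
    return null; // no pose close enough to this frame
  }
  const pose = pendingPoses.get(bestTimestamp);
  pendingPoses.delete(bestTimestamp);
  return pose;
}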

marek-simonik commented 3 weeks ago

I have no problem with sending the camera pose in a WebRTC data stream together with ARKit's timestamp and then encoding each RTC video frame with the same timestamp. In fact, I have already tried it, but I could not observe the same timestamp in the web browser receiving the WebRTC stream: the timestamp I set for a frame during video encoding in Record3D did not appear when the frames were decoded in the browser.

I used the <videoTag>.requestVideoFrameCallback() callback to observe the timestamps of received video frames, but none of them matched the timestamp I used when encoding the frames in Record3D.
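
For reference, this is roughly the kind of callback I mean (a sketch; for WebRTC-backed <video> elements the callback's metadata exposes fields like mediaTime and rtpTimestamp):

// Sketch: log per-frame metadata of a <video> element that is playing the
// incoming WebRTC stream. For WebRTC sources the metadata includes
// rtpTimestamp, but none of these values matched the timestamps I set
// during encoding in Record3D.
const video = document.querySelector('video');

function onFrame(now, metadata) {
  console.log('mediaTime:', metadata.mediaTime,
              'rtpTimestamp:', metadata.rtpTimestamp);
  video.requestVideoFrameCallback(onFrame);
}
video.requestVideoFrameCallback(onFrame);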

But perhaps that is not the correct way of observing RTCVideoFrame timestamps; please let me know what else I should try in order to read a WebRTC frame's timestamp.

Do you have a link which describes your second proposed idea in more detail (so that I can try to implement it)?

bill-healey commented 3 weeks ago

Yeah, it looks like both the timestamps and the frameID on WebRTC frames might actually be generated client-side, so they don't really help.

Here are the docs on how to add metadata to the stream; it looks like the best way to do it is basically to hook the encoder/decoder so you can add whatever you want: https://github.com/w3c/webrtc-encoded-transform/blob/main/explainer.md

Perhaps something like this would work (with the pose instead of a timestamp), appending the metadata to each encoded frame's payload and stripping it again on the receiving side:

// Note: createEncodedStreams() is Chromium-only and requires the peer
// connection to be constructed with encoded insertable streams enabled.
const peerConnection = new RTCPeerConnection({encodedInsertableStreams: true});

async function createSender() {
  const stream = await navigator.mediaDevices.getUserMedia({video: true});
  const [track] = stream.getVideoTracks();
  const sender = peerConnection.addTrack(track, stream);

  const senderTransform = new TransformStream({
    transform: (encodedFrame, controller) => {
      // Serialize the metadata (the camera pose would go here instead).
      const metadata = new TextEncoder().encode(
        JSON.stringify({serverTimestamp: Date.now()}));
      // Append the metadata plus a 4-byte little-endian length suffix to the
      // encoded payload; the receiver strips it again before decoding.
      const frameData = new Uint8Array(encodedFrame.data);
      const combined = new Uint8Array(frameData.length + metadata.length + 4);
      combined.set(frameData, 0);
      combined.set(metadata, frameData.length);
      new DataView(combined.buffer).setUint32(
        frameData.length + metadata.length, metadata.length, true);
      encodedFrame.data = combined.buffer;
      controller.enqueue(encodedFrame);
    }
  });

  // createEncodedStreams() may only be called once per sender.
  const {readable, writable} = sender.createEncodedStreams();
  readable.pipeThrough(senderTransform).pipeTo(writable);
}

function setupReceiver(receiver) {
  const receiverTransform = new TransformStream({
    transform: (encodedFrame, controller) => {
      // Read the metadata length from the 4-byte suffix, parse the metadata,
      // and strip both before the frame reaches the decoder. (A real
      // implementation would tag frames so unmodified ones can be detected.)
      const view = new DataView(encodedFrame.data);
      const metadataLength = view.getUint32(encodedFrame.data.byteLength - 4, true);
      const metadataStart = encodedFrame.data.byteLength - 4 - metadataLength;
      const metadata = JSON.parse(new TextDecoder().decode(
        new Uint8Array(encodedFrame.data, metadataStart, metadataLength)));
      console.log('Server timestamp:', metadata.serverTimestamp);
      encodedFrame.data = encodedFrame.data.slice(0, metadataStart);
      controller.enqueue(encodedFrame);
    }
  });

  const {readable, writable} = receiver.createEncodedStreams();
  readable.pipeThrough(receiverTransform).pipeTo(writable);
}
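
One note on the sketch's design: properties attached to the RTCEncodedVideoFrame object itself are not transmitted over the wire, which is why the metadata has to ride inside the frame payload; the length suffix lets the receiver strip it again before the decoder sees it. The standardized successor to createEncodedStreams() is RTCRtpScriptTransform, which runs the same kind of transform inside a worker.
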
marek-simonik commented 1 week ago

I apologize for the very late reply. After building the latest Google webrtc library (which is what Record3D uses for WebRTC), I couldn't find an API related to encoded transforms.

To be specific, the iOS WebRTC.framework does not seem to contain RTCRtpSender.transform.

If I am just missing something and encoded transforms are supported by Google's webrtc library, then please let me know and I will include per-frame metadata similarly to how it's done in the JS snippet you posted.