ml5js / ml5-library

Friendly machine learning for the web! πŸ€–
https://ml5js.org

[devOps] Error: "n.videoElt.captureStream is not a function" in Safari #626

Open ccarse opened 4 years ago

ccarse commented 4 years ago

Dear ml5 community,

I'm submitting a new issue. Please see the details below.

β†’ Step 1: Describe the issue πŸ“

Did you find a bug? Want to suggest an idea for a feature? I'm getting the following error when trying to use the YOLO model in Safari:

Unhandled Promise Rejection: TypeError: n.videoElt.captureStream is not a function. (In 'n.videoElt.captureStream()', 'n.videoElt.captureStream' is undefined)
dispatchException β€” runtime.js:569

It looks like captureStream isn't supported in Safari? Is there an alternative API I can use?

Here's my code:

import * as React from 'react';
import * as ReactDOM from 'react-dom';
const ml5 = require('ml5');

interface SmartCameraState {
  isLoading: boolean;
  results: string;
  width: number;
  height: number;
}

export class SmartCamera extends React.Component<{}, SmartCameraState> {
  videoRef?: HTMLVideoElement;
  canvasRef?: HTMLCanvasElement;

  detector?: any;

  constructor(props: {}) {
    super(props);
    this.state = {
      isLoading: true,
      results: '',
      width: 640,//1280,
      height: 480//960
    };
  }

  async componentDidMount() {
    if (!this.videoRef || !this.canvasRef) { return; }

    const ctx = this.canvasRef.getContext('2d') as CanvasRenderingContext2D;
    ctx.lineWidth = 5;
    ctx.strokeStyle = "#FFFFFF";
    ctx.font = '20px Arial';
    ctx.textBaseline = 'top';

    // Create a webcam capture
    const stream = await navigator.mediaDevices.getUserMedia({ video: { width: this.state.width, height: this.state.height, facingMode: 'environment'} });
    console.log('Camera loaded');
    this.videoRef.srcObject = stream;
    await this.videoRef.play();

    // Run detection on the current video frame; results come back through gotResult
    const classifyVideo = () => {
      this.detector.detect(gotResult);
    }

    // Draw a labeled bounding box for each detection, then queue up the next frame
    const gotResult = (err: any, results: {label: string, confidence: number, x: number, y: number, w: number, h: number}[]) => {
      if (this.state.isLoading) { this.setState({isLoading: false}); }

      ctx.clearRect(0, 0, this.state.width, this.state.height);
      results.forEach(result => {
        const resultStr = `${result.label} ${(result.confidence * 100).toFixed(1)}%`;
        const xpos = this.state.width * result.x;
        const ypos = this.state.height * result.y;
        const boxWidth = this.state.width * result.w;
        const boxHeight = this.state.height * result.h;
        const textWidth = ctx.measureText(resultStr).width;
        // console.log(`x: ${xpos} y: ${ypos} w: ${boxWidth} h: ${boxHeight}`);

        ctx.beginPath();
        ctx.rect(xpos, ypos, boxWidth, boxHeight);
        ctx.stroke();
        ctx.fillStyle = "#FFFFFF";
        ctx.fillRect(xpos, ypos, textWidth, 22);
        ctx.fillStyle = "#000000";
        ctx.fillText(resultStr, xpos, ypos);
      });

      // this.setState({results: JSON.stringify(results, null, 2)});
      classifyVideo();  
    }

    // Create the YOLO detector from the video element and kick off the detection loop once it has loaded
    this.detector = await ml5.YOLO(this.videoRef, () => classifyVideo());
    // classifyVideo();
  }

  render() {
    return (
      <>
        {this.state.isLoading ? <div>loading...</div> : null}
        <div>
          <video id="video" autoPlay muted loop playsInline ref={this.setVideoInputRef} width={this.state.width} height={this.state.height} style={{position: 'fixed'}}/>
          <canvas width={this.state.width} height={this.state.height} ref={ ref => ref && (this.canvasRef = ref)} style={{position: 'fixed'}}/>
        </div>
      </>
    );
  }

  // React ref callback: store the <video> node when it mounts (the null on unmount is ignored)
  setVideoInputRef = (ref: HTMLVideoElement | null) => {
    ref && (this.videoRef = ref);
  }
}
ccarse commented 4 years ago

I was able to monkey patch this by doing this.videoRef.captureStream = () => stream;. I think it would be nice if, instead of passing the model a video or image element, we could pass a MediaStream. That way I could just give it the stream returned from navigator.mediaDevices.getUserMedia.
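
For reference, a rough sketch of how that workaround slots into the componentDidMount code above (simplified; the cast to any is only there because Safari's typings don't declare captureStream):

const stream = await navigator.mediaDevices.getUserMedia({ video: true });
this.videoRef.srcObject = stream;
await this.videoRef.play();
// Safari has no HTMLVideoElement.captureStream, so hand ml5 the getUserMedia stream instead.
(this.videoRef as any).captureStream = () => stream;
this.detector = await ml5.YOLO(this.videoRef, () => classifyVideo());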

joeyklee commented 4 years ago

@ccarse - Thanks so much for this investigation! This is definitely something we need to keep in mind and add to the browser testing todos to make sure we can support modern browsers. Let's keep this issue open as a note.

Thanks for following up!

mr1985 commented 4 years ago

Hello, any update on this topic? I am having the same issue on Safari when using YOLO with MP4 files.

lindapaiste commented 2 years ago

This issue is caused by the Video utility, which we use in YOLO, MobileNet, and StyleTransfer.

The Video utility copies the user's video into a new <video> element by capturing the original's stream with captureStream() -- which is not supported by Safari. https://github.com/ml5js/ml5-library/blob/c3123cac0b1dfa0ed8e3e2588e8dea72ccd05aa8/src/utils/Video.js#L58-L63

I believe the video is copied so that it can be resized to the shape required by the model without affecting the video that is displayed on the page.

Funny thing is...it does not actually resize the input data! We are setting the width and height properties on the video, but the TensorFlow tf.browser.fromPixels function gets the size from the videoWidth and videoHeight properties, which contain the intrinsic size of the video. These are read-only properties which we cannot set.

https://github.com/ml5js/ml5-library/blob/c3123cac0b1dfa0ed8e3e2588e8dea72ccd05aa8/src/utils/Video.js#L64-L67

https://github.com/tensorflow/tfjs/blob/8d96f5dd140e7114e167dfe4d4fe4300f4aaf4a8/tfjs-core/src/ops/browser.ts#L131-L136
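
To illustrate with a rough sketch (plain browser code; originalVideo and the sizes are just placeholders): setting width and height on the copied video only changes how it would be displayed, while fromPixels builds its tensor from the intrinsic dimensions:

const copy = document.createElement('video');
copy.srcObject = originalVideo.captureStream(); // the call that throws in Safari
copy.width = 416;   // display size only
copy.height = 416;
// videoWidth/videoHeight still report the intrinsic size of the stream (e.g. 640 x 480),
// and tf.browser.fromPixels(copy) reads those read-only values, not width/height.
console.log(copy.videoWidth, copy.videoHeight);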

My recommendation is that we read the video at its current size and convert the current frame into a TensorFlow tensor first. Then we can use TensorFlow functions like tf.image.resizeBilinear to resize the tensor of pixel data.
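
A rough sketch of what that could look like (not the current implementation; assumes a square model input and the @tensorflow/tfjs package):

import * as tf from '@tensorflow/tfjs';

// Read the frame at its intrinsic size, then resize the tensor itself.
function frameToTensor(video: HTMLVideoElement, modelSize: number): tf.Tensor3D {
  return tf.tidy(() => {
    const frame = tf.browser.fromPixels(video); // shape [videoHeight, videoWidth, 3]
    return tf.image.resizeBilinear(frame, [modelSize, modelSize]);
  });
}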