gpac / mp4box.js

JavaScript version of GPAC's MP4Box tool
https://gpac.github.io/mp4box.js/
BSD 3-Clause "New" or "Revised" License

[question] Seek and extract single frame #313

Closed jimmyn closed 1 year ago

jimmyn commented 1 year ago

Thanks a lot for the lib, it's super useful!

I have a question. I have an API that allows me to request a range of raw video data.

I'm trying to use the seek API to step through the video, request the smallest possible chunk for each position, and extract just one sample (or as few as possible). For example, extract one frame for each second of the video (to generate a preview image).

So far everything works, but I can't make it work for a single sample. If the chunk is too small, the onSamples callback is not triggered, even with nbSamples: 1.

Empirically, I found that the smallest chunk that works is 1 s of data (track.bitrate / 8 bytes). Most likely this is just because I'm seeking every second, so I eventually fetch the entire video file. Is there any way to reduce the chunk size?
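The arithmetic behind that observation can be sketched as follows. This is only an illustration: the 2 Mbit/s bitrate, the 60 s duration, and the 4 KiB alignment are assumed values, and the helper names are made up for this example.

```typescript
const PART_ALIGN = 4 * 1024; // assumed request alignment in bytes

// Bytes needed to cover `seconds` of video at an average bitrate in bits per second.
function bytesForDuration(bitrate: number, seconds: number): number {
  return Math.ceil((bitrate / 8) * seconds);
}

// Round up to the next multiple of `align`; exact multiples are left unchanged.
function alignUp(size: number, align: number = PART_ALIGN): number {
  return Math.ceil(size / align) * align;
}

// Total bytes fetched when one part is requested per seek position.
function totalFetched(durationSec: number, stepSec: number, partSize: number): number {
  return Math.ceil(durationSec / stepSec) * partSize;
}

const bitrate = 2_000_000; // hypothetical 2 Mbit/s track
const part = alignUp(bytesForDuration(bitrate, 1));
// One aligned 1 s part per second already adds up to at least the whole file.
console.log(part, totalFetched(60, 1, part));
```

With a 1 s step and 1 s parts, the requests cover the full stream, which matches the suspicion that the entire file ends up being downloaded.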

I'm using the WebCodecs API to decode frames, but it isn't supported in all browsers, so as a fallback I need to decode with libav.js. That path is, as expected, very slow, so I'm trying to reduce the number of samples to decode: I only need one frame from each seek position of the video.
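Choosing between the two decoding paths could look roughly like this. It is a minimal sketch: the backend labels and the function name are hypothetical, and it only checks that the WebCodecs constructors exist, not that a particular codec configuration is supported.

```typescript
type DecoderBackend = 'webcodecs' | 'software';

// Pick WebCodecs when its constructors are present on the given scope,
// otherwise fall back to a software decoder (libav.js in this thread).
function pickDecoderBackend(
  scope: Record<string, unknown> = globalThis as unknown as Record<string, unknown>,
): DecoderBackend {
  const hasDecoder = typeof scope['VideoDecoder'] === 'function';
  const hasChunk = typeof scope['EncodedVideoChunk'] === 'function';
  return hasDecoder && hasChunk ? 'webcodecs' : 'software';
}
```

Passing the scope as a parameter keeps the check testable outside a browser.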

I tried to skip some frames during decoding, for example only decode frames that correspond to whole seconds, but I get corrupted images.

Is there anything I can do?

jimmyn commented 1 year ago

I also noticed that the same sample is included multiple times in the onSamples callback.

pkl commented 1 year ago

I have almost the same use case as you: trying to extract a single frame from an MP4 that corresponds to a given timestamp. I'm new to mp4box so progress is slow, to say the least ;)

Can you share something from your current code to help?

jimmyn commented 1 year ago

This is what I have so far. It works, but there are two problems as I mentioned before:

import type { MP4ArrayBuffer, MP4VideoTrack, MP4Info } from 'mp4box';
import MP4Box, { DataStream } from 'mp4box';

const META_PART_SIZE = 64 * 1024;
const MIN_PART_SIZE = 4 * 1024;

enum Status {
  loading = 'loading',
  ready = 'ready',
  closed = 'closed',
}

export type MP4DecoderConfig = {
  codec: string;
  codedHeight: number;
  codedWidth: number;
  description: Uint8Array;
};

type MP4DemuxerConfig = {
  framesPerVideo: number;
  onConfig: (config: any) => void;
  onChunk: (chunk: any) => void;
};

export class MP4Demuxer {
  private readonly url: string;

  private file: MP4Box.MP4File;

  private status = Status.loading;

  private readonly framesPerVideo: number;

  private decodedSamples = new Set<string>();

  private readonly onConfig: (config: MP4DecoderConfig) => void;

  private readonly onChunk: (chunk: any) => void;

  constructor(url: string, { onConfig, onChunk, framesPerVideo }: MP4DemuxerConfig) {
    this.url = url;
    this.framesPerVideo = framesPerVideo;
    this.onConfig = onConfig;
    this.onChunk = onChunk;

    this.file = MP4Box.createFile();
    this.file.onError = (e) => {
      console.error(e);
    };
    this.file.onReady = this.onReady.bind(this);
    this.file.onSamples = this.onSamples.bind(this);

    void this.loadMetadata();
  }

  private async loadMetadata() {
    let offset = 0;
    while (offset !== undefined) {
      offset = await this.requestPart(offset, META_PART_SIZE);
      if (this.status === Status.ready) break;
    }
  }

  private getFrameOffset(time: number) {
    return this.file.seek(time, true).offset;
  }

  private async loadNextFrames(step: number, duration: number, partSize: number) {
    let time = 0;
    let offset = this.getFrameOffset(time);
    while (this.status !== Status.closed && time < duration) {
      await this.requestPart(offset, partSize);
      // Move the time by 0.5s back to make sure it is included in samples
      time += (step - 0.5);
      offset = this.getFrameOffset(time);
    }
    this.file.flush();
  }

  private async requestPart(offset: number, partSize: number) {
    // Align the request start down to a multiple of MIN_PART_SIZE,
    // then trim the extra leading bytes from the response.
    const remainder = (offset % MIN_PART_SIZE);
    const start = offset - remainder;
    const end = start + partSize - 1;
    const response = await fetch(this.url, {
      headers: {
        range: `bytes=${start}-${end}`,
      },
    });
    let arrayBuffer = await response.arrayBuffer() as MP4ArrayBuffer;
    if (remainder) {
      arrayBuffer = arrayBuffer.slice(remainder) as MP4ArrayBuffer;
    }
    arrayBuffer.fileStart = offset;
    return this.file.appendBuffer(arrayBuffer);
  }

  private description(track: MP4VideoTrack) {
    const t = this.file.getTrackById(track.id);
    for (const entry of t.mdia.minf.stbl.stsd.entries) {
      if (entry.avcC || entry.hvcC || entry.av1C) {
        const stream = new DataStream(undefined, 0, DataStream.BIG_ENDIAN);
        if (entry.avcC) {
          entry.avcC.write(stream);
        } else if (entry.hvcC) {
          entry.hvcC.write(stream);
        } else if (entry.av1C) {
          entry.av1C.write(stream);
        }
        return new Uint8Array(stream.buffer, 8); // Remove the box header.
      }
    }
    throw new Error('avcC, hvcC or av1C not found');
  }

  private onReady(info: MP4Info) {
    console.log('onReady', info);
    this.status = Status.ready;
    const track = info.videoTracks[0];

    let codec = track.codec;
    if (codec.startsWith('avc1')) {
      // Somehow this is the only avc1 codec that works.
      codec = 'avc1.4d001f';
    }

    // Generate and emit an appropriate VideoDecoderConfig.
    this.onConfig({
      codec,
      codedHeight: track.video.height,
      codedWidth: track.video.width,
      description: this.description(track),
    });

    const duration = info.duration / info.timescale;

    // If we set a part size too small, the onSamples callback is not called.
    const partSize = roundPartSize(track.bitrate / 4);

    const step = Math.max(Math.floor(duration / this.framesPerVideo), 1);

    // Start demuxing.
    this.file.setExtractionOptions(track.id, undefined, { nbSamples: 1 });
    this.file.start();

    // Load frames
    void this.loadNextFrames(step, duration, partSize);
  }

  private onSamples(trackId: number, ref: any, samples: any) {
    // Generate and emit an EncodedVideoChunk for each demuxed sample.
    for (const sample of samples) {
      const time = sample.cts / sample.timescale;
      const type = sample.is_sync ? 'key' : 'delta';
      const id = `${type}${sample.number}`;

      // Skip already decoded samples.
      if (this.decodedSamples.has(id)) continue;

      // @ts-ignore
      this.onChunk(new EncodedVideoChunk({
        type,
        timestamp: (1e6 * time),
        duration: (1e6 * sample.duration) / sample.timescale,
        data: sample.data,
      }));
      this.decodedSamples.add(id);
    }
    const lastSample = samples[samples.length - 1];
    this.file.releaseUsedSamples(trackId, lastSample.number + 1);
  }

  close() {
    this.file.flush();
    this.file.stop();
    this.status = Status.closed;
  }
}

function roundPartSize(size: number) {
  return size + MIN_PART_SIZE - (size % MIN_PART_SIZE);
}

Ideally, I would calculate the minimum part size that contains only one frame for the given timestamp and then receive that frame as a single sample.
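One way to approach that calculation is to work from per-sample byte offsets instead of the bitrate. The sketch below assumes a sample table whose entries carry `offset`, `size`, `cts`, `timescale` and `is_sync`; the `SampleInfo` shape and the `rangeForTime` helper are hypothetical and written for illustration, not taken from the library. Because delta frames depend on earlier frames, the range starts at the preceding sync sample rather than at the target frame itself.

```typescript
// Hypothetical minimal shape of one sample-table entry (assumed fields).
interface SampleInfo {
  offset: number;    // byte position of the sample in the file
  size: number;      // sample size in bytes
  cts: number;       // composition time, in timescale units
  timescale: number;
  is_sync: boolean;  // true for keyframes
}

interface ByteRange { start: number; end: number; firstIndex: number; lastIndex: number }

// Smallest inclusive byte range that holds the target frame and every
// sample back to the preceding keyframe. Samples are in decode order.
function rangeForTime(samples: SampleInfo[], timeSec: number): ByteRange | undefined {
  // With B-frames, decode order differs from presentation order, so this
  // picks the first decodable sample at or after the target time.
  const target = samples.findIndex(s => s.cts / s.timescale >= timeSec);
  if (target === -1) return undefined;

  let key = target;
  while (key > 0 && !samples[key].is_sync) key -= 1;

  let start = Infinity;
  let end = -Infinity;
  for (let i = key; i <= target; i += 1) {
    const s = samples[i];
    start = Math.min(start, s.offset);
    end = Math.max(end, s.offset + s.size - 1);
  }
  // In an interleaved file this range may also include audio bytes.
  return { start, end, firstIndex: key, lastIndex: target };
}
```

If the target happens to be a keyframe, the range shrinks to that single sample, which is the best case for a one-frame fetch.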

jimmyn commented 1 year ago

I did some tests, and it seems that onSamples is triggered only if the downloaded array buffers cover the file without gaps. My guess is that this is related to the useRap argument of seek, but I'm not sure how it should be used. Whenever I set it to false, I don't get the first key frame and decoding fails:

Failed to execute 'decode' on 'VideoDecoder': A key frame is required after configure() or flush()

If I load the first samples without seeking, just by fetching data sequentially, and then seek with useRap: false, the frames come out corrupted.
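The decoder error above can be avoided with a small guard that holds back delta chunks until a key chunk has been seen. This is a sketch with a hypothetical `KeyFrameGate` class. It only prevents the exception: delta frames still have to follow their reference frames without gaps, so skipping samples between the keyframe and the target will still produce corrupted images.

```typescript
type ChunkType = 'key' | 'delta';

// Tracks whether the decoder is still waiting for a key frame.
class KeyFrameGate {
  private needsKey = true;

  // Call after the decoder has been configured or flushed.
  reset(): void {
    this.needsKey = true;
  }

  // Returns true when a chunk of this type may be passed to the decoder.
  accept(type: ChunkType): boolean {
    if (type === 'key') {
      this.needsKey = false;
      return true;
    }
    return !this.needsKey;
  }
}
```

In the demuxer's onSamples handler, a chunk would be forwarded only when `accept(type)` returns true, and `reset()` would be called after each seek that flushes the decoder.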

hughfenghen commented 1 year ago

You need to find the nearest keyframe (sync sample).

function findSamples (samples: MP4Sample[], time: number): MP4Sample[] {
  // First sample whose composition time is at or after the target time.
  const endIdx = samples.findIndex(s => s.cts >= time)
  const targetSamp = samples[endIdx]

  if (targetSamp == null) throw Error('Not found')
  // Walk back to the nearest keyframe (sync sample); decoding must start there.
  let startIdx = endIdx
  while (startIdx > 0 && !samples[startIdx].is_sync) {
    startIdx -= 1
  }
  return samples.slice(startIdx, endIdx + 1)
}

const samples = [/**/]
findSamples(samples, 10 * timescale)
  .forEach((s) => {
    // convert the sample to an EncodedVideoChunk, then pass it to VideoDecoder.decode
  })

8zf commented 1 year ago

I am using mp4box.js and WebCodecs to extract frames for a set of timestamps without reading the whole file. Here are my steps:

jimmyn commented 1 year ago

@8zf Thanks! This is exactly what I'm doing right now.

8zf commented 1 year ago

I went through my sample videos and found that the keyframe interval is about one keyframe per second. So if you want a frame for each second, I think you will eventually read the whole file, because as far as I know keyframes account for most of the file size. https://ottverse.com/i-p-b-frames-idr-keyframes-differences-usecases/
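Both observations can be checked for a specific file from its sample table. The sketch below assumes entries with `dts`, `timescale`, `size` and `is_sync`; the `SampleInfo` shape and `keyframeStats` are hypothetical names used for illustration.

```typescript
// Hypothetical minimal shape of one sample-table entry (assumed fields).
interface SampleInfo {
  dts: number;       // decode time, in timescale units
  timescale: number;
  size: number;      // sample size in bytes
  is_sync: boolean;  // true for keyframes
}

interface KeyframeStats {
  keyframes: number;
  avgIntervalSec: number | undefined; // undefined with fewer than two keyframes
  keyByteShare: number;               // keyframe bytes divided by all sample bytes
}

function keyframeStats(samples: SampleInfo[]): KeyframeStats {
  const keys = samples.filter(s => s.is_sync);
  const totalBytes = samples.reduce((sum, s) => sum + s.size, 0);
  const keyBytes = keys.reduce((sum, s) => sum + s.size, 0);

  let avgIntervalSec: number | undefined;
  if (keys.length >= 2) {
    const first = keys[0];
    const last = keys[keys.length - 1];
    const span = last.dts / last.timescale - first.dts / first.timescale;
    avgIntervalSec = span / (keys.length - 1);
  }

  return {
    keyframes: keys.length,
    avgIntervalSec,
    keyByteShare: totalBytes > 0 ? keyBytes / totalBytes : 0,
  };
}
```

If the average interval is well above the preview step, or the keyframe byte share is low, fetching only keyframes could save a noticeable part of the download; otherwise the requests will approach the full file size.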