josephrocca / getVideoFrames.js

Simple JavaScript library to break a video down into individual frames (uses WebCodecs API and MP4Box.js)
MIT License

Pass the mp4boxfile object to extract metadata #7

Closed tobiasBora closed 12 months ago

tobiasBora commented 12 months ago

It is sometimes helpful to have access to the mp4boxfile in order to do more advanced operations (in my case I wanted to read the comment of the video).

josephrocca commented 12 months ago

Sure - can you capitalize the 'f' for consistency? I.e. mp4boxFile.

tobiasBora commented 12 months ago

Sure, done!

josephrocca commented 12 months ago

Thanks!

tobiasBora commented 12 months ago

Btw, completely unrelated, but by any chance do you know if it is somehow possible to use your library to start fetching from an arbitrary frame? The problem is that for videos longer than one minute it quickly fills the allocated memory, so I would need to fetch some arbitrary frames at any time (e.g. the user jumps to minute 3).

josephrocca commented 11 months ago

No, sorry, I just whipped this repo together using some official WebCodecs demos. My advice would be to look into those, and also into mp4box, to see if there is anything you can use as a starting point.

tobiasBora commented 11 months ago

OK, thanks! I think I can see some solutions (I now understand much better how mp4box.js, video codecs, and WebCodecs play together). It might not be super easy to implement right now and I still need to check some details… but there is hope!

pedrobroese commented 11 months ago

Yes, it is possible to skip frames using mp4box. The strategy I used was to keep pulling all the frames with mp4box and, for frames before the interest point, drop everything BUT the key frames:

```javascript
if (videoTrack.id === trackId) { // got video samples
  countSampleVideo += samples.length;
  pendingSampleVideo = samples.length;
  for (const sample of samples) { // loop over the samples (must run inside an async function)
    // Create the encoded video chunk
    const type = sample.is_sync ? 'key' : 'delta'; // is this particular frame a key frame?
    const chunk = new EncodedVideoChunk({
      type,
      // WebCodecs expects microseconds; mp4box reports cts and duration in timescale units
      timestamp: (1e6 * sample.cts) / sample.timescale,
      duration: (1e6 * sample.duration) / sample.timescale,
      data: sample.data,
    });
    // Check whether the frame comes before the interest point
    if (sample.cts / sample.timescale + videos[currentVideoIndex].start <= Number.parseFloat(lowerSlider.value) / 20) {
      if (type === 'key') {
        // This promise controls the data flow to the decoder:
        // a chunk is only sent after the previous one has been dequeued
        await new Promise((res) => {
          videoDequeuePromise = res;
          videoDecoder.decode(chunk); // a key frame is fed to the decoder even if it is before the interest point
        });
      } else { // before the interest point and not a key frame: drop the frame and continue
        decodedVideoFrameCount++;
        pendingSampleVideo--;
      }
    }
    // (the original snippet stops here; handling of frames after the interest point is not shown)
  }
}
```
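
A related variant, offered only as a minimal sketch and not as the code used above: rather than decoding every key frame before the interest point, one could locate the last key frame at or before the target time and start decoding from there. The helper name `lastKeyFrameIndex` is hypothetical; the `cts`, `timescale`, and `is_sync` fields are the ones used in the snippet above, and the samples are assumed to be in decode order.

```javascript
// Hypothetical helper: index of the last key frame whose presentation time
// is at or before targetSeconds, or -1 if there is none.
// Assumes each sample has cts, timescale and is_sync, as in the snippet above.
function lastKeyFrameIndex(samples, targetSeconds) {
  let found = -1;
  for (let i = 0; i < samples.length; i++) {
    const seconds = samples[i].cts / samples[i].timescale;
    if (samples[i].is_sync && seconds <= targetSeconds) {
      found = i; // keep the latest match
    }
  }
  return found;
}
```

Decoding would then start at `samples[lastKeyFrameIndex(samples, t)]`, since the decoder needs a key frame before any delta frame.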

Regarding the memory management, you can flush very big files through memory (I can process multiple 4GB mp4 files). To do that, just make sure that after each mp4 file sample is used, it is released like this:

mp4Box.releaseUsedSamples(videoTrack.id, decodedVideoFrameCount);

Also make sure that, whenever you create a video frame, it is closed once it has been consumed, like this:

videoFrame.close(); newVideoFrame.close();
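
To make the "always close the frame" rule harder to forget, a small wrapper can help. This is a sketch under assumptions: `withFrame` is a hypothetical name, and `use` is assumed to be synchronous (for example, drawing the frame to a canvas), since the frame is closed as soon as `use` returns.

```javascript
// Hypothetical helper: run use(frame) and always close the frame afterwards,
// even if use() throws. `frame` is anything with a close() method,
// such as a WebCodecs VideoFrame. `use` must be synchronous.
function withFrame(frame, use) {
  try {
    return use(frame);
  } finally {
    frame.close(); // release the decoded frame's memory
  }
}
```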

tobiasBora commented 11 months ago

Thanks. It is not clear from the code, but do you need to restart the whole loop every time you want to seek to a new frame? To avoid redoing the whole loop, I was thinking of saving the mapping from sample number to offset in the file (hopefully that would be enough to recover the data). How fast can you reach an arbitrary point in the video?

Btw, what is this videoDequeuePromise? I actually like the idea of keeping a queue of the frames submitted for decoding (since WebCodecs does not allow custom user values, it is hard to match a decoded frame with its original frame position… but using an independent queue should work).
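
One possible way to do this matching, as a sketch only: WebCodecs carries the timestamp of each encoded chunk through to the decoded frame, so a map keyed by timestamp can recover the sample number even when frames come out in a different order than they went in. The class and method names below are hypothetical.

```javascript
// Hypothetical bookkeeping: remember the sample number of each submitted chunk,
// keyed by the chunk timestamp, and look it up when the decoded frame arrives.
class FrameIndexTracker {
  constructor() {
    this.byTimestamp = new Map(); // timestamp -> sample number
  }
  // Call right before decoder.decode(chunk)
  submitted(timestamp, sampleNumber) {
    this.byTimestamp.set(timestamp, sampleNumber);
  }
  // Call in the decoder's output callback with frame.timestamp.
  // Returns undefined for a timestamp that was never submitted.
  resolve(timestamp) {
    const sampleNumber = this.byTimestamp.get(timestamp);
    this.byTimestamp.delete(timestamp);
    return sampleNumber;
  }
}
```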

pedrobroese commented 11 months ago

Yes, the way I coded it, you would have to restart the loop every time the interest point changes. The code is like this because it aims to process all the frames of an mp4 file from a user-selected starting point to a user-selected ending point. All frames between these 2 points are extracted and painted to a hidden canvas; I then draw over the canvas and save the result as a new frame, which is fed to a muxer to make a new video with animations overlaid. It has to be like this because I work with large files (GoPro videos), and the only way to do that without blowing up the memory or writing large amounts of data to disk (which would be slow) is to acquire and process the frames chronologically, closing samples and frames as they are used.

From your answer, I can't quite figure out what you want to achieve. However, if your goal is to reach a predefined frame, I would try the following:

1) Parse the MOOV box with mp4box (sometimes you have to flush the whole file to reach it). This way, you'll know the exact number of samples in the mp4 file.

2) Once you know the frame number you want to reach, instead of parsing the file from the beginning, use mp4box segmentation to reach it:

```javascript
var mp4boxfile = MP4Box.createFile();
mp4boxfile.onReady = function (info) {
  ...
  mp4boxfile.onSegment = function (id, user, buffer, sampleNumber, last) {};
  mp4boxfile.setSegmentOptions(info.tracks[0].id, sb, options);
  var initSegs = mp4boxfile.initializeSegmentation();
  mp4boxfile.start();
  ...
};
```

3) Once you have the segment with the frames you're aiming at, use mp4box.onSamples() to get only the frames from this segment of the file.

The tricky part is that videoDecoder.decode() should always get a key frame before any following frame in order to start decoding properly.

Lastly, yes, using a promise to control the flow of frames to the decoder is very useful: mp4box works really fast, so if the decoder or the rest of your code can't keep up, you'll overwhelm the decoder and it will crash. The correct way to control the flow of frames through the decoder is with the videoDecoder.ondequeue event, which fires every time a frame leaves the decoder's queue. I only feed a new frame once the event has fired, so the decoder handles one frame at a time.
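
The flow control described above could be sketched roughly as follows. This is not the code from the app: `feedWithBackpressure` and `maxQueued` are hypothetical names, and the decoder is only assumed to expose `decode()`, `decodeQueueSize`, and a `dequeue` event, as the WebCodecs `VideoDecoder` does.

```javascript
// Feed chunks to a decoder-like object without letting more than maxQueued
// chunks wait in its queue. Assumes the decoder exposes decode(),
// decodeQueueSize and a "dequeue" event (as WebCodecs' VideoDecoder does).
async function feedWithBackpressure(decoder, chunks, maxQueued = 1) {
  for (const chunk of chunks) {
    while (decoder.decodeQueueSize >= maxQueued) {
      // Wait until the decoder signals that something left its queue
      await new Promise((resolve) => {
        decoder.addEventListener('dequeue', resolve, { once: true });
      });
    }
    decoder.decode(chunk);
  }
}
```

With `maxQueued = 1` this matches the "one frame at a time" behaviour; a slightly larger value may keep the decoder busier at the cost of more memory.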

tobiasBora commented 11 months ago

Thanks! So I’m interested in a different user experience, as I’m building a frame-precise video player, where the playback automatically stops at some frames. The user should also be able to play the animation backward, or even go to a precise frame at any time (and the interface should not lag): you can find a first working demo here:

https://leo-colisson.github.io/blenderpoint-web/index.html?video=https://leo-colisson.github.io/blenderpoint-web/Demo_24fps.mp4

Unfortunately, for now I keep all decoded frames in memory, and for videos longer than 1 min this crashes the browser. So my goal now is to cache only a limited number of decoded frames around the user's position, and to update that set when the user plays the video or jumps to another frame.
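
Such a cache might look like the following sketch. Everything here is an assumption for illustration: the class name `FrameWindowCache`, the `radius` parameter, and the idea of indexing frames by frame number. Frames are only assumed to have a `close()` method, as WebCodecs `VideoFrame` objects do.

```javascript
// Hypothetical cache that keeps decoded frames only within `radius` frames
// of the current position and closes everything else.
class FrameWindowCache {
  constructor(radius) {
    this.radius = radius;
    this.frames = new Map(); // frame index -> frame (anything with close())
  }
  add(index, frame) {
    const previous = this.frames.get(index);
    if (previous && previous !== frame) previous.close();
    this.frames.set(index, frame);
  }
  get(index) {
    return this.frames.get(index);
  }
  // Call when the user plays or jumps: evict frames outside the new window
  moveTo(center) {
    for (const [index, frame] of this.frames) {
      if (Math.abs(index - center) > this.radius) {
        frame.close(); // release the decoded frame's memory
        this.frames.delete(index);
      }
    }
  }
  // Frame indices inside the window that still need to be decoded
  missing(center, frameCount) {
    const result = [];
    const first = Math.max(0, center - this.radius);
    const last = Math.min(frameCount - 1, center + this.radius);
    for (let i = first; i <= last; i++) {
      if (!this.frames.has(i)) result.push(i);
    }
    return result;
  }
}
```

After each `moveTo`, the indices returned by `missing` tell the player which frames to request from the decoder next.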

Thanks, but I’m not sure I see how segmentation differs from extraction here: doesn't it still loop through the whole file every time and then send frames in batches of 1000?

pedrobroese commented 11 months ago

Yes, it will start from the beginning of the file, sorry about that. I went through this problem some time ago and forgot part of the answer I meant to give you: along with the segmentation, you would have to use seek (pasting from the mp4box docs):

seek(time, useRap) Indicates that the next samples to process (for extraction or segmentation) start at the given time (Number, in seconds) or at the time of the previous Random Access Point (if useRap is true, default is false). Returns the offset in the file of the next bytes to be provided via appendBuffer .

mp4boxfile.seek(10, true);

Once you have the buffer offset, you can feed it to your file-reading function so that it starts reading at the point of interest. Perhaps you don't even need to segment the file. I don't know which function you are using to read the file in chunks, so my function below might be helpful, since it already incorporates an offset, which can be set to the result of the seek function:

```javascript
function readBlockFactory() {
  console.log('fired readblock');
  const chunkSize = Math.floor(1024 * 1024 * 0.1); // bytes
  let offsetFlag = 0; // number of completed passes over the file
  let offset = 0;
  let reading = true; // cleared by stop(); not declared in the original snippet
  let r;
  // Stop reading chunks
  function stop() {
    console.log('fired stop from readblock');
    reading = false;
    if (r) r.abort();
  }
  // Auxiliary function to read the file progressively, chunk by chunk, in the browser
  function read(file, { update, onparsedbuffer }) {
    const fileSize = file.size;
    r = new FileReader();
    const blob = file.slice(offset, chunkSize + offset);
    r.onload = function (evt) {
      if (evt.target.error == null) {
        // Tell the parent function to add the data to mp4box
        onparsedbuffer(evt.target.result, offset);
        // Record the offset of the next chunk
        offset += evt.target.result.byteLength;
        // Report progress (in percent) to the parent function; each pass counts for 50%
        const prog = Math.ceil((50 * offset) / fileSize) + 50 * offsetFlag;
        if (update) update(prog);
      } else {
        throw new Error('Read error: ' + evt.target.error);
      }
      // End of file reached: start over from the beginning for the next pass
      if (offset >= fileSize) {
        console.log('offset>=fileSize');
        offset = 0;
        offsetFlag++;
      }
      read(file, { update, onparsedbuffer });
    };
    // Use the FileReader (does nothing once stop() has been called)
    if (reading) {
      r.readAsArrayBuffer(blob);
    }
  }
  return { read, stop };
}
```
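
For comparison, here is a shorter sketch of reading from a given byte offset using `Blob.slice()` and `Blob.arrayBuffer()` instead of `FileReader`. The name `readChunks` and its parameters are hypothetical, and `startOffset` is assumed to be a plain byte offset such as the one obtained from the seek call described above. Each buffer gets a `fileStart` property because mp4box's `appendBuffer` expects one.

```javascript
// Hypothetical reader: yield ArrayBuffers of at most chunkSize bytes from a
// Blob/File, starting at startOffset. Each buffer is tagged with fileStart,
// the position of its first byte in the file.
async function* readChunks(file, startOffset = 0, chunkSize = 1024 * 1024) {
  let offset = startOffset;
  while (offset < file.size) {
    const buffer = await file.slice(offset, offset + chunkSize).arrayBuffer();
    buffer.fileStart = offset;
    yield buffer;
    offset += buffer.byteLength;
  }
}
```

A caller could then loop with `for await (const buffer of readChunks(file, offset))` and pass each buffer to `mp4boxfile.appendBuffer(buffer)`, stopping early with `break` once enough samples have arrived.
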
tobiasBora commented 11 months ago

Thanks a lot, I’ll try to wrap my head around this. Sorry to bother you, but one last question: how do you do the decoding in Firefox? Do you use something like https://github.com/Kagami/ffmpeg.js/ (my understanding of MSE is that it does not do the decoding itself and delegates it to <video>)?

pedrobroese commented 11 months ago

No problem, happy to help. By the way, do you have a repo for this particular project? If so, I would be happy to contribute. To answer your question: FF does support mp4box and WebCodecs, so the code I wrote works there as well. Actually, it even works on Android mobiles. From experience, I can tell you that the HTML video element is not the right path for you, because it fails at exactly what you're aiming for: decoding with frame-level precision. I tried it first and ended up moving to the WebCodecs API precisely because the video element would never give me consistent frame output (skipped frames and variable frame rate).

tobiasBora commented 11 months ago

Sure, the github is github.com/leo-colisson/blenderpoint-web! Yeah, I also tried to use <video>, and it was such a mess with seeking the wrong frame etc…

Whaaat?? Which version of FF do you have that supports WebCodecs?? I’m really surprised, as a number of sources say the opposite, and so does my own attempt to run my code on FF:

image

But if you can fix my code to make it work on firefox, I’d love to see it!

pedrobroese commented 11 months ago

"If I type VideoDecoder in Chromium, it works; if I type the same in Firefox, it fails, saying that it is not defined. By any chance is your app available online?"

Hey, sorry about that. Once more I was answering off the top of my head and ended up mixing things up. My app has two parts. The first lets the user view their video with the overlaid animations; it uses a video element and works in FF. The second, which only runs when the user wants to create a new video from the original, doesn't work in FF, as it relies on VideoDecoder which, as you correctly pointed out, is not supported there.

And yes, my app is available and you can use it for free at www.easy-telemetry.com. The only catch is that, to see how the video overlay works, you'll need a GoPro video with telemetry data, since the goal of the app is to overlay gauges showing this data on the original video. In case you don't have one, I have one for testing in my drive:

https://drive.google.com/drive/folders/1oHa3rk88haCW2f91X7LDMPQuOH8l6WnV?usp=sharing