voxel51 / fiftyone

The open-source tool for building high-quality datasets and computer vision models
https://fiftyone.ai
Apache License 2.0
8.09k stars, 537 forks

[FR] Add native support for videos stored as per-frame image sequences #1962

Open brimoor opened 2 years ago

brimoor commented 2 years ago

Background

Currently, FiftyOne requires video samples to provide a filepath to an encoded video file on disk (e.g., MP4), which the App and other API methods use whenever they need access to the video data.

However, the "native" format of some users' video data is not an encoded file such as MP4 but a directory of per-frame images:

/path/to/video/
    000001.jpg
    000002.jpg
    000003.jpg
    ...

When per-frame paths are available, the best practice is to provide these as Frame.filepath values on your FiftyOne dataset like so:

import fiftyone as fo

sample = fo.Sample(filepath="/path/to/video.mp4")
sample.frames[1] = fo.Frame(filepath="/path/to/video/000001.jpg")
sample.frames[2] = fo.Frame(filepath="/path/to/video/000002.jpg")
sample.frames[3] = fo.Frame(filepath="/path/to/video/000003.jpg")

dataset = fo.Dataset()
dataset.add_sample(sample)
print(dataset)

This lets you use FiftyOne's frame views feature to work with the dataset as a collection of frame images:

frames = dataset.to_frames()
print(frames)

However, Sample.filepath must still be provided, even when every frame has its own Frame.filepath.
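For long sequences, writing out each Frame.filepath by hand is impractical. Below is a minimal sketch of collecting the paths from a directory, assuming zero-padded numeric filenames such as 000001.jpg. Note that collect_frame_paths is a hypothetical helper for illustration, not part of FiftyOne:

```python
import os
import re


def collect_frame_paths(video_dir, ext=".jpg"):
    """Map frame number -> image path for files named like 000001.jpg.

    Hypothetical helper: assumes each filename is just the frame number
    followed by `ext`; any other file in the directory is ignored.
    """
    pattern = re.compile(r"^(\d+)" + re.escape(ext) + r"$")
    frame_paths = {}
    for name in sorted(os.listdir(video_dir)):
        match = pattern.match(name)
        if match:
            frame_paths[int(match.group(1))] = os.path.join(video_dir, name)
    return frame_paths
```

Each entry could then be attached as in the snippet above, i.e. sample.frames[frame_number] = fo.Frame(filepath=path).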

Feature request

Should we make Sample.filepath optional for video datasets, if complete per-frame filepath information is provided via Frame.filepath?
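What "complete" should mean is left open in this request. One possible reading, given that FiftyOne frame numbers are 1-based, is that the frame numbers run from 1 to N with no gaps. Here is a rough sketch of such a check. The name missing_frame_numbers is hypothetical and this is not an existing FiftyOne API:

```python
def missing_frame_numbers(frame_numbers, start=1):
    """Return frame numbers absent from a sequence expected to run start..max.

    Hypothetical check: an empty result means the per-frame information is
    gap-free under the assumption that frames are numbered from `start`.
    """
    present = set(frame_numbers)
    if not present:
        return []
    return [n for n in range(start, max(present) + 1) if n not in present]
```

Under this reading, a sample whose frames are numbered 1, 2, 4 would not qualify, since frame 3 has no filepath.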

Pros

Cons

Workaround

If you only have per-frame image sequences, you can use the built-in transform_video() utility to generate an encoded video from your frames.

The example below demonstrates the idea:

import os
import eta.core.utils as etau

import fiftyone as fo
import fiftyone.utils.video as fouv
import fiftyone.zoo as foz

FRAMES_DIR = "/tmp/quickstart-video"

#
# Generate some example per-frame sequence data
#

video_paths = foz.load_zoo_dataset("quickstart-video").values("filepath")

with fo.ProgressBar() as pb:
    for video_path in pb(video_paths):
        name = os.path.splitext(os.path.basename(video_path))[0]
        frames_patt = os.path.join(FRAMES_DIR, name, "%06d.jpg")
        fouv.transform_video(video_path, frames_patt)

#
# Now generate a video dataset from the per-frame sequences
#

video_dirs = etau.list_subdirs(FRAMES_DIR, abs_paths=True)
samples = []

with fo.ProgressBar() as pb:
    for video_dir in pb(video_dirs):
        # Generate video file
        frames_patt = os.path.join(video_dir, "%06d.jpg")
        frame_numbers = etau.parse_pattern(frames_patt)
        video_path = video_dir + ".mp4"
        fouv.transform_video(
            frames_patt,
            video_path,
            in_opts=["-start_number", str(min(frame_numbers))],
        )

        sample = fo.Sample(filepath=video_path)

        # Optional: populate frame filepaths
        for frame_number in frame_numbers:
            frame = fo.Frame(filepath=frames_patt % frame_number)
            sample.frames[frame_number] = frame

        samples.append(sample)

dataset = fo.Dataset()
dataset.add_samples(samples)

print(dataset.first())
print(dataset.first().frames.first())

session = fo.launch_app(dataset)

# Since we provided `Frame.filepath`, we can switch to frames view!
session.view = dataset.to_frames()

jasm37 commented 2 years ago

Hi @brimoor, thanks for creating this feature request. I tried your workaround but I got an error:

Traceback (most recent call last):
  File "/Users/X/.../test_fo.py", line 34, in <module>
    fouv.transform_video(frames_patt, video_path)
  File "/Users/X/../venv/lib/python3.8/site-packages/fiftyone/utils/video.py", line 404, in transform_video
    _transform_video(
  File "/Users/X/../venv/lib/python3.8/site-packages/fiftyone/utils/video.py", line 769, in _transform_video
    ffmpeg.run(inpath, outpath, verbose=verbose)
  File "/Users/X/../venv/lib/python3.8/site-packages/eta/core/video.py", line 4183, in run
    + ["-i", inpath]
TypeError: can only concatenate list (not "NoneType") to list

I did some digging and found that when the if statement on line 4144 of eta/core/video.py is True, in_opts is wrongly set to None on line 4150, because .extend(..) returns None. I think this is a bug? My setup is Python 3.8.9 and FiftyOne v0.16.5 on a MacBook Pro (M1) running macOS Monterey.
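The failure mode described here can be reproduced in isolation. The sketch below is not the actual ETA source; the function names and the argument list are made up for illustration. It only shows why assigning the result of list.extend() produces None and how that leads to the TypeError in the traceback:

```python
def build_in_opts_buggy(in_opts, extra):
    # Bug pattern: list.extend() mutates the list in place and returns None,
    # so this assignment replaces the list with None
    in_opts = in_opts.extend(extra)
    return in_opts


def build_in_opts_fixed(in_opts, extra):
    # Concatenation returns a new list and leaves the inputs untouched
    return list(in_opts) + list(extra)


def assemble_args(in_opts, inpath):
    # Mirrors the shape of the failing expression: list + in_opts + list
    return ["ffmpeg"] + in_opts + ["-i", inpath]


bad = build_in_opts_buggy(["-y"], ["-start_number", "1"])
try:
    assemble_args(bad, "%06d.jpg")
except TypeError as e:
    print("TypeError:", e)

good = build_in_opts_fixed(["-y"], ["-start_number", "1"])
print(assemble_args(good, "%06d.jpg"))
```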

brimoor commented 2 years ago

Ah, good catch, thanks! That's a bug that was fixed in https://github.com/voxel51/eta/pull/565; the fix will ship in fiftyone>0.16.5.

For now, I've updated the example code above to avoid the issue.

mareksubocz commented 1 year ago

Is there any update on this feature? Would love to use that in my projects, as many labelling tools require frame-by-frame video annotation. Thank you for all the good work :)