Closed: @evatt-harvey-salinger closed this issue 2 months ago
Hi @evatt-harvey-salinger. Under the hood, `fo.core.video.make_frames_dataset` is the function used to create a `Dataset.to_frames()` view. The term dataset is likely overloaded in this context, but the other `to_*` stages use the same nomenclature, e.g. `fo.core.patches.make_patches_dataset` and `Dataset.to_patches()`.
Zooming out a bit, perhaps adding support (or a best practice) for annotating frame collections is the main goal?
@evatt-harvey-salinger thanks for calling this out!
I think it is a valid use case to directly call methods like `make_frames_dataset()` and that, indeed, you should get a "regular" dataset when you do that. This will be supported as of https://github.com/voxel51/fiftyone/pull/4416.
In the meantime, it is slightly less efficient, but you can achieve the same end result via `clone()` like this:

```py
patches_dataset = sample_collection.to_patches(...).clone()
frames_dataset = sample_collection.to_frames(...).clone()
clips_dataset = sample_collection.to_clips(...).clone()
```
Thanks @brimoor and @benjaminpkane!
Great, looks like #4416 will address the suggestion that `make_frames_dataset()` should return a "regular" dataset.
In general, I agree that adding support for annotating a `FrameView` of a video dataset would be an amazing feature. I can envision a few good use cases...

Currently, it seems that the workflow would be to use `to_frames(...).clone()` to sample and annotate a subset of the video, and then maintain the video dataset alongside the "to_frames.clone" dataset. I could either (1) store the annotations in the "to_frames.clone" dataset, and progressively sample more frames of the video, merging and labeling them into the "to_frames.clone" dataset in batches, or (2) store the annotations in the video dataset, by annotating the "to_frames.clone" dataset and then merging the annotations into the video frames by associating the `frame_number`s.
This is certainly doable. But if `FrameView`s could be annotated directly, and the annotations could be imported straight into the video dataset, it would prevent the need to flow back and forth between two datasets (and mitigate the risk of accidentally tweaking one dataset out of alignment with the other).

I'll close the issue, since #4416 addresses the original request. But I'd love to hear what you think about the more general idea of annotating `FrameView`s directly, so I'll stay tuned on the thread!
Out of curiosity, is there a reason you specifically want to annotate your videos as individual frames rather than directly calling `annotate()` on your `media_type == "video"` dataset?
Hi Brian,
I've tried to answer this a few different times, but then I get new ideas and try to hack together a solution. But I haven't really found one yet.
Basically, I have many hours' worth of 15 fps videos to label. Each video sample has way too many frames to label all at once. I'd like to be able to downsample and iteratively label portions of video datasets, while retaining the integrity of the video samples as videos (rather than just converting them into image datasets). That would enable me to annotate the videos at 1 fps, then come back and annotate at 4 fps. Or, use the 1 fps frames to train a model that can help me auto-label a portion of the unlabeled frames.
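The progressive-fps idea above can be sketched in plain Python (no FiftyOne required, and the function here is just illustrative): pick every k-th frame for a 1 fps pass over a 15 fps video, then add the frames for a 4 fps pass, skipping anything already labeled.

```python
def sample_frames(total_frames, native_fps, target_fps, already_labeled=()):
    """Return 1-based frame numbers for a target_fps pass over a
    native_fps video, skipping frames labeled in an earlier pass."""
    step = max(1, round(native_fps / target_fps))
    labeled = set(already_labeled)
    return [f for f in range(1, total_frames + 1, step) if f not in labeled]

# 60 frames of 15 fps video (4 seconds)
first_pass = sample_frames(60, native_fps=15, target_fps=1)   # every 15th frame
second_pass = sample_frames(60, native_fps=15, target_fps=4,
                            already_labeled=first_pass)       # ~every 4th, minus the first pass
```

Each later pass only sends the not-yet-labeled frames out for annotation, which is the "iteratively label portions" behavior described above.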
For example, I have a workflow with image datasets that looks like this:
1) request annotation for a view `first_pass`
2) retrieve annotations, and use the `anno_results.frame_id_map` to select the frame IDs to reconstruct the `first_pass` view (a capability we should add btw :) )
3) programmatically exchange `label_requested` tags for `labeled` tags
4) train a model on the `labeled` samples
5) run inference on the `unlabeled` samples and label them as `auto-labeled`
6) form a new view `second_pass` with a portion of the auto-labeled samples, where I correct the auto-labeled predictions
7) retrieve those annotations, and iterate
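Step (3) above, the tag exchange, can be sketched without FiftyOne as a pass over per-sample tag lists (the dict shape here is illustrative, not FiftyOne's actual sample schema; the tag names are the ones from the workflow description):

```python
def exchange_tags(samples, old_tag, new_tag):
    """Replace old_tag with new_tag on every sample that carries it."""
    for sample in samples:
        if old_tag in sample["tags"]:
            sample["tags"] = [t for t in sample["tags"] if t != old_tag]
            sample["tags"].append(new_tag)
    return samples

samples = [
    {"id": "a", "tags": ["label_requested"]},
    {"id": "b", "tags": ["unlabeled"]},
]
exchange_tags(samples, "label_requested", "labeled")
# sample "a" now carries "labeled"; sample "b" is untouched
```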
I would like to develop an analogous workflow for video datasets. Sending `FrameView`s for annotation, then retrieving the annotations and pulling them directly into my video dataset would be the cleanest way to enable this kind of workflow.
As I said, I've been trying to find a workaround, but haven't yet found a solution that isn't terribly convoluted. I know that I can just abandon the video datasets altogether and convert everything to image datasets, but it would be a shame not to make use of the other video dataset capabilities. I would also like to keep the source files as videos, which are cleaner to store, version, view, etc.
I've gotten close to a solution where I maintain a video dataset and a corresponding image dataset as a pair. I can use the workflow above to add annotations to the images, then use the `frame_number`s (a field automatically populated by `make_frames_dataset()`) to merge the annotations back into the video dataset. However, this has proven to be quite tricky.
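The merge-back step can be sketched in plain Python: each image sample carries the source video's filepath and a frame number (the field `make_frames_dataset()` populates), which together key the video frame to update. The dict shapes and field names here are illustrative, not FiftyOne's actual schema:

```python
def merge_frame_labels(video_frames, image_samples):
    """Copy detections from annotated image samples into the matching
    video frames, keyed on (source video filepath, frame number)."""
    index = {(s["video_filepath"], s["frame_number"]): s["detections"]
             for s in image_samples}
    for key, frame in video_frames.items():
        if key in index:
            frame["detections"] = index[key]
    return video_frames

video_frames = {("vid.mp4", 1): {"detections": None},
                ("vid.mp4", 2): {"detections": None}}
annotated = [{"video_filepath": "vid.mp4", "frame_number": 2,
              "detections": [{"label": "car"}]}]
merge_frame_labels(video_frames, annotated)
# frame 2 of vid.mp4 now holds the "car" detection; frame 1 stays unlabeled
```

The tricky part in practice is keeping the two datasets in sync, which is exactly the back-and-forth the feature request above would eliminate.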
I know that I can use the `frame_step` parameter in `annotate()` with the CVAT backend. But if I use the tracks feature in CVAT, then the detections actually get interpolated once they are imported into the FO dataset anyways. For example, if I use `frame_step=8` for a 32-frame video, I would only label ~4 frames in CVAT. But after importing back into FO, all 32 frames are labeled.
`frame_step` can't be used for datasets that already have tracks anyways.
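The interpolation behavior described above can be illustrated with a toy linear interpolation between keyframe boxes (this mimics what CVAT-style track interpolation produces on import; it is not FiftyOne or CVAT code):

```python
def interpolate_track(keyframes, frame_numbers):
    """Linearly interpolate [x, y, w, h] boxes between keyframes.
    keyframes: dict of frame_number -> box for the manually labeled frames."""
    frames = sorted(keyframes)
    boxes = {}
    for f in frame_numbers:
        # find the surrounding keyframes (clamp at the track's ends)
        prev = max((k for k in frames if k <= f), default=frames[0])
        nxt = min((k for k in frames if k >= f), default=frames[-1])
        if prev == nxt:
            boxes[f] = keyframes[prev]
        else:
            t = (f - prev) / (nxt - prev)
            boxes[f] = [a + t * (b - a)
                        for a, b in zip(keyframes[prev], keyframes[nxt])]
    return boxes

# frame_step=8 on a 32-frame clip: keyframes at 1, 9, 17, 25
keys = {1: [0, 0, 10, 10], 9: [8, 0, 10, 10],
        17: [16, 0, 10, 10], 25: [24, 0, 10, 10]}
track = interpolate_track(keys, range(1, 33))
# all 32 frames end up with a box, even though only 4 were labeled by hand
```

So after import, every frame carries a (partly synthetic) label, which defeats the goal of annotating only a sparse subset of frames.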
Because of these two things, I'm going to just live with labeling full-fps videos (with whatever downsampling I want on the front end), and achieve "partial" annotation by just sending different clips within the video at a time.
Anyways, I hope this description gives you an idea of the workflow I was trying to achieve by annotating `FrameView`s directly!
@evatt-harvey-salinger I added support for passing frame views directly to `annotate()` in https://github.com/voxel51/fiftyone/pull/4477! 🎉
Wonderful. Thanks Brian!
Describe the problem
`fo.core.video.make_frames_dataset` seems like it should create a basic `Dataset`, as opposed to `Dataset.to_frames`, which should make a `FramesView`. However, this line... (https://github.com/voxel51/fiftyone/blob/86125ab7851a1656fa46e0719edde8dd94f3c3eb/fiftyone/core/video.py#L635) ...results in `Dataset._is_frames` being `True`.

So even though the dataset is actually a type `Dataset`, functions that take its `view()`, like `annotate()`, consider it a `FramesView`.

Is this expected behavior? I would assume that `make_frames_dataset` was specifically intended to create something that was distinct from a frames view, and would be treated as a normal dataset. (I'll note, even `dataset.clone()` returns a dataset where `_is_frames=False`.)

Code to reproduce issue
System information
- Python version (`python --version`): 3.12.2
- FiftyOne version (`fiftyone --version`): 0.23.8

Other info/logs
Willingness to contribute