pytorch / torchcodec

PyTorch video decoding
BSD 3-Clause "New" or "Revised" License

Add option to return indices from index based samplers. #407

Open Algomancer opened 3 days ago

Algomancer commented 3 days ago

🚀 The feature

When sampling with an index-based (or potentially time-based) sampler, add an option to additionally return the sampled indices. For example, in random sampling, it would return the random indices that were generated. This would just be a matter of returning the clip_start_indices within the FrameBatch.

https://github.com/pytorch/torchcodec/blob/main/src/torchcodec/samplers/_index_based.py#L153C9-L153C27
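To make the request concrete, here is a minimal pure-Python sketch of what "the random indices that were generated" means. It does not use torchcodec's actual code: the helper name sample_clip_start_indices and its parameters are hypothetical, and the real sampler may draw its start indices differently.

```python
import random


def sample_clip_start_indices(num_frames, num_clips, clip_span, seed=None):
    """Hypothetical helper, not a torchcodec API.

    Draws `num_clips` random start indices such that a clip covering
    `clip_span` consecutive frame indices still fits inside the video.
    """
    if clip_span > num_frames:
        raise ValueError("clip_span must not exceed num_frames")
    rng = random.Random(seed)
    last_valid_start = num_frames - clip_span
    # randint is inclusive on both ends, so last_valid_start can be drawn.
    return [rng.randint(0, last_valid_start) for _ in range(num_clips)]


# These are the values the feature request asks the sampler to hand back.
starts = sample_clip_start_indices(num_frames=300, num_clips=4, clip_span=16, seed=0)
```

Returning these values alongside the decoded frames is what would let downstream code compute position-dependent quantities per clip.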

Motivation, pitch

For some of my tasks, I need to know the relative position of a particular frame within the larger context of the video, for example for positional embedding and RoPE offsetting.

I am about to implement this in my local fork and would happily upstream it if it is desired.

NicolasHug commented 3 days ago

Thank you for the feature request, @Algomancer. That is definitely in scope and it's something we want to provide (this is a duplicate of https://github.com/pytorch/torchcodec/issues/246).

> This would just be a matter of returning the clip_start_indices within the FrameBatch.

For consistency with the pts_seconds and duration_seconds fields, I think we would return the indices of all the frames, not just the clip starts. I think that would still address your use case, as you'd be able to get the clip start index simply with something like clips.indices[:, 0].
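
As a rough illustration of the layout described above, the sketch below builds a per-frame index table of shape (num_clips, num_frames_per_clip) from a list of clip starts and then recovers the starts from its first column. It uses plain Python lists so it runs without torch. The indices field does not exist yet, and the helper frame_indices_for_clips is hypothetical; num_frames_per_clip and num_indices_between_frames are used here only as illustrative parameter names.

```python
def frame_indices_for_clips(clip_start_indices, num_frames_per_clip,
                            num_indices_between_frames=1):
    """Hypothetical helper, not a torchcodec API.

    Expands each clip start into the indices of every frame in that clip,
    giving one row per clip and one column per frame.
    """
    return [
        [start + i * num_indices_between_frames for i in range(num_frames_per_clip)]
        for start in clip_start_indices
    ]


indices = frame_indices_for_clips(
    [10, 120, 57], num_frames_per_clip=4, num_indices_between_frames=2
)
# Rows are clips: [10, 12, 14, 16], [120, 122, 124, 126], [57, 59, 61, 63]

# First column of each row; on a 2D tensor this would be indices[:, 0].
clip_starts = [row[0] for row in indices]
```

Because the first column of the per-frame table is exactly the list of clip starts, returning all frame indices is a superset of the original request.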