TorchCodec is a Python library for decoding videos into PyTorch tensors. It aims to be fast, easy to use, and well integrated into the PyTorch ecosystem. If you want to use PyTorch to train ML models on videos, TorchCodec is how you turn those videos into data.
We achieve these capabilities through:
[!NOTE] ⚠️ TorchCodec is still in an early stage of development, and some APIs may change in future versions without a deprecation cycle, depending on user feedback. If you have any suggestions or issues, please let us know by opening an issue!
Here's a condensed summary of what you can do with TorchCodec. For a more detailed example, check out our documentation!
from torchcodec.decoders import VideoDecoder
decoder = VideoDecoder("path/to/video.mp4")
decoder.metadata
# VideoStreamMetadata:
# num_frames: 250
# duration_seconds: 10.0
# bit_rate: 31315.0
# codec: h264
# average_fps: 25.0
# ... (truncated output)
len(decoder) # == decoder.metadata.num_frames!
# 250
decoder.metadata.average_fps # Note: instantaneous fps can be higher or lower
# 25.0
# Simple Indexing API
decoder[0] # uint8 tensor of shape [C, H, W]
decoder[0 : -1 : 20] # uint8 stacked tensor of shape [N, C, H, W]
# Iterate over frames:
for frame in decoder:
    pass
# Indexing, with PTS and duration info
decoder.get_frame_at(len(decoder) - 1)
# Frame:
# data (shape): torch.Size([3, 400, 640])
# pts_seconds: 9.960000038146973
# duration_seconds: 0.03999999910593033
decoder.get_frames_at(start=10, stop=30, step=5)
# FrameBatch:
# data (shape): torch.Size([4, 3, 400, 640])
# pts_seconds: tensor([0.4000, 0.6000, 0.8000, 1.0000])
# duration_seconds: tensor([0.0400, 0.0400, 0.0400, 0.0400])
# Time-based indexing with PTS and duration info
decoder.get_frame_displayed_at(pts_seconds=2)
# Frame:
# data (shape): torch.Size([3, 400, 640])
# pts_seconds: 2.0
# duration_seconds: 0.03999999910593033
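The indices and timestamps in the output above follow from simple arithmetic when the frame rate is constant. The sketch below is plain Python and does not call TorchCodec. The helper names `indices_for_slice` and `expected_pts_seconds` are hypothetical, and the constant-frame-rate assumption will not hold for every video, since the instantaneous fps can differ from the average.

```python
def indices_for_slice(s: slice, num_frames: int) -> list[int]:
    """Frame indices a Python slice selects from a sequence of num_frames items."""
    return list(range(*s.indices(num_frames)))


def expected_pts_seconds(index: int, average_fps: float) -> float:
    """Approximate presentation timestamp of a frame, assuming a constant frame rate."""
    return index / average_fps


# Values taken from the metadata shown above.
num_frames, average_fps = 250, 25.0

# Indices that a slice like 0 : -1 : 20 selects under standard Python slicing
# (assumption: the decoder follows the same semantics, so the last frame is excluded).
print(indices_for_slice(slice(0, -1, 20), num_frames))

# Approximate timestamps for start=10, stop=30, step=5.
print([expected_pts_seconds(i, average_fps) for i in range(10, 30, 5)])
```

For videos with a variable frame rate, rely on the `pts_seconds` values that the decoder returns rather than on this approximation.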
You can use the following snippet to generate a video with FFmpeg and try out TorchCodec:
fontfile=/usr/share/fonts/dejavu-sans-mono-fonts/DejaVuSansMono-Bold.ttf
output_video_file=/tmp/output_video.mp4
ffmpeg -f lavfi -i \
color=size=640x400:duration=10:rate=25:color=blue \
-vf "drawtext=fontfile=${fontfile}:fontsize=30:fontcolor=white:x=(w-text_w)/2:y=(h-text_h)/2:text='Frame %{frame_num}'" \
${output_video_file}
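If you would rather drive the same FFmpeg invocation from Python, the following is a minimal sketch. It mirrors the shell snippet above argument by argument. `build_test_video_command` and `generate_test_video` are hypothetical helper names, not part of TorchCodec or FFmpeg, and the font path is the same assumption as in the shell version.

```python
import shlex
import subprocess


def build_test_video_command(fontfile: str, output_video_file: str) -> list[str]:
    """Build the argument list for the FFmpeg command shown above."""
    # Only the first literal is an f-string, so %{frame_num} stays literal for FFmpeg.
    drawtext = (
        f"drawtext=fontfile={fontfile}:fontsize=30:fontcolor=white:"
        "x=(w-text_w)/2:y=(h-text_h)/2:text='Frame %{frame_num}'"
    )
    return [
        "ffmpeg",
        "-f", "lavfi",
        "-i", "color=size=640x400:duration=10:rate=25:color=blue",
        "-vf", drawtext,
        output_video_file,
    ]


def generate_test_video(fontfile: str, output_video_file: str) -> None:
    """Run FFmpeg; raises CalledProcessError if FFmpeg exits with an error."""
    subprocess.run(build_test_video_command(fontfile, output_video_file), check=True)


command = build_test_video_command(
    "/usr/share/fonts/dejavu-sans-mono-fonts/DejaVuSansMono-Bold.ttf",
    "/tmp/output_video.mp4",
)
# Print a shell-quoted form of the command for inspection without running it.
print(shlex.join(command))
```

Passing the arguments as a list avoids shell quoting issues with the parentheses and quotes in the `drawtext` filter. Call `generate_test_video(...)` to actually write the file.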
Note: if you're on MacOS, you'll need to build from source. The instructions below assume you're on Linux.
Install the latest stable version of PyTorch following the official instructions. TorchCodec requires PyTorch 2.4.
Install FFmpeg, if it's not already installed. Your Linux distribution probably comes with FFmpeg pre-installed. TorchCodec supports all major FFmpeg versions from 4 to 7, inclusive.
If FFmpeg is not already installed, or you need a later version, install it with:
conda install ffmpeg
# or
conda install ffmpeg -c conda-forge
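To see whether the `ffmpeg` binary on your PATH falls in the supported range of major versions, you could use a sketch like the one below. It relies on two assumptions: that the first line of `ffmpeg -version` starts with `ffmpeg version <major>.`, which is the usual release banner but not guaranteed for every build, and that the command-line binary is a reasonable proxy for the FFmpeg libraries TorchCodec would load. All helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional

# Major versions 4 through 7 inclusive, per the range stated above.
SUPPORTED_MAJOR_VERSIONS = range(4, 8)


def parse_ffmpeg_major(version_output: str) -> Optional[int]:
    """Extract the major version from `ffmpeg -version` output, or None if unrecognized."""
    match = re.match(r"ffmpeg version n?(\d+)\.", version_output)
    return int(match.group(1)) if match else None


def installed_ffmpeg_major() -> Optional[int]:
    """Major version of the ffmpeg binary on PATH, or None if missing or unrecognized."""
    ffmpeg = shutil.which("ffmpeg")
    if ffmpeg is None:
        return None
    result = subprocess.run(
        [ffmpeg, "-version"], capture_output=True, text=True, check=False
    )
    return parse_ffmpeg_major(result.stdout)


def is_supported(major: Optional[int]) -> bool:
    """True if the major version lies in the supported range."""
    return major is not None and major in SUPPORTED_MAJOR_VERSIONS


if __name__ == "__main__":
    major = installed_ffmpeg_major()
    print(f"FFmpeg major version: {major}, supported: {is_supported(major)}")
```

Development builds whose banner does not start with a numeric version are reported as unrecognized rather than guessed at.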
Install TorchCodec:
pip install torchcodec
We are actively working on the following features:
pip install torchcodec: for now this is only supported on Linux, but MacOS users can build from source.
Let us know if you have any feature requests by opening an issue!
We welcome contributions to TorchCodec! Please see our contributing guide for more details.
TorchCodec is released under the BSD 3 license.