pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

Allow ffmpeg-python backend for torchvision.io.write_video? #8569

Closed. adaGrad1 closed this issue 1 month ago

adaGrad1 commented 3 months ago

🚀 The feature

Add a second backend for torchvision.io.write_video that uses ffmpeg-python but otherwise keeps exactly the same interface and functionality.

Motivation, pitch

torchvision.io.write_video currently calls PyAV, which is in turn a wrapper around ffmpeg. PyAV has an apparently still-unresolved issue where setting the CRF (constant rate factor) through the options has no effect; that issue was referenced as recently as March of this year. As far as I can tell, adjusting CRF is the canonical way to tune a video's level of compression. Supporting ffmpeg-python as a backend would let users set CRF and so control the compression level directly.

Alternatives

If there is some other set of options which can be passed to write_video to alter the level of compression, that would be an acceptable alternative (at least for my use-case). In this case, it would be ideal to include this alternative set of options in the write_video documentation as an example.

Additional context

I already kind of got it working in a notebook, but it's missing support for audio and such.

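import ffmpeg  # provided by the ffmpeg-python package

# video_array: uint8 NumPy array of shape (num_frames, height, width, 3), RGB
num_frames, height, width, _ = video_array.shape

# Define output video parameters
output_filename = 'output_video.mp4'
fps = 30
codec = 'libx264'

# Start ffmpeg, reading raw RGB frames from stdin.
# framerate is set on the input so the frames are interpreted at `fps`
# rather than the rawvideo default of 25 fps.
process1 = (
    ffmpeg
    .input('pipe:', format='rawvideo', pix_fmt='rgb24', s='{}x{}'.format(width, height), framerate=fps)
    .output(output_filename, pix_fmt='yuv420p', r=fps, vcodec=codec, crf=10)
    .overwrite_output()
    .run_async(pipe_stdin=True)
)

# Write the NumPy array to the input pipe, one frame at a time
for frame in video_array:
    process1.stdin.write(frame.tobytes())

# Close the input pipe so ffmpeg sees end-of-input
process1.stdin.close()

# Wait for the ffmpeg process to finish
process1.wait()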

With crf=10 the output looks good, while crf=50 produces something visibly heavily compressed, as expected.
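To make the request concrete, here is a rough sketch of how the snippet above could be wrapped in a function whose signature loosely mirrors write_video. This is not torchvision code. The names write_video_ffmpeg and output_kwargs are hypothetical, audio is not handled, and it assumes video_array is a uint8 array or CPU tensor of shape (T, H, W, 3) and that the ffmpeg binary is on the PATH.

```python
def output_kwargs(fps, video_codec="libx264", options=None):
    """Build keyword arguments for ffmpeg-python's .output() call.

    Hypothetical helper: entries in `options` (e.g. {"crf": "10"}) are
    passed through as ffmpeg output options and override the defaults.
    """
    kwargs = {"pix_fmt": "yuv420p", "r": fps, "vcodec": video_codec}
    kwargs.update(options or {})
    return kwargs


def write_video_ffmpeg(filename, video_array, fps, video_codec="libx264", options=None):
    """Hypothetical ffmpeg-python counterpart of write_video (video only)."""
    # Imported lazily so output_kwargs stays usable without these packages.
    import ffmpeg  # from the ffmpeg-python package
    import numpy as np

    frames = np.ascontiguousarray(np.asarray(video_array, dtype=np.uint8))
    if frames.ndim != 4 or frames.shape[3] != 3:
        raise ValueError("expected video_array of shape (T, H, W, 3)")
    _, height, width, _ = frames.shape

    process = (
        ffmpeg
        .input("pipe:", format="rawvideo", pix_fmt="rgb24",
               s="{}x{}".format(width, height), framerate=fps)
        .output(filename, **output_kwargs(fps, video_codec, options))
        .overwrite_output()
        .run_async(pipe_stdin=True)
    )
    # Stream raw RGB frames to ffmpeg, then signal end-of-input.
    for frame in frames:
        process.stdin.write(frame.tobytes())
    process.stdin.close()
    if process.wait() != 0:
        raise RuntimeError("ffmpeg exited with a non-zero status")
```

A call would then look like write_video_ffmpeg('output_video.mp4', video_array, fps=30, options={'crf': '10'}), which is the same shape as the existing write_video call with options.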

NicolasHug commented 1 month ago

Hi @adaGrad1, and thank you for the feature request. We'll be making a wider announcement soon, but we plan to migrate video decoding/encoding efforts away from torchvision/torchaudio and consolidate all of that within https://github.com/pytorch/torchcodec/. At this time video encoding isn't implemented in torchcodec, but it can be in scope. It does mean, however, that we won't be able to add further video encoding capabilities to torchvision, so I'm afraid we won't be adding the ffmpeg-python backend in vision. We'll definitely keep that CRF issue in mind while working on the torchcodec encoder, though.