gradio-app / gradio

Video Output fails #1508

Closed: Cinthia-Kleiner closed this issue 2 years ago

Cinthia-Kleiner commented 2 years ago

Describe the bug

Hi guys! I am using Gradio to deploy some custom object detection applications (YOLOv5). After processing the video content, I create a video in OpenCV with the bounding boxes of the detections. But the returned video shows up as in the image below and does not play.

[image]

Here is how I am writing the video in OpenCV: [image]

Here is where I call the Gradio components: [image]

Moreover, I tried the Gradio video example from the components documentation and had the same issue. I noticed that this problem always happens with videos generated by the OpenCV framework. Is it a codec problem? Could you tell me if I missed anything?

Thanks a lot

Is there an existing issue for this?

Reproduction

Problem with all OpenCV-generated videos

Screenshot

No response

Logs

No logs

System Info

Gradio Version == 3.0.9
OS: Ubuntu 20.04.1 (Linux)

Severity

blocking all usage of gradio

aliabid94 commented 2 years ago

Will take a look!

aliabid94 commented 2 years ago

I believe the error is because you encode the file as mp4v but attach an extension of .mp4. Browsers can only play video/mp4, video/webm, and video/ogg videos, not video/m4v. Can you set the VideoWriter_fourcc lines to output in mp4 format instead?

You can also set the output file extension to ".m4v", which will make the browser show a downloadable link instead of an in-browser playable video.

Let me know if this solves the problem.

aliabid94 commented 2 years ago

I figured it out :) you need to use 'h264' instead of 'mp4v'. Should work perfectly then!

Here's a code sample:

import cv2
import gradio as gr
import tempfile

def combine(img_files):
    # Read every uploaded image and remember the frame size.
    img_array = []
    for filename in img_files:
        img = cv2.imread(filename.name)
        height, width, _ = img.shape
        size = (width, height)
        img_array.append(img)
    # A named temp file would also work:
    # output_file = tempfile.NamedTemporaryFile(suffix=".mp4").name
    output_file = "test.mp4"
    # 'h264' produces a browser-playable video; 'mp4v' does not.
    out = cv2.VideoWriter(output_file, cv2.VideoWriter_fourcc(*"h264"), 15, size)
    for frame in img_array:
        out.write(frame)
    out.release()
    return output_file

demo = gr.Interface(combine, inputs=gr.File(file_count="multiple"), outputs=gr.Video())

if __name__ == "__main__":
    demo.launch()

aliabid94 commented 2 years ago

Actually, it seems like h264 falls back to avc1, which is what the browser actually supports. So you can put that instead!
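Concretely, that is a one-line change to the sample above (hedged: 'avc1' will only encode if your OpenCV build ships an H.264 encoder):

out = cv2.VideoWriter(output_file, cv2.VideoWriter_fourcc(*"avc1"), 15, size)  # H.264/AVC in an .mp4 container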

Cinthia-Kleiner commented 2 years ago

Hey @aliabid94, I will check it.

Cinthia-Kleiner commented 2 years ago

Hey @aliabid94, thanks a lot for the help, but I am having some problems with h264 and the OpenCV license. Do you know if there is any other codec that I can check?

abidlabs commented 2 years ago

Hi @Cinthia-Kleiner just wanted to ask, did you try avc1?

Cinthia-Kleiner commented 2 years ago

Yes, I have the same problem with avc1: OpenCV does not create the file. I found some topics about it: https://github.com/opencv/opencv-python/issues/299

Also, I found a codec that worked (VP90, available in OpenCV), but it is really slow =X
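For reference, a minimal sketch of that VP9 route, assuming OpenCV's bundled FFMPEG was built with VP9 support; the .webm extension is the part that matters:

import cv2

# VP9 avoids the H.264 encoder licensing problem, at the cost of slow encoding.
fourcc = cv2.VideoWriter_fourcc(*"VP90")
out = cv2.VideoWriter("test.webm", fourcc, 15, (width, height))  # width/height as in the sample above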

aliabid94 commented 2 years ago

There are 3 browser-playable formats: video/mp4, video/webm, and video/ogg.
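As a rough cheat sheet, one way to pair extensions with fourcc codes; these pairings are assumptions based on OpenCV's FFMPEG backend and were not verified in this thread:

# container extension -> fourcc code that browsers can generally play
BROWSER_PLAYABLE = {
    ".mp4": "avc1",   # H.264; needs an OpenCV build with an H.264 encoder
    ".webm": "VP90",  # VP9; slow to encode but license-unencumbered
    ".ogv": "THEO",   # Theora in Ogg (support varies by build)
}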

abidlabs commented 2 years ago

Closing as using the above codecs should work!

kumar045 commented 1 year ago

I checked: the same video loads in an HTML file but not in Gradio, so this is not a codec issue; Gradio is not working on Ubuntu.

coolcb commented 1 year ago

> I figured it out :) you need to use 'h264' instead of 'mp4v'. Should work perfectly then!

Thanks, it works for me!

loretoparisi commented 10 months ago

@abidlabs I'm having the same issue with MP4V in the Stable Video Diffusion sampler with Gradio

def sample(
    input_path: str = "assets/test_image.png",  # Can either be image file or folder with image files
    resize_image: bool = False,
    num_frames: Optional[int] = None,
    num_steps: Optional[int] = None,
    fps_id: int = 6,
    motion_bucket_id: int = 127,
    cond_aug: float = 0.02,
    seed: int = 23,
    decoding_t: int = 14,  # Number of frames decoded at a time! This eats most VRAM. Reduce if necessary.
    device: str = "cuda",
    output_folder: Optional[str] = os.path.join(ROOT,"outputs"),
    skip_filter: bool = False,
):
    """
    Simple script to generate a single sample conditioned on an image `input_path` or multiple images, one for each
    image file in folder `input_path`. If you run out of VRAM, try decreasing `decoding_t`.
    """
    torch.manual_seed(seed)

    path = Path(input_path)
    all_img_paths = []
    if path.is_file():
        if any([input_path.endswith(x) for x in ["jpg", "jpeg", "png"]]):
            all_img_paths = [input_path]
        else:
            raise ValueError("Path is not valid image file.")
    elif path.is_dir():
        all_img_paths = sorted(
            [
                f
                for f in path.iterdir()
                if f.is_file() and f.suffix.lower() in [".jpg", ".jpeg", ".png"]
            ]
        )
        if len(all_img_paths) == 0:
            raise ValueError("Folder does not contain any images.")
    else:
        raise ValueError
    all_out_paths = []
    for input_img_path in all_img_paths:
        with Image.open(input_img_path) as image:
            if image.mode == "RGBA":
                image = image.convert("RGB")
            if resize_image and image.size != (1024, 576):
                print(f"Resizing {image.size} to (1024, 576)")
                image = TF.resize(TF.resize(image, 1024), (576, 1024))
            w, h = image.size

            if h % 64 != 0 or w % 64 != 0:
                width, height = map(lambda x: x - x % 64, (w, h))
                image = image.resize((width, height))
                print(
                    f"WARNING: Your image is of size {h}x{w} which is not divisible by 64. We are resizing to {height}x{width}!"
                )

            image = ToTensor()(image)
            image = image * 2.0 - 1.0

        image = image.unsqueeze(0).to(device)
        H, W = image.shape[2:]
        assert image.shape[1] == 3
        F = 8
        C = 4
        shape = (num_frames, C, H // F, W // F)
        if (H, W) != (576, 1024):
            print(
                "WARNING: The conditioning frame you provided is not 576x1024. This leads to suboptimal performance as model was only trained on 576x1024. Consider increasing `cond_aug`."
            )
        if motion_bucket_id > 255:
            print(
                "WARNING: High motion bucket! This may lead to suboptimal performance."
            )

        if fps_id < 5:
            print("WARNING: Small fps value! This may lead to suboptimal performance.")

        if fps_id > 30:
            print("WARNING: Large fps value! This may lead to suboptimal performance.")

        value_dict = {}
        value_dict["motion_bucket_id"] = motion_bucket_id
        value_dict["fps_id"] = fps_id
        value_dict["cond_aug"] = cond_aug
        value_dict["cond_frames_without_noise"] = image
        value_dict["cond_frames"] = image + cond_aug * torch.randn_like(image)
        value_dict["cond_aug"] = cond_aug
        # low vram mode
        model.conditioner.cpu()
        model.first_stage_model.cpu()
        torch.cuda.empty_cache()
        model.sampler.verbose = True

        with torch.no_grad():
            with torch.autocast(device):
                model.conditioner.to(device)
                batch, batch_uc = get_batch(
                    get_unique_embedder_keys_from_conditioner(model.conditioner),
                    value_dict,
                    [1, num_frames],
                    T=num_frames,
                    device=device,
                )
                c, uc = model.conditioner.get_unconditional_conditioning(
                    batch,
                    batch_uc=batch_uc,
                    force_uc_zero_embeddings=[
                        "cond_frames",
                        "cond_frames_without_noise",
                    ],
                )
                model.conditioner.cpu()
                torch.cuda.empty_cache()

                # from here, dtype is fp16
                for k in ["crossattn", "concat"]:
                    uc[k] = repeat(uc[k], "b ... -> b t ...", t=num_frames)
                    uc[k] = rearrange(uc[k], "b t ... -> (b t) ...", t=num_frames)
                    c[k] = repeat(c[k], "b ... -> b t ...", t=num_frames)
                    c[k] = rearrange(c[k], "b t ... -> (b t) ...", t=num_frames)
                for k in uc.keys():
                    uc[k] = uc[k].to(dtype=torch.float16)
                    c[k] = c[k].to(dtype=torch.float16)

                randn = torch.randn(shape, device=device, dtype=torch.float16)

                additional_model_inputs = {}
                additional_model_inputs["image_only_indicator"] = torch.zeros(
                    2, num_frames
                ).to(device, )
                additional_model_inputs["num_video_frames"] = batch["num_video_frames"]

                for k in additional_model_inputs:
                    if isinstance(additional_model_inputs[k], torch.Tensor):
                        additional_model_inputs[k] = additional_model_inputs[k].to(dtype=torch.float16)

                def denoiser(input, sigma, c):
                    return model.denoiser(
                        model.model, input, sigma, c, **additional_model_inputs
                    )

                samples_z = model.sampler(denoiser, randn, cond=c, uc=uc)
                samples_z.to(dtype=model.first_stage_model.dtype)
                ##

                model.en_and_decode_n_samples_a_time = decoding_t
                model.first_stage_model.to(device)
                samples_x = model.decode_first_stage(samples_z)
                samples = torch.clamp((samples_x + 1.0) / 2.0, min=0.0, max=1.0)
                model.first_stage_model.cpu()
                torch.cuda.empty_cache()

                os.makedirs(output_folder, exist_ok=True)
                base_count = len(glob(os.path.join(output_folder, "*.mp4")))
                video_path = os.path.join(output_folder, f"{base_count:06d}.mp4")
                writer = cv2.VideoWriter(
                    video_path,
                    cv2.VideoWriter_fourcc(*"MP4V"),
                    fps_id + 1,
                    (samples.shape[-1], samples.shape[-2]),
                )

                samples = embed_watermark(samples)
                if not skip_filter:
                    samples = filter(samples)
                else:
                    print("WARNING: You have disabled the NSFW/Watermark filter. Please do not expose unfiltered results in services or applications open to the public.")
                vid = (
                    (rearrange(samples, "t c h w -> t h w c") * 255)
                    .cpu()
                    .numpy()
                    .astype(np.uint8)
                )
                for frame in vid:
                    frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
                    writer.write(frame)
                writer.release()
                all_out_paths.append(video_path)
    return all_out_paths

If I try

writer = cv2.VideoWriter(
    video_path,
    cv2.VideoWriter_fourcc(*"h264"),
    fps_id + 1,
    (samples.shape[-1], samples.shape[-2]),
)

I get some errors and no playback. If I use *"MP4V" I get some warnings, but the video is generated:

Sampling with EulerEDMSampler for 31 steps:   0%|          | 0/31 [00:00<?, ?it/s]/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
Sampling with EulerEDMSampler for 31 steps:  97%|█████████▋| 30/31 [02:24<00:04,  4.82s/it]
OpenCV: FFMPEG: tag 0x5634504d/'MP4V' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'
/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/gradio/components/video.py:274: UserWarning: Video does not have browser-compatible container or codec. Converting to mp4
  warnings.warn(

But still no playback, though.
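For anyone hitting the same wall: since the opencv-python wheels cannot ship an H.264 encoder (the licensing issue linked above), a common workaround is to let OpenCV write with 'mp4v' and then re-encode to H.264 before returning the path to gr.Video(). A minimal sketch, assuming the ffmpeg binary is on PATH and with illustrative file names:

import subprocess

# Re-encode the OpenCV 'mp4v' output into browser-playable H.264.
subprocess.run(
    [
        "ffmpeg", "-y",
        "-i", "raw_mp4v.mp4",       # file written by cv2.VideoWriter with 'mp4v'
        "-vcodec", "libx264",       # H.264 encoder
        "-pix_fmt", "yuv420p",      # widest browser compatibility
        "browser_playable.mp4",
    ],
    check=True,
)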