Closed: Cinthia-Kleiner closed this issue 2 years ago.
Will take a look!
I believe the error is because you encode the file as mp4v but attach an extension of .mp4. Browsers can only play video/mp4, video/webm, and video/ogg videos, not video/m4v. Can you set the VideoWriter_fourcc lines to output in mp4 format instead?
You can also set the output file extension to ".m4v", which will make the browser show a downloadable link instead of an in-browser playable video.
Let me know if this solves the problem.
I figured it out :) You need to use 'h264' instead of 'mp4v'. Should work perfectly then!
Here's some sample code:
import cv2
import gradio as gr
import tempfile

def combine(img_files):
    img_array = []
    import os
    for filename in img_files:
        img = cv2.imread(filename.name)
        height, width, _ = img.shape
        size = (width, height)
        img_array.append(img)
    # output_file = tempfile.NamedTemporaryFile(suffix=".mp4")
    output_file = "test.mp4"
    out = cv2.VideoWriter(output_file, cv2.VideoWriter_fourcc(*'h264'), 15, size)
    for i in range(len(img_array)):
        out.write(img_array[i])
    out.release()
    return output_file

demo = gr.Interface(combine, inputs=gr.File(file_count="multiple"), outputs=gr.Video())

if __name__ == "__main__":
    demo.launch()
Actually, it seems like 'h264' falls back to 'avc1', which is what the browser actually supports. So you can put that instead!
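For reference, a minimal sketch of that change (the output filename, fps, and frame size are placeholders):

import cv2

size = (640, 480)  # placeholder frame size (width, height)
# 'avc1' is the H.264 tag browsers accept; this needs an OpenCV build with H.264 support
out = cv2.VideoWriter("test.mp4", cv2.VideoWriter_fourcc(*"avc1"), 15, size)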
Hey aliabid94, I will check it.
Hey aliabid94, thanks a lot for the help! But I am having some problems with the h264 codec and the OpenCV license. Do you know if there is any other codec that I can check?
Hi @Cinthia-Kleiner, just wanted to ask, did you try avc1?
Yes, I have the same problem with avc1; OpenCV does not create the file. I found some topics about it: https://github.com/opencv/opencv-python/issues/299
Also, I found a codec that worked (VP90, available in OpenCV), but it is really slow =X
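For anyone hitting the same licensing problem, a minimal sketch of the VP9 route mentioned above (assuming an OpenCV/FFmpeg build with VP9 support; the filename, fps, and frame size are placeholders):

import cv2

size = (640, 480)  # placeholder frame size (width, height)
# 'VP90' sidesteps the H.264 licensing issue; pair it with a .webm container
# so browsers will play it. Encoding is noticeably slower than H.264.
out = cv2.VideoWriter("test.webm", cv2.VideoWriter_fourcc(*"VP90"), 15, size)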
There are 3 browser-playable formats: video/mp4, video/webm, and video/ogg.
Closing as using the above codecs should work!
I checked: the same video loads in a plain HTML file but not in Gradio, so this is not a codec issue; Gradio is not working on Ubuntu.
Thanks, it works for me!
@abidlabs I'm having the same issue with MP4V in the Stable Video Diffusion sampler with Gradio:
def sample(
    input_path: str = "assets/test_image.png",  # Can either be image file or folder with image files
    resize_image: bool = False,
    num_frames: Optional[int] = None,
    num_steps: Optional[int] = None,
    fps_id: int = 6,
    motion_bucket_id: int = 127,
    cond_aug: float = 0.02,
    seed: int = 23,
    decoding_t: int = 14,  # Number of frames decoded at a time! This eats most VRAM. Reduce if necessary.
    device: str = "cuda",
    output_folder: Optional[str] = os.path.join(ROOT, "outputs"),
    skip_filter: bool = False,
):
    """
    Simple script to generate a single sample conditioned on an image `input_path` or multiple images, one for each
    image file in folder `input_path`. If you run out of VRAM, try decreasing `decoding_t`.
    """
    torch.manual_seed(seed)

    path = Path(input_path)
    all_img_paths = []
    if path.is_file():
        if any([input_path.endswith(x) for x in ["jpg", "jpeg", "png"]]):
            all_img_paths = [input_path]
        else:
            raise ValueError("Path is not valid image file.")
    elif path.is_dir():
        all_img_paths = sorted(
            [
                f
                for f in path.iterdir()
                if f.is_file() and f.suffix.lower() in [".jpg", ".jpeg", ".png"]
            ]
        )
        if len(all_img_paths) == 0:
            raise ValueError("Folder does not contain any images.")
    else:
        raise ValueError

    all_out_paths = []
    for input_img_path in all_img_paths:
        with Image.open(input_img_path) as image:
            if image.mode == "RGBA":
                image = image.convert("RGB")
            if resize_image and image.size != (1024, 576):
                print(f"Resizing {image.size} to (1024, 576)")
                image = TF.resize(TF.resize(image, 1024), (576, 1024))
            w, h = image.size
            if h % 64 != 0 or w % 64 != 0:
                width, height = map(lambda x: x - x % 64, (w, h))
                image = image.resize((width, height))
                print(
                    f"WARNING: Your image is of size {h}x{w} which is not divisible by 64. We are resizing to {height}x{width}!"
                )
            image = ToTensor()(image)
            image = image * 2.0 - 1.0

        image = image.unsqueeze(0).to(device)
        H, W = image.shape[2:]
        assert image.shape[1] == 3
        F = 8
        C = 4
        shape = (num_frames, C, H // F, W // F)
        if (H, W) != (576, 1024):
            print(
                "WARNING: The conditioning frame you provided is not 576x1024. This leads to suboptimal performance as model was only trained on 576x1024. Consider increasing `cond_aug`."
            )
        if motion_bucket_id > 255:
            print(
                "WARNING: High motion bucket! This may lead to suboptimal performance."
            )
        if fps_id < 5:
            print("WARNING: Small fps value! This may lead to suboptimal performance.")
        if fps_id > 30:
            print("WARNING: Large fps value! This may lead to suboptimal performance.")

        value_dict = {}
        value_dict["motion_bucket_id"] = motion_bucket_id
        value_dict["fps_id"] = fps_id
        value_dict["cond_aug"] = cond_aug
        value_dict["cond_frames_without_noise"] = image
        value_dict["cond_frames"] = image + cond_aug * torch.randn_like(image)
        value_dict["cond_aug"] = cond_aug

        # low vram mode
        model.conditioner.cpu()
        model.first_stage_model.cpu()
        torch.cuda.empty_cache()
        model.sampler.verbose = True

        with torch.no_grad():
            with torch.autocast(device):
                model.conditioner.to(device)
                batch, batch_uc = get_batch(
                    get_unique_embedder_keys_from_conditioner(model.conditioner),
                    value_dict,
                    [1, num_frames],
                    T=num_frames,
                    device=device,
                )
                c, uc = model.conditioner.get_unconditional_conditioning(
                    batch,
                    batch_uc=batch_uc,
                    force_uc_zero_embeddings=[
                        "cond_frames",
                        "cond_frames_without_noise",
                    ],
                )
                model.conditioner.cpu()
                torch.cuda.empty_cache()

                # from here, dtype is fp16
                for k in ["crossattn", "concat"]:
                    uc[k] = repeat(uc[k], "b ... -> b t ...", t=num_frames)
                    uc[k] = rearrange(uc[k], "b t ... -> (b t) ...", t=num_frames)
                    c[k] = repeat(c[k], "b ... -> b t ...", t=num_frames)
                    c[k] = rearrange(c[k], "b t ... -> (b t) ...", t=num_frames)
                for k in uc.keys():
                    uc[k] = uc[k].to(dtype=torch.float16)
                    c[k] = c[k].to(dtype=torch.float16)

                randn = torch.randn(shape, device=device, dtype=torch.float16)

                additional_model_inputs = {}
                additional_model_inputs["image_only_indicator"] = torch.zeros(
                    2, num_frames
                ).to(device)
                additional_model_inputs["num_video_frames"] = batch["num_video_frames"]
                for k in additional_model_inputs:
                    if isinstance(additional_model_inputs[k], torch.Tensor):
                        additional_model_inputs[k] = additional_model_inputs[k].to(
                            dtype=torch.float16
                        )

                def denoiser(input, sigma, c):
                    return model.denoiser(
                        model.model, input, sigma, c, **additional_model_inputs
                    )

                samples_z = model.sampler(denoiser, randn, cond=c, uc=uc)
                # .to() is not in-place; the result must be assigned
                samples_z = samples_z.to(dtype=model.first_stage_model.dtype)
                ##
                model.en_and_decode_n_samples_a_time = decoding_t
                model.first_stage_model.to(device)
                samples_x = model.decode_first_stage(samples_z)
                samples = torch.clamp((samples_x + 1.0) / 2.0, min=0.0, max=1.0)
                model.first_stage_model.cpu()
                torch.cuda.empty_cache()

                os.makedirs(output_folder, exist_ok=True)
                base_count = len(glob(os.path.join(output_folder, "*.mp4")))
                video_path = os.path.join(output_folder, f"{base_count:06d}.mp4")
                writer = cv2.VideoWriter(
                    video_path,
                    cv2.VideoWriter_fourcc(*"MP4V"),
                    fps_id + 1,
                    (samples.shape[-1], samples.shape[-2]),
                )
                samples = embed_watermark(samples)
                if not skip_filter:
                    samples = filter(samples)
                else:
                    print(
                        "WARNING: You have disabled the NSFW/Watermark filter. Please do not expose unfiltered results in services or applications open to the public."
                    )
                vid = (
                    (rearrange(samples, "t c h w -> t h w c") * 255)
                    .cpu()
                    .numpy()
                    .astype(np.uint8)
                )
                for frame in vid:
                    frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
                    writer.write(frame)
                writer.release()
                all_out_paths.append(video_path)
    return all_out_paths
If I try
writer = cv2.VideoWriter(
    video_path,
    cv2.VideoWriter_fourcc(*"h264"),
    fps_id + 1,
    (samples.shape[-1], samples.shape[-2]),
)
I get some errors and no playback. If I use *"MP4V", I get some warnings but the video is generated:
Sampling with EulerEDMSampler for 31 steps: 0%| | 0/31 [00:00<?, ?it/s]/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
Sampling with EulerEDMSampler for 31 steps: 97%|█████████▋| 30/31 [02:24<00:04, 4.82s/it]
OpenCV: FFMPEG: tag 0x5634504d/'MP4V' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'
/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/gradio/components/video.py:274: UserWarning: Video does not have browser-compatible container or codec. Converting to mp4
warnings.warn(
But there is still no playback.
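Not from this thread, but one workaround consistent with Gradio's warning above is to write the file with mp4v and then re-encode it to H.264 yourself. A minimal sketch, assuming ffmpeg with libx264 is on the PATH and the file names are placeholders:

import subprocess

# re-encode the OpenCV mp4v output to H.264 so browsers can play it
subprocess.run(
    ["ffmpeg", "-y", "-i", "raw_mp4v.mp4", "-vcodec", "libx264", "playable.mp4"],
    check=True,
)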
Describe the bug
Hi guys! I am using Gradio to deploy some custom object detection applications (YOLOv5). After processing the video content, I create a video in OpenCV with the bounding boxes of the object detections. But the returned video shows as in the image below and I can't play it. However, if I play the same file outside the browser, it plays normally.
Here is how I am writing the video in OpenCV:
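(The original snippet was attached as a screenshot; below is a minimal sketch of the pattern described, with placeholder names, fps, and frame size:)

import cv2

size = (640, 480)  # placeholder frame size (width, height)
# write the annotated frames (bounding boxes already drawn) to an .mp4 file;
# 'mp4v' is the codec the rest of the thread identifies as the problem
out = cv2.VideoWriter("result.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 30, size)
for frame in annotated_frames:  # hypothetical list of BGR numpy arrays
    out.write(frame)
out.release()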
Here is where I call the Gradio components:
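(Likewise originally a screenshot; a hypothetical reconstruction of the wiring:)

import gradio as gr

# hypothetical function name: a video in, the OpenCV-annotated video out
demo = gr.Interface(fn=detect_and_annotate, inputs=gr.Video(), outputs=gr.Video())
demo.launch()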
Moreover, I tried the Gradio video example from the components documentation and had the same issue. I noticed that this problem always happens with videos generated by the OpenCV framework. Is it a codec problem? Could you tell me if I missed anything?
Thanks a lot
Is there an existing issue for this?
Reproduction
The problem occurs with all OpenCV-generated videos.
Screenshot
No response
Logs
System Info
Severity
blocking all usage of gradio