modelscope / DiffSynth-Studio

Enjoy the magic of Diffusion models!
Apache License 2.0

How to handle 16:9 videos and long videos #22

Closed Attect closed 7 months ago

Attect commented 7 months ago

If the aspect ratio is set to 16:9, an exception occurs:


100%|█████████████████████████████████████████████████████████████████████████████████| 300/300 [00:01<00:00, 288.29it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 300/300 [00:16<00:00, 17.91it/s]
  0%|                                                                                             | 0/10 [00:02<?, ?it/s]
2024-03-19 16:17:44.901 Uncaught app exception
Traceback (most recent call last):
  File "/home/attect/anaconda3/envs/DiffSynthStudio/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 542, in _run_script
    exec(code, module.__dict__)
  File "/mnt/e/DiffSynth-Studio/examples/diffutoon_toon_shading.py", line 94, in <module>
    runner.run(config)
  File "/mnt/e/DiffSynth-Studio/diffsynth/pipelines/stable_diffusion_video.py", line 349, in run
    output_video = self.synthesize_video(model_manager, pipe, config["pipeline"]["seed"], smoother, **config["pipeline"]["pipeline_inputs"])
  File "/mnt/e/DiffSynth-Studio/diffsynth/pipelines/stable_diffusion_video.py", line 299, in synthesize_video
    output_video = pipe(**pipeline_inputs, smoother=smoother)
  File "/home/attect/anaconda3/envs/DiffSynthStudio/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/e/DiffSynth-Studio/diffsynth/pipelines/stable_diffusion_video.py", line 221, in __call__
    noise_pred_posi = lets_dance_with_long_video(
  File "/mnt/e/DiffSynth-Studio/diffsynth/pipelines/stable_diffusion_video.py", line 38, in lets_dance_with_long_video
    hidden_states_batch = lets_dance(
  File "/mnt/e/DiffSynth-Studio/diffsynth/pipelines/dancer.py", line 72, in lets_dance
    hidden_states, time_emb, text_emb, res_stack = block(hidden_states, time_emb, text_emb, res_stack)
  File "/home/attect/anaconda3/envs/DiffSynthStudio/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/attect/anaconda3/envs/DiffSynthStudio/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/e/DiffSynth-Studio/diffsynth/models/sd_unet.py", line 222, in forward
    hidden_states = torch.cat([hidden_states, res_hidden_states], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 24 but got size 23 for tensor number 1 in the list.

If the frame count exceeds 1000, testing suggests that all frames are decoded into images and kept in memory; even 128 GB of RAM cannot handle it. Could this be changed to on-demand decoding, or to processing all frames into images stored on disk and loading them when needed?


100%|██████████████████████████████████████████████████████████████████████████████████| 900/900 [00:09<00:00, 92.59it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 900/900 [01:50<00:00,  8.18it/s]
Killed
Artiprocher commented 7 months ago

If there are too many frames, we recommend splitting the video into segments externally. Every iteration inside the algorithm needs information from all frames; if that were offloaded to disk, the read/write speed would be too slow and the computation time would increase severalfold.
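For reference, external segmentation could look roughly like the following sketch: run the pipeline once per chunk of frames by shifting start_frame_id / end_frame_id in the config. The config structure is the one posted later in this thread; run_in_segments and segment_length are hypothetical names, not project API.

from copy import deepcopy
from diffsynth import SDVideoPipelineRunner

# Minimal sketch of external segmentation: run the pipeline once per chunk of
# frames by shifting start_frame_id / end_frame_id. `base_config` is assumed to
# follow the config structure posted later in this thread.
def run_in_segments(base_config, total_frames, segment_length=300):
    runner = SDVideoPipelineRunner()
    for start in range(0, total_frames, segment_length):
        end = min(start + segment_length - 1, total_frames - 1)
        config = deepcopy(base_config)
        config["data"]["input_frames"]["start_frame_id"] = start
        config["data"]["input_frames"]["end_frame_id"] = end
        for unit in config["data"]["controlnet_frames"]:
            unit["start_frame_id"] = start
            unit["end_frame_id"] = end
        config["data"]["output_folder"] = "output/segment_%d_%d" % (start, end)
        config["pipeline"]["pipeline_inputs"]["num_frames"] = end - start + 1
        runner.run(config)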

Artiprocher commented 7 months ago

Regarding resolution, only multiples of 64 are supported.
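For context (my own reasoning, not an official statement from the project): the VAE downsamples by 8 and the UNet downsamples three more times by 2, so width and height need to be divisible by 8 × 2³ = 64; otherwise a latent dimension becomes odd partway down and the skip-connection torch.cat fails, as in the "Expected size 24 but got size 23" traceback above. A tiny helper (hypothetical, not part of DiffSynth-Studio) to snap a requested size to multiples of 64:

# Hypothetical helper, not part of DiffSynth-Studio: snap a requested resolution
# to the nearest multiples of 64 before putting it into the config.
def snap_to_64(width, height):
    return round(width / 64) * 64, round(height / 64) * 64

print(snap_to_64(1920, 1080))  # (1920, 1088) -- nearest 64-aligned size to 1920x1080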

Attect commented 7 months ago

If there are too many frames, we recommend splitting the video into segments externally. Every iteration inside the algorithm needs information from all frames; if that were offloaded to disk, the read/write speed would be too slow and the computation time would increase severalfold.

I gave this a try: I offloaded the cropped frames and the results of controlnet.process_image entirely to disk, and changed the consuming code to read them back from disk. A few seconds of test footage looked fine. I'm currently running a 3666-frame 1280*768 video. Memory usage went from over 100 GB followed by Killed (my machine has 128 GB of RAM) to staying under 5 GB after an hour of running (I haven't switched to streaming the results out, so the processed frames are presumably still accumulating in memory; otherwise it should stay under 2 GB). From what I can see, disk speed has a negligible impact because the GPU is the bottleneck (a 3090); the fractions of a second spent on disk reads are a tiny share of the total (the run already takes 20-odd hours, so an extra hour or two hardly matters). Being able to run at all is what matters most, because some videos are single continuous shots, and splitting them makes the segments inconsistent with each other.

I'm not familiar with Python and modified the logic with the help of AI. Even if it works, I probably won't submit it, but I'll post the changed code below so that people with the same need can discuss it. I need to wait for my current run to finish to see how it turns out.
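The disk-cache idea described above could look roughly like the sketch below. This is only an illustration with hypothetical names (DiskFrameCache, processed_frames); Attect's actual implementation is in the fork linked in the next comment. Processed frames are written to disk once with torch.save and loaded back one at a time instead of being kept in RAM.

import os
import torch

# Illustrative only: a minimal disk-backed cache for processed frames.
class DiskFrameCache:
    def __init__(self, cache_dir="frame_cache"):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def path(self, name, index):
        return os.path.join(self.cache_dir, f"{name}_{index}.pt")

    def put(self, name, index, tensor):
        # Store on CPU so the in-memory copy can be freed immediately.
        torch.save(tensor.detach().cpu(), self.path(name, index))

    def get(self, name, index, device="cuda"):
        return torch.load(self.path(name, index)).to(device)

# Usage sketch: cache ControlNet-processed frames, then read them back on demand.
# cache = DiskFrameCache()
# for i, frame in enumerate(processed_frames):
#     cache.put("controlnet", i, frame)
# frame_0 = cache.get("controlnet", 0)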

Attect commented 7 months ago

I've finished the changes; the code is in my fork: DiffSynth-Studio-Disk-Cache, along with the related test information. The logic I added is not written very elegantly, so I won't open a pull request; anyone interested is welcome to use it or to evaluate the speed difference.

zhanghongyong123456 commented 7 months ago

I've finished the changes; the code is in my fork: DiffSynth-Studio-Disk-Cache, along with the related test information. The logic I added is not written very elegantly, so I won't open a pull request; anyone interested is welcome to use it or to evaluate the speed difference.

  1. I want to run a long video through the toon-shading (restyling) pipeline. How should I set it up? I set the video path and resolution, but when I run sd_toon_shading.py it reports an error and I'm not sure where to make the changes. I'd appreciate some guidance. (image)
  2. I tested 1920x1080 and it only processes about 100 frames per hour. How can I speed up the processing?
Attect commented 7 months ago

I've finished the changes; the code is in my fork: DiffSynth-Studio-Disk-Cache, along with the related test information. The logic I added is not written very elegantly, so I won't open a pull request; anyone interested is welcome to use it or to evaluate the speed difference.

  1. I want to run a long video through the toon-shading (restyling) pipeline. How should I set it up? I set the video path and resolution, but when I run sd_toon_shading.py it reports an error and I'm not sure where to make the changes. I'd appreciate some guidance. (image)
  2. I tested 1920x1080 and it only processes about 100 frames per hour. How can I speed up the processing?

I've fixed this error; please pull the latest code and try again.

Attect commented 7 months ago

I've finished the changes; the code is in my fork: DiffSynth-Studio-Disk-Cache, along with the related test information. The logic I added is not written very elegantly, so I won't open a pull request; anyone interested is welcome to use it or to evaluate the speed difference.

  1. I want to run a long video through the toon-shading (restyling) pipeline. How should I set it up? I set the video path and resolution, but when I run sd_toon_shading.py it reports an error and I'm not sure where to make the changes. I'd appreciate some guidance. (image)
  2. I tested 1920x1080 and it only processes about 100 frames per hour. How can I speed up the processing?

The speed is indeed not great; a 3090 is about the same. I'm currently processing a 1600x896 video with 5441 frames, which will take roughly 35 hours for the SD step alone; I don't yet know how many hours the smoother will need. The speed impact of the disk cache can be reduced by switching to an SSD or using disk-acceleration software. As for the image computation itself, it really does seem to need this much work, because each frame is computed jointly with many other frames.

Attect commented 7 months ago

I've finished the changes; the code is in my fork: DiffSynth-Studio-Disk-Cache, along with the related test information. The logic I added is not written very elegantly, so I won't open a pull request; anyone interested is welcome to use it or to evaluate the speed difference.

  1. I want to run a long video through the toon-shading (restyling) pipeline. How should I set it up? I set the video path and resolution, but when I run sd_toon_shading.py it reports an error and I'm not sure where to make the changes. I'd appreciate some guidance. (image)
  2. I tested 1920x1080 and it only processes about 100 frames per hour. How can I speed up the processing?

It feels like the steps could be parallelized across multiple GPUs with minor modifications, or even have the CPU assist with part of the computation: for example, the smoother's left and right references are two independent computations, and the SD step also processes frames in batches. I'll leave that to the original author~

zhanghongyong123456 commented 7 months ago

I've finished the changes; the code is in my fork: DiffSynth-Studio-Disk-Cache, along with the related test information. The logic I added is not written very elegantly, so I won't open a pull request; anyone interested is welcome to use it or to evaluate the speed difference.

  1. I want to run a long video through the toon-shading (restyling) pipeline. How should I set it up? I set the video path and resolution, but when I run sd_toon_shading.py it reports an error and I'm not sure where to make the changes. I'd appreciate some guidance. (image)
  2. I tested 1920x1080 and it only processes about 100 frames per hour. How can I speed up the processing?
  1. I added a cache folder in the code, video = VideoData(image_cache_folder=image_cache_folder), which solved the problem above, but then this problem appeared: (image)
  2. Could you explain where each of these parameters needs to be specified?

(image)

Attect commented 7 months ago

The error came from my oversight in handling the case of a single group of controlnet_frames. I've already committed a fix; update and give it a try, but I haven't verified it yet (my machine is busy with a running job). If problems remain, keep reporting them.

As for where clear_output_folder goes, I wrote it following examples/diffutoon_toon_shading.py as the reference.

Attect commented 7 months ago

Here is the config I'm running:

from diffsynth import SDVideoPipelineRunner

# Download models
# `models/stable_diffusion/aingdiffusion_v12.safetensors`: [link](https://civitai.com/api/download/models/229575)
# `models/AnimateDiff/mm_sd_v15_v2.ckpt`: [link](https://huggingface.co/guoyww/animatediff/resolve/main/mm_sd_v15_v2.ckpt)
# `models/ControlNet/control_v11p_sd15_lineart.pth`: [link](https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_lineart.pth)
# `models/ControlNet/control_v11f1e_sd15_tile.pth`: [link](https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1e_sd15_tile.pth)
# `models/Annotators/sk_model.pth`: [link](https://huggingface.co/lllyasviel/Annotators/resolve/main/sk_model.pth)
# `models/Annotators/sk_model2.pth`: [link](https://huggingface.co/lllyasviel/Annotators/resolve/main/sk_model2.pth)
# `models/textual_inversion/verybadimagenegative_v1.3.pt`: [link](https://civitai.com/api/download/models/25820?type=Model&format=PickleTensor&size=full&fp=fp16)

# The original video in the example is https://www.bilibili.com/video/BV1iG411a7sQ/.
startFrame = 0
endFrame = 5440
width = 1600
height = 896
config = {
    "models": {
        "model_list": [
            "models/stable_diffusion/aingdiffusion_v12.safetensors",
            "models/AnimateDiff/mm_sd_v15_v2.ckpt",
            "models/ControlNet/control_v11f1e_sd15_tile.pth",
            "models/ControlNet/control_v11p_sd15_lineart.pth"
        ],
        "textual_inversion_folder": "models/textual_inversion",
        "device": "cuda",
        "lora_alphas": [],
        "controlnet_units": [
            {
                "processor_id": "tile",
                "model_path": "models/ControlNet/control_v11f1e_sd15_tile.pth",
                "scale": 0.5
            },
            {
                "processor_id": "lineart",
                "model_path": "models/ControlNet/control_v11p_sd15_lineart.pth",
                "scale": 0.5
            }
        ]
    },
    "data": {
        "input_frames": {
            "video_file": "F:/大喜.mp4",
            "image_folder": None,
            "height": height,
            "width": width,
            "start_frame_id": startFrame,
            "end_frame_id": endFrame
        },
        "controlnet_frames": [
            {
                "video_file": "data/examples/diffutoon/input_video.mp4",
                "image_folder": None,
                "height": height,
                "width": width,
                "start_frame_id": startFrame,
                "end_frame_id": endFrame
            },
            {
                "video_file": "data/examples/diffutoon/input_video.mp4",
                "image_folder": None,
                "height": height,
                "width": width,
                "start_frame_id": startFrame,
                "end_frame_id": endFrame
            }
        ],
        "clear_output_folder": False,
        "output_folder": "F:/output-%d-%d" % (startFrame, endFrame),
        "fps": 30
    },
    "smoother_configs": [
        {
            "processor_type": "FastBlend",
            "config": {}
        }
    ],
    "pipeline": {
        "seed": 0,
        "pipeline_inputs": {
            "prompt": "best quality, perfect anime illustration, a girl standing in front of a display of plates and lamps , a girl wear a pink dress,yousa,fullbody,kitsch movement,dance,china dress,black_hair,1girl",
            "negative_prompt": "verybadimagenegative_v1.3",
            "cfg_scale": 7.0,
            "clip_skip": 2,
            "denoising_strength": 1.0,
            "num_inference_steps": 10,
            "animatediff_batch_size": 16,
            "animatediff_stride": 8,
            "unet_batch_size": 1,
            "controlnet_batch_size": 1,
            "cross_frame_attention": True,
            "smoother_progress_ids": [-1],
            # The following parameters will be overwritten. You don't need to modify them.
            "input_frames": [],
            "num_frames": endFrame+1,
            "width": 1920,
            "height": 1080,
            "controlnet_frames": []
        }
    }
}
print("start:%d end:%d\n" % (startFrame, endFrame))
runner = SDVideoPipelineRunner()
runner.run(config)
print("done at data/examples/diffutoon/output-%d-%d\n" % (startFrame, endFrame))
print("all finish\n")
zhanghongyong123456 commented 7 months ago
"height": height,
"width": width,

I found that sd_toon_shading.py gives better results than diffutoon_toon_shading.py (is that because of the extra "models/RIFE/flownet.pkl" processing? I'm not sure), so I'm running the sd_toon_shading.py script, and that's where my error appears. I also noticed that in your script the pipeline's image size is not width = 1600, height = 896; did you set it that way on purpose? (image) After updating the code I now get: FileNotFoundError: [Errno 2] No such file or directory: 'output\controlnet_caches/cache_p1_0.pt'
(image) The cache files written out are all p0. I debugged the original project and your long-video version side by side and compared the controlnet_frames list with the original project. (image) (image)

zhanghongyong123456 commented 7 months ago

I tested the diffutoon_toon_shading.py demo you provided and it works fine; it's the sd_toon_shading.py run that has problems.

zhanghongyong123456 commented 7 months ago

Building on what you provided, I added automatic detection of the video's frame count and resolution (rounded up to a multiple of 64), plus output settings. Please take a look and see whether there are any issues:


from diffsynth import SDVideoPipelineRunner

# Download models
# `models/stable_diffusion/aingdiffusion_v12.safetensors`: [link](https://civitai.com/api/download/models/229575)
# `models/AnimateDiff/mm_sd_v15_v2.ckpt`: [link](https://huggingface.co/guoyww/animatediff/resolve/main/mm_sd_v15_v2.ckpt)
# `models/ControlNet/control_v11p_sd15_lineart.pth`: [link](https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_lineart.pth)
# `models/ControlNet/control_v11f1e_sd15_tile.pth`: [link](https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1e_sd15_tile.pth)
# `models/Annotators/sk_model.pth`: [link](https://huggingface.co/lllyasviel/Annotators/resolve/main/sk_model.pth)
# `models/Annotators/sk_model2.pth`: [link](https://huggingface.co/lllyasviel/Annotators/resolve/main/sk_model2.pth)
# `models/textual_inversion/verybadimagenegative_v1.3.pt`: [link](https://civitai.com/api/download/models/25820?type=Model&format=PickleTensor&size=full&fp=fp16)

# The original video in the example is https://www.bilibili.com/video/BV1iG411a7sQ/.

import os, cv2, math
from datetime import datetime

def get_video_info(video_path):
    # Check whether the file path exists
    if not os.path.exists(video_path):
        return "Invalid video path, please check the file path."

    # Open the video file
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        return "The video cannot be opened, please check whether the file is corrupted."

    # Get the frame count
    frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Get the resolution
    width = cap.get(cv2.CAP_PROP_FRAME_WIDTH)
    height = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)
    fps = cap.get(cv2.CAP_PROP_FPS)
    # Check the resolution
    if width >= 1920 or height >= 1920:
        print("The resolution is very large and processing will take a long time; consider reducing it (note: at 1920 resolution roughly 100 frames are processed per hour).")
    # Round up to the nearest multiple of 64 if not already a multiple of 64
    width_64 = math.ceil(width / 64) * 64
    height_64 = math.ceil(height / 64) * 64
    print(f"Total frames: {frame_count}")
    print(f"Adjusted video resolution: {int(width_64)} x {int(height_64)}")
    print(f"The FPS of the video is: {fps}")
    # Release the VideoCapture object
    cap.release()
    return frame_count, width_64, height_64, fps

# prompt = "Write the style-transfer prompt for the video here (in English)"
prompt = "best quality, perfect anime illustration, a girl standing in front of a display of plates and lamps , a girl wear a pink dress,yousa,fullbody,kitsch movement,dance,china dress,black_hair,1girl"

# Path of the input video file
video_file = "data/1_2.mp4"
# Root folder for outputs
output_folder_base = "output"

# Get the frame count, width/height and FPS of the video
endFrame, width, height, fps = get_video_info(video_file)
################################## Manual setting area (if needed): video width/height, frames to process and fps ##################################
# Manually set the start frame
startFrame = 0
# endFrame = 10
# width = 960
# height = 576
# fps = 24
################################## Manual setting area (if needed): video width/height, frames to process and fps ##################################

# Get the video file name with extension
file_name_with_extension = os.path.basename(video_file)
# Split into file name and extension
file_name, file_extension = os.path.splitext(file_name_with_extension)
# Get the current time
now = datetime.now()
# Format the time as a string, e.g. '20240413_161408'
timestamp = now.strftime('%Y%m%d_%H%M%S')
# Output path format: video name/startFrame_endFrame_timestamp, to guarantee a unique ID, e.g. output/1_2/0_10_20240413_161812
file_name_abs = os.path.join(file_name, f"{startFrame}_{endFrame}_{timestamp}")
output_folder_path = os.path.join(output_folder_base, file_name_abs)

config = {
    "models": {
        "model_list": [
            "models/stable_diffusion/aingdiffusion_v12.safetensors",
            "models/AnimateDiff/mm_sd_v15_v2.ckpt",
            "models/ControlNet/control_v11f1e_sd15_tile.pth",
            "models/ControlNet/control_v11p_sd15_lineart.pth"
        ],
        "textual_inversion_folder": "models/textual_inversion",
        "device": "cuda",
        "lora_alphas": [],
        "controlnet_units": [
            {
                "processor_id": "tile",
                "model_path": "models/ControlNet/control_v11f1e_sd15_tile.pth",
                "scale": 0.5
            },
            {
                "processor_id": "lineart",
                "model_path": "models/ControlNet/control_v11p_sd15_lineart.pth",
                "scale": 0.5
            }
        ]
    },
    "data": {
        "input_frames": {
            "video_file": video_file,
            "image_folder": None,
            "height": height,
            "width": width,
            "start_frame_id": startFrame,
            "end_frame_id": endFrame
        },
        "controlnet_frames": [
            {
                "video_file": video_file,
                "image_folder": None,
                "height": height,
                "width": width,
                "start_frame_id": startFrame,
                "end_frame_id": endFrame
            },
            {
                "video_file": video_file,
                "image_folder": None,
                "height": height,
                "width": width,
                "start_frame_id": startFrame,
                "end_frame_id": endFrame
            }
        ],
        "clear_output_folder": False,
        "output_folder": output_folder_path,
        "fps": fps
    },
    "smoother_configs": [
        {
            "processor_type": "FastBlend",
            "config": {}
        }
    ],
    "pipeline": {
        "seed": 0,
        "pipeline_inputs": {
            "prompt": prompt,
            "negative_prompt": "verybadimagenegative_v1.3",
            "cfg_scale": 7.0,
            "clip_skip": 2,
            "denoising_strength": 1.0,
            "num_inference_steps": 10,
            "animatediff_batch_size": 16,
            "animatediff_stride": 8,
            "unet_batch_size": 1,
            "controlnet_batch_size": 1,
            "cross_frame_attention": True,
            "smoother_progress_ids": [-1],
            # The following parameters will be overwritten. You don't need to modify them.
            "input_frames": [],
            "num_frames": endFrame+1,
            "width": width,
            "height": height,
            "controlnet_frames": []
        }
    }
}

print("start:%d end:%d\n" % (startFrame, endFrame))
runner = SDVideoPipelineRunner()
runner.run(config)
print(f"done at {output_folder_path}\n")
print("all finish\n")

Attect commented 7 months ago

I'll debug it more carefully later. p0 was hard-coded because there is only one group, yet it later tries to read p1; it's a miscalculation of controlnet_processor_count. You could try fixing that value to 1. As for the concrete fix, my machine is busy with a running job so I can't run another test; I'll look at it after Monday. I also haven't run sd_toon_shading.py myself, only direct text-to-video; I'll run it once and compare the results.

I chose that resolution because it is the one whose aspect ratio is closest to 16:9 with both width and height divisible by 64.
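For anyone curious, a small search like the sketch below (my own, not from the repo) reproduces that choice: among 64-aligned sizes around a ~1600-pixel width, 1600x896 has the aspect ratio closest to 16:9.

# Enumerate 64-aligned resolutions near 1600 wide and rank them by distance from 16:9.
target = 16 / 9
candidates = [
    (abs(w / h - target), w, h)
    for w in range(1472, 1729, 64)   # 1472, 1536, 1600, 1664, 1728
    for h in range(768, 1025, 64)    # 768, 832, 896, 960, 1024
]
for err, w, h in sorted(candidates)[:3]:
    print(f"{w}x{h}  ratio={w / h:.4f}  |error from 16:9|={err:.4f}")
# Closest: 1600x896 (ratio 1.7857), then 1472x832, then 1728x960.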