hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All
https://hpcaitech.github.io/Open-Sora/
Apache License 2.0
21.76k stars 2.11k forks

batch size prob #464

Closed handsomeZhuang closed 3 months ago

handsomeZhuang commented 3 months ago

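Hello, I have a question. In Open-Sora 1.2, stage 1 was trained on WebVid-10M for 30K steps, which is 2 epochs. From that I calculated a batch size of about 666. Is this number correct? Thank you very much!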
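The estimate in the question can be reproduced with a quick calculation. This is only a sketch: it assumes WebVid-10M contributes roughly 10 million clips (a round figure taken from the dataset's name, not stated in the thread), and the variable names are illustrative.

```python
# Back-of-the-envelope estimate of the average global batch size.
# Assumption: WebVid-10M contains about 10 million clips (round figure).
num_samples = 10_000_000
num_epochs = 2
num_steps = 30_000

# Total samples seen during training divided by the number of optimizer steps.
avg_batch_size = num_samples * num_epochs / num_steps
print(f"average batch size ~ {avg_batch_size:.1f}")
```

The result lands between 666 and 667, which matches the figure quoted in the question.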

zhengzangw commented 3 months ago

We use a dynamic batch size. Your estimate is roughly right, but the actual batch sizes can be found in configs/opensora-v1-2/train/stage1.py.

bucket_config = {  # 12s/it
    "144p": {1: (1.0, 475), 51: (1.0, 51), 102: ((1.0, 0.33), 27), 204: ((1.0, 0.1), 13), 408: ((1.0, 0.1), 6)},
    # ---
    "256": {1: (0.4, 297), 51: (0.5, 20), 102: ((0.5, 0.33), 10), 204: ((0.5, 0.1), 5), 408: ((0.5, 0.1), 2)},
    "240p": {1: (0.3, 297), 51: (0.4, 20), 102: ((0.4, 0.33), 10), 204: ((0.4, 0.1), 5), 408: ((0.4, 0.1), 2)},
    # ---
    "360p": {1: (0.2, 141), 51: (0.15, 8), 102: ((0.15, 0.33), 4), 204: ((0.15, 0.1), 2), 408: ((0.15, 0.1), 1)},
    "512": {1: (0.1, 141)},
    # ---
    "480p": {1: (0.1, 89)},
    # ---
    "720p": {1: (0.05, 36)},
    "1024": {1: (0.05, 36)},
    # ---
    "1080p": {1: (0.1, 5)},
    # ---
    "2048": {1: (0.1, 5)},
}

For example, for 360p videos with 51 frames (2s), the local batch size is 8. When training on 96 GPUs, the global batch size is 768.
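The 768 figure is the per-GPU batch size of one bucket multiplied by the number of GPUs. A minimal sketch of that lookup, assuming plain data parallelism where every GPU draws from the same bucket in a given step; `global_batch_size` is a hypothetical helper, not part of the Open-Sora code base.

```python
# Subset of the bucket_config quoted above:
# resolution -> {num_frames: (probability, local_batch_size)}.
bucket_config = {
    "360p": {1: (0.2, 141), 51: (0.15, 8), 102: ((0.15, 0.33), 4)},
}


def global_batch_size(config, resolution, num_frames, num_gpus):
    """Hypothetical helper: per-GPU batch size of one bucket times the GPU count."""
    _, local_bs = config[resolution][num_frames]
    return local_bs * num_gpus


print(global_batch_size(bucket_config, "360p", 51, num_gpus=96))  # 8 * 96 = 768
```

Because each bucket has its own local batch size, the global batch size changes from step to step depending on which bucket is sampled.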

handsomeZhuang commented 3 months ago

So bucket_config is a per-GPU configuration? In other words, each GPU (or each node) uses bucket_config in place of the usual batch size setting?

zhengzangw commented 3 months ago

Yes, the bucket config is a per-GPU setting. For example, if you want to reproduce a traditional fixed training batch size, you can write:

bucket_config = {"360p": {51: (1.0, 8)}}

Here 51 is num_frames and 8 is the batch size. A detailed explanation is available here.

handsomeZhuang commented 3 months ago

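Hello, do you have a WeChat group or some other channel for real-time communication? If it is convenient, could you share access? I have a number of scattered questions that I may need to ask you about, and email is a bit slow for that.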

zhengzangw commented 3 months ago

You can try the Discord linked on the homepage; I am there too.

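syc11-25 commented 3 months ago

"360p": {1: (0.2, 141), 51: (0.15, 8), 102: ((0.15, 0.33), 4), 204: ((0.15, 0.1), 2), 408: ((0.15, 0.1), 1)},

In this entry, what do the two probabilities inside the nested parentheses represent? Does a 360p video have a probability of 1 - 0.2 of being downscaled, and is the downscaling ratio fixed?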

zhengzangw commented 2 months ago

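Write the two values in the parentheses as (x, y). The meaning is: a video that fits this bucket has probability x of being sent to a lower-resolution bucket (360p -> 240p). Otherwise, it has probability y of being sent to a bucket with fewer frames.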

The downscaling ratio is not fixed; it is determined by a probability.
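The config quoted in this thread uses two entry shapes: (x, batch_size) and ((x, y), batch_size). The sketch below only separates those values so they are easier to read. `unpack_bucket_entry` is a hypothetical helper, and how x and y are then applied is left to the bucket sampler in the repository.

```python
def unpack_bucket_entry(entry):
    """Split a bucket entry into (x, y, batch_size).

    Two shapes appear in the config quoted in this thread:
      (x, batch_size)       -> a single probability x
      ((x, y), batch_size)  -> a pair of probabilities (x, y)
    y is returned as None when the entry has no second probability.
    """
    prob, batch_size = entry
    if isinstance(prob, tuple):
        x, y = prob
    else:
        x, y = prob, None
    return x, y, batch_size


# The 360p row from the question: num_frames -> entry.
row_360p = {
    1: (0.2, 141),
    51: (0.15, 8),
    102: ((0.15, 0.33), 4),
    204: ((0.15, 0.1), 2),
    408: ((0.15, 0.1), 1),
}

for num_frames, entry in row_360p.items():
    x, y, local_bs = unpack_bucket_entry(entry)
    print(f"{num_frames:>3} frames: x={x}, y={y}, local batch size={local_bs}")
```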