hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All
https://hpcaitech.github.io/Open-Sora/
Apache License 2.0

KeyError: 'height' while training #717


Chaima-Doggui commented 11 hours ago

Hello guys,

I'm trying to train the model with this command:

torchrun --standalone --nproc_per_node 1 /path/opensora/Open-Sora/scripts/train.py /path/opensora/my_train_config.py

but I'm getting the error below, even though my CSV file has this header: path,id,relpath,num_frames,height,width,aspect_ratio,fps,resolution,text

Traceback (most recent call last):
  File "/home/txt_vid/opensora/Open-Sora/scripts/train.py", line 423, in <module>
    main()
  File "/home/txt_vid/opensora/Open-Sora/scripts/train.py", line 315, in main
    loss_dict = scheduler.training_losses(model, x, model_args, mask=mask)
  File "/home/txt_vid/opensora/Open-Sora/opensora/schedulers/rf/__init__.py", line 103, in training_losses
    return self.scheduler.training_losses(model, x_start, model_kwargs, noise, mask, weights, t)
  File "/home/txt_vid/opensora/Open-Sora/opensora/schedulers/rf/rectified_flow.py", line 89, in training_losses
    t = timestep_transform(t, model_kwargs, scale=self.transform_scale, num_timesteps=self.num_timesteps)
  File "/home/txt_vid/opensora/Open-Sora/opensora/schedulers/rf/rectified_flow.py", line 23, in timestep_transform
    if model_kwargs[key].dtype == torch.float16:
KeyError: 'height'
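
For context, here is roughly the check that blows up; this is a paraphrase of timestep_transform in opensora/schedulers/rf/rectified_flow.py reconstructed from the traceback, not the exact source. It indexes model_kwargs with "height", so whatever train.py passes as model_args has to carry height/width tensors:

import torch

# Paraphrased sketch of the failing lookup (rectified_flow.py line 23 in the
# traceback). The exact key list is an assumption; "height" is clearly in it.
def timestep_transform(t, model_kwargs, scale=1.0, num_timesteps=1000):
    for key in ("height", "width"):  # may also include "num_frames"
        if model_kwargs[key].dtype == torch.float16:  # <-- KeyError: 'height'
            model_kwargs[key] = model_kwargs[key].float()
    ...  # rest of the transform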

and this is my config; note the scheduler sets use_timestep_transform=True, which is what routes training through the timestep_transform above:

# Dataset settings
dataset = dict(
    type="VideoTextDataset",
    data_path="/home/txt_vid/input-dataset/subclips_512/meta/meta_caption_info_512_filtered.csv",
    num_frames=48,
    frame_interval=8,
    image_size=(512, 512),
    transform_name="resize_crop",
)

bucket_config = {
    "144": {4: (1.0, 1), 48: (1.0, 1), 300: (1.0, 1), 400: (1.0, 1), 500: (1.0, 1), 600: (1.0, 1), 800: (1.0, 1)},
    "144p": {4: (1.0, 100), 48: (1.0, 30), 102: (1.0, 20), 204: (1.0, 8), 408: (1.0, 4)},
    "256": {4: (0.5, 100), 48: (0.3, 24), 102: (0.3, 12), 204: (0.3, 4), 408: (0.3, 2)},
    "240p": {4: (0.5, 100), 48: (0.3, 24), 102: (0.3, 12), 204: (0.3, 4), 408: (0.3, 2)},
    "360p": {4: (0.5, 60), 48: (0.3, 12), 102: (0.3, 6), 204: (0.3, 2), 408: (0.3, 1)},
    "512": {4: (0.5, 60), 48: (0.3, 12), 102: (0.3, 6), 204: (0.3, 2), 408: (0.3, 1)},
}
grad_checkpoint = True

# Acceleration settings
num_workers = 0
num_bucket_build_workers = 1
dtype = "fp16"
plugin = "zero2"
reduce_bucket_size_in_m = 10
sp_size = 1
# Model settings
model = dict(
    type="STDiT-XL/2",
    space_scale=0.5,
    time_scale=1.0,
    from_pretrained=None,
    enable_flash_attn=True,
    enable_layernorm_kernel=True,
    enable_sequence_parallelism=False,  # enable sequence parallelism here
)

vae = dict(
    type="OpenSoraVAE_V1_2",
    from_pretrained="hpcai-tech/OpenSora-VAE-v1.2",
    micro_frame_size=1,
    micro_batch_size=1,
)

text_encoder = dict(
    type="clip",
    from_pretrained="openai/clip-vit-large-patch14",
    model_max_length=77,
)

scheduler = dict(
    type="rflow",
    use_timestep_transform=True,
    sample_method="logit-normal",
)

# Log settings
outputs = "/home/txt_vid/opensora/Open-Sora/outputs"
wandb = True
epochs = 1
log_every = 1
ckpt_every = 1

# Optimization settings
load = None
grad_clip = 1.0
lr = 5e-5
ema_decay = 0.98
adam_eps = 1e-15
batch_size = 1
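
As a sanity check on the CSV itself (a minimal snippet assuming pandas; the path is my local meta file), the header really is lowercase:

import pandas as pd

# Confirm the meta CSV exposes lowercase 'height'/'width' columns.
df = pd.read_csv("/home/txt_vid/input-dataset/subclips_512/meta/meta_caption_info_512_filtered.csv")
print(df.columns.tolist())
# expected: ['path', 'id', 'relpath', 'num_frames', 'height', 'width',
#            'aspect_ratio', 'fps', 'resolution', 'text']
print(df[["height", "width"]].head())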