[BUG]: Inference problems?

rohit901 commented 1 year ago

🐛 Describe the bug

Hi, I trained Stable Diffusion v1-4 on custom data, but I'm not able to run inference. I'm getting this error:

[Screenshot of the error, 2022-11-14]

I have downloaded the weights from: https://huggingface.co/CompVis/stable-diffusion-v1-4/blob/main/unet/diffusion_pytorch_model.bin

I have linked the corresponding diffusion_pytorch_model.bin files for the UNet and the VAE.
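Because the config loads these downloaded weights through its `from_pretrained` entries (see the YAML posted later in the thread), one quick sanity check before training or inference is to confirm that each referenced file actually exists and is non-empty. The following is a minimal stdlib sketch; the helper name `missing_weight_files` is hypothetical, and the paths are copied from the posted YAML.

```python
import os
from typing import Dict, List


def missing_weight_files(paths: Dict[str, str]) -> List[str]:
    """Return the components whose weight file is missing or empty on disk."""
    missing = []
    for component, path in paths.items():
        # A missing file, or a zero-byte placeholder, cannot hold real weights
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(component)
    return missing


if __name__ == "__main__":
    # Paths as referenced by the `from_pretrained` entries in the posted YAML
    weights = {
        "unet": "weights/stable-diffusion-v1-4/unet/diffusion_pytorch_model.bin",
        "vae": "weights/stable-diffusion-v1-4/vae/diffusion_pytorch_model.bin",
    }
    print("missing:", missing_weight_files(weights))
```

An empty result only means the files are present. It does not confirm that their contents match what the model classes expect.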

I'm running the txt2img script for inference:

python scripts/txt2img.py \
--prompt='the bird has webbed feet that are pale pink as well as skinny tarsus.' \
--outdir='outputs/generated_birds' \
--H=256 --W=256 \
--n_samples=4 \
--plms \
--config='configs/train_colossalai_birds.yaml' \
--ckpt='out_bird/2022-11-14T13-47-20_train_colossalai_birdstest/checkpoints/last.ckpt'
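The `--H`/`--W` flags set the pixel size, but sampling runs in latent space. The following sketch assumes the script keeps the CompVis `txt2img.py` defaults of `--C=4` latent channels and a downsampling factor `--f=8` (consistent with the four-level `ch_mult` of the autoencoder). Under those assumptions, it shows the latent shape the sampler works on. The function name is hypothetical.

```python
from typing import Tuple


def latent_shape(height: int, width: int, channels: int = 4, factor: int = 8) -> Tuple[int, int, int]:
    """Shape (C, H // f, W // f) of the latent denoised for one image."""
    if height % factor or width % factor:
        raise ValueError(f"H and W must be multiples of the downsampling factor {factor}")
    return channels, height // factor, width // factor


if __name__ == "__main__":
    print(latent_shape(256, 256))  # --H=256 --W=256 from the command above
    print(latent_shape(512, 512))  # 512x512 images map to the 64x64 latent of `image_size: 64`
```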

I'm using FolderDataset to load my data.

Environment

No response

Fazziekey commented 1 year ago

Can you show your train_colossalai_birds.yaml?

Fazziekey commented 1 year ago

There is no num_timesteps argument in the yaml, only num_timesteps_cond.
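To check which keys a parsed config actually contains, a small recursive search over the nested dict is enough. This stdlib sketch assumes the YAML has already been loaded into plain dicts and lists (for example with PyYAML or OmegaConf's `to_container`). `find_key_paths` is a hypothetical helper. As far as we understand the CompVis `ldm` code, `num_timesteps` is an attribute computed at runtime from `timesteps`, not a YAML key, so it is expected not to appear.

```python
from typing import Any, List


def find_key_paths(node: Any, name: str, prefix: str = "") -> List[str]:
    """Return the dotted path of every key equal to `name` in a nested dict/list config."""
    hits: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            if key == name:
                hits.append(path)
            hits.extend(find_key_paths(value, name, path))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            hits.extend(find_key_paths(item, name, f"{prefix}[{i}]"))
    return hits


if __name__ == "__main__":
    # Excerpt of model.params from the posted YAML, as a plain dict
    cfg = {"model": {"params": {"num_timesteps_cond": 1, "timesteps": 1000, "log_every_t": 200}}}
    print(find_key_paths(cfg, "num_timesteps"))
    print(find_key_paths(cfg, "num_timesteps_cond"))
```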

rohit901 commented 1 year ago

Thank you for your reply. I just copied the yaml file from the COCO example, which uses num_timesteps_cond, so I'm not sure why I can't run the inference script properly. Could you please help? Also, how can I use the diffusers library's inference pipeline if I want to use that as well? And is the automatically generated checkpoint file last.ckpt the correct one to use? Posting the yaml here:

model:
  base_learning_rate: 1.0e-04
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    linear_start: 0.00085
    linear_end: 0.0120
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: image
    cond_stage_key: txt
    image_size: 64
    channels: 4
    cond_stage_trainable: false   # Note: different from the one we trained before
    conditioning_key: crossattn
    monitor: val/loss_simple_ema
    scale_factor: 0.18215
    use_ema: False

    scheduler_config: # 10000 warmup steps
      target: ldm.lr_scheduler.LambdaLinearScheduler
      params:
        warm_up_steps: [ 1 ] # NOTE for resuming. use 10000 if starting from scratch
        cycle_lengths: [ 10000000000000 ] # incredibly large number to prevent corner cases
        f_start: [ 1.e-6 ]
        f_max: [ 1.e-4 ]
        f_min: [ 1.e-10 ]

    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        image_size: 32 # unused
        from_pretrained: 'weights/stable-diffusion-v1-4/unet/diffusion_pytorch_model.bin'
        in_channels: 4
        out_channels: 4
        model_channels: 320
        attention_resolutions: [ 4, 2, 1 ]
        num_res_blocks: 2
        channel_mult: [ 1, 2, 4, 4 ]
        num_heads: 8
        use_spatial_transformer: True
        transformer_depth: 1
        context_dim: 768
        use_checkpoint: False
        legacy: False

    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        from_pretrained: 'weights/stable-diffusion-v1-4/vae/diffusion_pytorch_model.bin'
        monitor: val/rec_loss
        ddconfig:
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity

    cond_stage_config:
      target: ldm.modules.encoders.modules.FrozenCLIPEmbedder
      params:
        use_fp16: True

data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 16
    num_workers: 4
    train:
      target: ldm.data.birds.FolderData
      params:
        root_dir: data/birds/images/
        caption_file: data/birds/captions.json
        image_transforms:
        - target: torchvision.transforms.Resize
          params:
            size: 256
            interpolation: 3
        - target: torchvision.transforms.RandomCrop
          params:
            size: 256
        - target: torchvision.transforms.RandomHorizontalFlip

lightning:
  trainer:
    accelerator: 'gpu' 
    devices: 1
    log_gpu_memory: all
    max_epochs: 5
    precision: 16
    auto_select_gpus: False
    strategy:
      target: pytorch_lightning.strategies.ColossalAIStrategy
      params:
        use_chunk: False
        enable_distributed_storage: True
        placement_policy: cuda
        force_outputs_fp32: False

    log_every_n_steps: 2
    logger: True
    default_root_dir: "/home/rohit.bharadwaj/Documents/AI701/Project/ColossalAI/examples/images/diffusion/out_bird/tmp/diff_log/"
    profiler: pytorch

  logger_config:
    wandb:
      target: pytorch_lightning.loggers.WandbLogger
      params:
          name: nowname
          save_dir: "/tmp/diff_log/"
          offline: opt.debug
          id: nowname
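The `linear_start`, `linear_end` and `timesteps` entries under `model.params` define the diffusion noise schedule. Below is a stdlib illustration of the "linear" schedule as we understand the CompVis `ldm` code computes it in `make_beta_schedule`: betas spaced linearly in square-root space, then squared. It is not the repository's implementation, and both function names are hypothetical.

```python
import math
from typing import List


def linear_beta_schedule(linear_start: float, linear_end: float, timesteps: int) -> List[float]:
    """Betas spaced linearly between sqrt(start) and sqrt(end), then squared."""
    lo, hi = math.sqrt(linear_start), math.sqrt(linear_end)
    step = (hi - lo) / (timesteps - 1)
    return [(lo + i * step) ** 2 for i in range(timesteps)]


def alphas_cumprod(betas: List[float]) -> List[float]:
    """Running product of (1 - beta_t), i.e. alpha-bar_t for each step."""
    out, acc = [], 1.0
    for b in betas:
        acc *= 1.0 - b
        out.append(acc)
    return out


if __name__ == "__main__":
    betas = linear_beta_schedule(0.00085, 0.0120, 1000)  # values from model.params
    abar = alphas_cumprod(betas)
    print(len(betas), betas[0], betas[-1], abar[-1])
```

The length of this schedule (here 1000) is the number of diffusion steps the model was trained with. The `--plms` sampler then takes a much smaller number of steps over that range at inference time.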

Fazziekey commented 1 year ago

Thanks for the issue. We are collaborating with Hugging Face to support the diffusers repo; it may take some time.

rohit901 commented 1 year ago

I see. Could you please help me with the current inference bug in the given script, though? Training worked fine and produced a last.ckpt file, but inference gives me errors. Let me know if you need more details.
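One way to confirm that last.ckpt holds the expected weights is to count its state-dict keys per submodule. This sketch assumes a CompVis-style `LatentDiffusion` layout, with the UNet under `model.diffusion_model.`, the VAE under `first_stage_model.` and the text encoder under `cond_stage_model.`. It also assumes a PyTorch Lightning checkpoint that stores the weights under `"state_dict"`. If we recall correctly, the CompVis `txt2img.py` loads weights with `strict=False`, so a mismatch may not raise an error by itself. The helper name is hypothetical.

```python
import sys
from collections import Counter
from typing import Dict, Mapping

# Assumed top-level prefixes of a CompVis-style LatentDiffusion state dict
EXPECTED_PREFIXES = ("model.diffusion_model.", "first_stage_model.", "cond_stage_model.")


def count_by_prefix(state_dict: Mapping[str, object]) -> Dict[str, int]:
    """Count state-dict entries under each expected submodule prefix."""
    counts: Counter = Counter()
    for key in state_dict:
        for prefix in EXPECTED_PREFIXES:
            if key.startswith(prefix):
                counts[prefix] += 1
                break
    return {prefix: counts[prefix] for prefix in EXPECTED_PREFIXES}


if __name__ == "__main__" and len(sys.argv) > 1:
    import torch  # only needed when inspecting a real checkpoint file

    ckpt = torch.load(sys.argv[1], map_location="cpu")
    sd = ckpt.get("state_dict", ckpt)  # Lightning nests weights under "state_dict"
    print(count_by_prefix(sd))
```

A zero count for any prefix suggests that the checkpoint and the config used at inference time do not describe the same model.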

Fazziekey commented 1 year ago

We have fixed the inference problem in https://github.com/hpcaitech/ColossalAI/pull/1986

rohit901 commented 1 year ago

Thanks a lot for the follow-up and for linking the PR. So once the PR is merged, I should be able to run inference with my earlier model config and weights, right?

hpcaitech / ColossalAI

[BUG]: Inference problems? #1945
