Hi,

I installed the following packages by following the "Installation" guide:

- pytorch-lightning 1.6.5
- colossalai 0.1.11rc3+torch1.11cu11.4

But when I run training, it reports the following error:

`AttributeError: module 'pytorch_lightning.strategies' has no attribute 'ColossalAIStrategy'`
The YAML I used is as follows:

```yaml
model:
  base_learning_rate: 1.0e-04
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    linear_start: 0.00085
    linear_end: 0.0120
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: image
    cond_stage_key: txt
    image_size: 64
    channels: 4
    cond_stage_trainable: false # Note: different from the one we trained before
    conditioning_key: crossattn
    monitor: val/loss_simple_ema
    scale_factor: 0.18215
    use_ema: False

data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 4
    num_workers: 4
    train:
      target: ldm.data.cifar10.hf_dataset
      params:
        name: cifar10
        image_transforms:

lightning:
  trainer:
    accelerator: 'gpu'
    devices: 1
    log_gpu_memory: all
    max_epochs: 2
    precision: 16
    auto_select_gpus: False
    strategy:
      target: pytorch_lightning.strategies.ColossalAIStrategy
      params:
        use_chunk: False
        enable_distributed_storage: True
        placement_policy: cuda
        force_outputs_fp32: False

  logger_config:
    wandb:
      target: pytorch_lightning.loggers.WandbLogger
      params:
        name: nowname
        save_dir: "/home/tina/models/Diffusion/log/"
        offline: opt.debug
        id: nowname
```
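As far as I understand, each `target`/`params` pair in this config is turned into an object by importing the dotted `target` path and calling it with `params`. Below is a minimal sketch of that lookup. It assumes the training script resolves targets like an `instantiate_from_config`-style helper (import the module, then `getattr` the last name). The helper names here are hypothetical, and standard-library classes stand in for the Lightning ones. If a class is missing from the installed module, this kind of lookup fails with the same kind of `AttributeError`:

```python
import importlib


def get_obj_from_str(path: str):
    """Resolve a dotted path such as 'package.module.ClassName' (hypothetical helper).

    Imports the module part, then looks up the final name with getattr,
    which raises AttributeError if the module does not define it.
    """
    module_name, attr_name = path.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


def instantiate_from_config(config: dict):
    """Build target(**params) from a {'target': ..., 'params': ...} mapping (hypothetical)."""
    cls = get_obj_from_str(config["target"])
    return cls(**config.get("params", {}))


# A target that exists in its module resolves and is instantiated with its params.
half = instantiate_from_config(
    {"target": "fractions.Fraction", "params": {"numerator": 1, "denominator": 2}}
)
print(half)  # 1/2

# A target the installed module does not define fails at the getattr step,
# analogous to 'pytorch_lightning.strategies.ColossalAIStrategy' on my install.
try:
    get_obj_from_str("fractions.NoSuchClass")
except AttributeError as err:
    print(err)  # module 'fractions' has no attribute 'NoSuchClass'
```

If this is how the script works, the error points to the installed `pytorch_lightning.strategies` module simply not defining that class, rather than to a problem inside colossalai itself.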
Do you have any idea about this error? Thanks a lot.