1blackbar opened this issue 1 year ago
Try this: `pip install taming-transformers`
It should run fine with just the instructions, though; multiple people have tested it and it worked. Did you install Stable Diffusion before installing this? If not, do that first.
I added some extra things to avoid confusion.
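One quick way to confirm the Stable Diffusion setup is actually in the active env before running anything heavy is to try the imports. A minimal sketch, assuming the usual module names (`taming` from taming-transformers, `clip` from OpenAI CLIP, `ldm` from the latent-diffusion code):

```python
# Sanity check: confirm the training dependencies are importable in the
# active environment. Module names assume the usual SD setup; adjust if
# your install differs.
for name in ("torch", "taming", "clip", "ldm"):
    try:
        __import__(name)
        print(f"{name}: OK")
    except ImportError as e:
        print(f"{name}: MISSING ({e})")
```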
I got it to run, but I used someone else's env with a GUI and moved your Python scripts into it. I got kicked out with an out-of-memory error, so does this thing need at least 10 GB even for 2 images? Can't someone optimize the code, like with the model? I might try Colab. I'm on a GTX 1080 Ti with 11 GB, but it still flips out.
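One way to answer the "does it really need 10 GB?" question empirically is PyTorch's own memory counters. A minimal sketch; note the counters are per-process, so this has to run inside the training process (e.g. pasted into the training script), not in a separate shell:

```python
import torch

# Print current and peak VRAM usage for GPU 0. Allocations made by other
# processes on the same card do not show up in these counters.
if torch.cuda.is_available():
    gib = 2**30
    alloc = torch.cuda.memory_allocated() / gib
    peak = torch.cuda.max_memory_allocated() / gib
    total = torch.cuda.get_device_properties(0).total_memory / gib
    print(f"allocated {alloc:.2f} GiB, peak {peak:.2f} GiB, of {total:.1f} GiB total")
```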
Oh, I can see that it creates 16 (!!!!) worker processes. What the heck, is this for a 3090 only? Where do I change that?
OK, simply lowering the worker count won't work. Any clue what to lower next on an 11 GB card to get the biggest improvement? I'll tinker with it in the meantime.
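Those 16 processes are most likely PyTorch DataLoader workers rather than anything 3090-specific. Here's a minimal sketch of the mechanism, with a dummy TensorDataset standing in for `ldm.data.personalized.PersonalizedBase`:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Each DataLoader worker is a separate OS process. Workers cost CPU RAM,
# while it's batch_size (and image size) that drives GPU memory, which is
# why lowering the worker count alone doesn't cure a CUDA out-of-memory
# error.
if __name__ == "__main__":
    dataset = TensorDataset(torch.zeros(5, 3, 512, 512))  # "5 images"
    loader = DataLoader(dataset, batch_size=1, num_workers=2)
    for (batch,) in loader:
        print(batch.shape)  # torch.Size([1, 3, 512, 512])
```

In the config posted below, these knobs correspond to `data.params.num_workers` and `data.params.batch_size`.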
-- Alright, got it working!!! I had to lower the workers, the batch size to 1, and max_images to 5.
@1blackbar, can you send me the config file? I'll put it in my repo. Send it to me on Discord: @MATRIXMANE#1219
I added you. In the meantime, here's the config:
```yaml
model:
  base_learning_rate: 5.0e-03
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    linear_start: 0.00085
    linear_end: 0.0120
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: image
    cond_stage_key: caption
    image_size: 64
    channels: 4
    cond_stage_trainable: true   # Note: different from the one we trained before
    conditioning_key: crossattn
    monitor: val/loss_simple_ema
    scale_factor: 0.18215
    use_ema: False
    embedding_reg_weight: 0.0

    personalization_config:
      target: ldm.modules.embedding_manager.EmbeddingManager
      params:
        placeholder_strings: ["*"]
        initializer_words: ["sculpture"]
        per_image_tokens: false
        num_vectors_per_token: 1
        progressive_words: False

    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        image_size: 32 # unused
        in_channels: 4
        out_channels: 4
        model_channels: 320
        attention_resolutions: [ 4, 2, 1 ]
        num_res_blocks: 2
        channel_mult: [ 1, 2, 4, 4 ]
        num_heads: 8
        use_spatial_transformer: True
        transformer_depth: 1
        context_dim: 768
        use_checkpoint: True
        legacy: False

    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity

    cond_stage_config:
      target: ldm.modules.encoders.modules.FrozenCLIPEmbedder

data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 1
    num_workers: 10
    wrap: false
    train:
      target: ldm.data.personalized.PersonalizedBase
      params:
        size: 512
        set: train
        per_image_tokens: false
        repeats: 100
    validation:
      target: ldm.data.personalized.PersonalizedBase
      params:
        size: 512
        set: val
        per_image_tokens: false
        repeats: 10

lightning:
  callbacks:
    image_logger:
      target: main.ImageLogger
      params:
        batch_frequency: 500
        max_images: 15
        increase_log_steps: False

  trainer:
    benchmark: True
    max_steps: 6100
```
This is my YAML. I'm sure it can be tweaked further, but I don't think it matters much what batch size you put in: the GPU is working at max anyway, and whether you throw big images or smaller ones at it, it will get to the end in about the same time.
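For anyone applying the same memory tweaks, this family of repos loads configs with OmegaConf, so the knobs can also be changed programmatically instead of by hand. A minimal sketch, with `v1-finetune-11gb.yaml` as a hypothetical filename; the key paths match the YAML posted above:

```python
from omegaconf import OmegaConf

# Load the config, lower the memory-relevant knobs, and write it back.
# The filename is a hypothetical placeholder for wherever you saved it.
cfg = OmegaConf.load("v1-finetune-11gb.yaml")
cfg.data.params.batch_size = 1
cfg.data.params.num_workers = 2
cfg.lightning.callbacks.image_logger.params.max_images = 5
OmegaConf.save(cfg, "v1-finetune-11gb.yaml")
```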
Why is that? I ran everything like in the instructions.