whaohan / desigen

Official code for paper: Desigen: A Pipeline for Controllable Design Template Generation [CVPR'24]

Background and Layout Evaluation #10

Open aaghawaheed opened 1 month ago

aaghawaheed commented 1 month ago

Am I doing something wrong? I am unable to generate a harmonized layout.

Background Training

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="./../data/background/train"
export OUTPUT_DIR="./../logs/background"

accelerate launch train_background.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_text_encoder \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_checkpointing \
  --learning_rate=1e-5 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=500 \
  --num_train_epochs=50 \
  --resume_from_checkpoint=latest \
  --with_spatial_loss \
  --checkpointing_steps=10000

Background Evaluation Results

Layout Training

python main.py --dataset webui --exp layout \
  --data_dir ../data \
  --epoch 100 --lr 1.5e-5 --lr_decay \
  --encode_backbone swin --encode_embd 1024 \
  --finetune_vb --pretrain_vb --debug

Layout Evaluation Results

(attached images: layout_0, layout_1, layout_2, layout_3, layout_4, layout_5)

aaghawaheed commented 1 month ago

Also, in the following command I cannot see where the "generator_path" parameter is actually used:

python pipeline.py \
  --prompt "Rose Valentines' Day" \
  --mode "background" \
  --encoder_path /path/to/encoder \
  --decoder_path /path/to/decoder \
  --generator_path logs/background-ours
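As a quick way to answer this kind of question yourself, you can search the repository for the flag's attribute name (a generic sketch, not specific to this repo's layout):

```shell
# Hypothetical check: if --generator_path is consumed anywhere, the
# attribute name "generator_path" should appear in some .py file
# beyond the argparse definition itself.
grep -rn "generator_path" --include="*.py" .
```

If the only hits are the argparse `add_argument` line, the flag is parsed but never used.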

whaohan commented 1 month ago

For the layout generation training, remove the --debug option so that the full training set is used; otherwise only the validation set is used for training.
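For reference, a minimal sketch of how a --debug flag like this is typically wired (hypothetical code, not the repo's actual implementation; the dataset loader is a stand-in):

```python
import argparse

def build_dataset(split):
    # Stand-in for the repo's WebUI dataset loader: the validation
    # split is much smaller than the training split.
    n = 3 if split == "val" else 10
    return [f"{split}_sample_{i}" for i in range(n)]

parser = argparse.ArgumentParser()
parser.add_argument("--debug", action="store_true",
                    help="use the small validation split for quick smoke runs")
args = parser.parse_args([])  # pass ["--debug"] to enable debug mode

# With --debug, the tiny validation split silently replaces the training
# split, so the model "trains" on almost no data and results look poor.
train_split = "val" if args.debug else "train"
train_set = build_dataset(train_split)
print(train_split, len(train_set))
```

This explains the symptom above: running with --debug trains on a handful of validation samples, which would produce the unsatisfactory layouts shown.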

aaghawaheed commented 1 month ago

Thanks for the response. Let me try.

aaghawaheed commented 1 month ago

Evaluation Script

EXP=swin
DATASET=webui
COMMAND=category_generate
python main.py --encode_backbone swin --encode_embd 1024 \
  --dataset $DATASET --exp $EXP --evaluate \
  --decoder_path ../logs/$DATASET/$EXP/checkpoints/decoder.pth \
  --encoder_path ../logs/$DATASET/$EXP/checkpoints/encoder.pth \
  --eval_command $COMMAND \
  --calculate_harmony \
  --save_pkl

python eval.py webui logs/$DATASET/$EXP/generatedlayout$COMMAND.pth
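The Alignment and Overlap numbers that eval.py later prints are common layout-quality metrics in this literature (lower is generally better). A rough, generic sketch of how such metrics can be computed over bounding boxes, not the repo's exact implementation:

```python
# Generic layout metrics over boxes given as (x1, y1, x2, y2).
# These are illustrative definitions, not Desigen's exact formulas.

def overlap(boxes):
    """Mean pairwise overlap: intersection area normalised by the
    first box's area in each pair (asymmetric, kept simple here)."""
    total, pairs = 0.0, 0
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            ax1, ay1, ax2, ay2 = boxes[i]
            bx1, by1, bx2, by2 = boxes[j]
            iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
            ih = max(0.0, min(ay2, by2) - max(ay1, by1))
            area = (ax2 - ax1) * (ay2 - ay1)
            total += (iw * ih) / area if area > 0 else 0.0
            pairs += 1
    return total / pairs if pairs else 0.0

def alignment(boxes):
    """Mean distance from each box's left edge to the nearest other
    left edge: 0 when elements line up on a shared column."""
    score = 0.0
    for i, (x1, _, _, _) in enumerate(boxes):
        others = [b[0] for j, b in enumerate(boxes) if j != i]
        if others:
            score += min(abs(x1 - o) for o in others)
    return score / len(boxes) if boxes else 0.0

boxes = [(0, 0, 2, 1), (0, 2, 2, 3), (1, 0.5, 3, 1.5)]
print(overlap(boxes), alignment(boxes))
```

Harmony, by contrast, scores how well the layout fits the generated background rather than the geometry of the boxes alone; see the Desigen paper for its precise definition.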

Training Script

python main.py --dataset webui --exp layout \
  --data_dir ../data \
  --epoch 100 --lr 1.5e-5 --lr_decay \
  --encode_backbone swin --encode_embd 1024 \
  --finetune_vb --pretrain_vb

  1. Could you please also fix the --exp folder, as the name differs between the training and evaluation scripts?
  2. What's the difference between harmony and coverage?
  3. ["category_generate", "real_image", "reconstruction"] — what's the difference between them?
  4. Where can I find the presentation generation script?

Thank you for your assistance with my questions. I appreciate your support and look forward to your response.

aaghawaheed commented 1 month ago

Training Script

python main.py --dataset webui --exp layout \
  --data_dir ../data \
  --epoch 100 --lr 1.5e-5 --lr_decay \
  --encode_backbone swin --encode_embd 1024 \
  --finetune_vb --pretrain_vb

Evaluation Command

EXP=layout
DATASET=webui
COMMAND=category_generate
python main.py --encode_backbone swin --encode_embd 1024 \
  --dataset $DATASET --exp $EXP --evaluate \
  --decoder_path /logs/$DATASET/$EXP/checkpoints/decoder_99.pth \
  --encoder_path /logs/$DATASET/$EXP/checkpoints/encoder_99.pth \
  --eval_command $COMMAND \
  --calculate_harmony \
  --save_pkl \
  --save_image

python eval.py webui /logs/$DATASET/$EXP/generatedlayout$COMMAND.pth

Evaluation Results

using device: cuda:0
load iou data train
<dataset.WebUI object at 0x7f2806174c40>
load iou data train
dataset vocab_size: 231
train dataset max_length: 42
encoder status: name-swin, grad-True, pretrain-False
/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1695392036766/work/aten/src/ATen/native/TensorShape.cpp:3526.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
testing...
dataset length: 393
100%|███████████| 393/393 [01:36<00:00, 4.08it/s]
harmony is: 0.17446137576242893
Dataset: webui
Alignment: 0.22
Overlap: 17.67

The results are still not satisfactory. Am I doing something wrong?