microsoft / LayoutGeneration


Some "type" Task Questions in LayoutDiffusion #43

Open molu-ggg opened 3 months ago

molu-ggg commented 3 months ago

Hello, I want to implement a function that, given several labels, generates the corresponding coordinates to form a reasonable layout. Is this the task you referred to as the "type" method? However, I have a few questions:

  1. Is the training phase generic regardless of the task? It seems that no command is provided for training the "type" task specifically.

  2. In the "type" task, aren't the input labels fixed? If the input is a fixed set of label types, the output should at least contain the same labels (see Data 1 and Data 2 at the end; both have 2 tables and 4 texts). I found that in the first few training steps used for testing (5000 steps), the model's output was very chaotic, including the labels (see Data 3). Only toward the end of training did the labels gradually stabilize, and even then they still did not completely match the test set. The input and output labels differ, which confuses me. What could be the reason for this, and what should I do?

I look forward to your reply. Thank you very much for your help~

-- Data1 (a test sample): table 10 14 115 59 | table 10 67 115 74 | text 10 78 61 119 | text 65 78 115 119 | text 31 11 94 13 | text 10 63 115 67

-- Data2 (pretrained generate): ["START table 10 23 115 74 | table 10 77 115 111 | text 10 9 61 18 | text 65 9 115 18 | text 10 21 50 22 | text 10 75 81 76 END PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD"]

-- Data3 (my generate): ["START 29 PAD PAD 121 PAD | 118 35 29 PAD PAD | 104 PAD PAD PAD PAD | table PAD PAD 47 PAD 75 100 PAD PAD PAD PAD | table 25 PAD PAD PAD 15 PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD 17 PAD PAD PAD PAD PAD PAD PAD PAD PAD 99 PAD PAD PAD 107 PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD 89 PAD 57 PAD PAD"]
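For reference, this is how I compare the label sets of two samples: a minimal parsing sketch (the helper names are my own; it only assumes the `label x0 y0 x1 y1 | ...` serialization shown above, skipping the START/END/PAD special tokens):

```python
from collections import Counter

def parse_layout(s):
    """Parse 'label x0 y0 x1 y1 | ...' into (label, bbox) tuples,
    dropping START/END/PAD special tokens and malformed chunks."""
    for tok in ("START", "END", "PAD"):
        s = s.replace(tok, "")
    elements = []
    for chunk in s.split("|"):
        tokens = chunk.split()
        # A well-formed element is one label followed by four integer coordinates.
        if len(tokens) == 5 and not tokens[0].isdigit() and all(t.isdigit() for t in tokens[1:]):
            elements.append((tokens[0], tuple(int(t) for t in tokens[1:])))
    return elements

data1 = ("table 10 14 115 59 | table 10 67 115 74 | text 10 78 61 119 | "
         "text 65 78 115 119 | text 31 11 94 13 | text 10 63 115 67")
print(Counter(label for label, _ in parse_layout(data1)))
# Counter({'text': 4, 'table': 2})
```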

My training command:

python scripts/train.py --checkpoint_path ../results/checkpoint/pub_cond --model_arch transformer --modality e2e-tgt --save_interval 1000 --lr 3e-5 --batch_size 32 --diffusion_steps 200 --noise_schedule gaussian_refine_pow2.5 --use_kl False --learn_sigma False --aux_loss True --rescale_timesteps False --seq_length 121 --num_channels 128 --seed 102 --dropout 0.1 --padding_mode pad --experiment random --lr_anneal_steps 400000 --weight_decay 0.0 --predict_xstart True --training_mode discrete1 --vocab_size 139 --submit False --e2e_train ../data/processed_datasets/PublayNet_ltrb_lex

My inference command:

python scripts/batch_decode.py ../results/checkpoint/pub_cond -1.0 ema 20 20 False -1 type

Junyi42 commented 3 months ago

Hi,

> Is the training phase generic regardless of the task?

Yes, our method enables conditional generation in a plug-and-play manner; please refer to Sec. 4.3 of our paper for more details.

> In the "type" task, aren't the input labels fixed?

Yes, we only feed the input labels when starting the sampling process (implementation here). Therefore, it is possible for the output layout to violate the input condition if the model is not trained well. One simple fix is to fix the label (and format) tokens at each sampling step; we did not see this problem in our experiments, so we did not do that.
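A rough sketch of that fix could look like the following (the `diffusion.p_sample` call and the variable names here are placeholders for illustration, not the actual API of this repo):

```python
import torch

@torch.no_grad()
def sample_with_fixed_labels(model, diffusion, x_t, cond_tokens, cond_mask):
    """Reverse diffusion loop that re-imposes the known condition tokens.

    x_t:         (B, L) noisy token ids at the last timestep.
    cond_tokens: (B, L) the known label/format token ids (arbitrary elsewhere).
    cond_mask:   (B, L) bool, True at the conditioned positions.
    """
    for t in reversed(range(diffusion.num_timesteps)):
        # Clamp the label/format tokens before every denoising step,
        # so the condition cannot drift away during sampling.
        x_t = torch.where(cond_mask, cond_tokens, x_t)
        x_t = diffusion.p_sample(model, x_t, t)  # placeholder: one reverse step
    # Clamp once more so the final layout is guaranteed to keep the labels.
    return torch.where(cond_mask, cond_tokens, x_t)
```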

Please feel free to let me know if there are any further problems.

Thanks.