lwb2099 opened this issue 10 months ago
Did you specify the path of the checkpoints?
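Before launching, a wrong checkpoint path can be ruled out with a quick shell check on the directories passed to `--resume_from_checkpoint` and `--m1_model_path`. This is a minimal sketch: `check_ckpt_dir` is a hypothetical helper, not part of the TextDiffuser-2 repository. It only confirms that each directory exists and is non-empty. It does not verify that the expected weight files are inside.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of the TextDiffuser-2 repo):
# fail early if a checkpoint path is missing or points at an empty directory.
check_ckpt_dir() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing checkpoint directory: $dir" >&2
    return 1
  fi
  # ls -A lists every entry except . and .., so empty output means an empty dir
  if [ -z "$(ls -A "$dir")" ]; then
    echo "checkpoint directory is empty: $dir" >&2
    return 1
  fi
  echo "ok: $dir"
}

# Usage (placeholder paths):
# check_ckpt_dir "/path/to/textdiffuser-2" && check_ckpt_dir "/path/to/layout_planner"
```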
@JingyeChen Are there any TextDiffuser-2 checkpoints based on SD 2.1 available? If not, would supporting the higher-resolution SD model require significant code changes? And if so, has the training code for SD 2.1 been released?
I would love to test a checkpoint based on SD 2.1 too. The paper already mentions that results based on SD 2.1 are better.
I get similar results. This is my command, and the results are as follows:
export CUDA_VISIBLE_DEVICES=4
accelerate launch inference_textdiffuser2_t2i_full.py \
  --pretrained_model_name_or_path="/home/jovyan/wrj/workspace/project/tools/stable-diffusion-v1-5" \
  --mixed_precision="fp16" \
  --output_dir="inference_results" \
  --enable_xformers_memory_efficient_attention \
  --resume_from_checkpoint="/home/jovyan/wrj/workspace/project/unilm/textdiffuser-2/ckpt/JingyeChen22/textdiffuser2-full-ft" \
  --granularity=128 \
  --max_length=77 \
  --coord_mode="ltrb" \
  --cfg=7.5 \
  --sample_steps=20 \
  --seed=43555 \
  --m1_model_path="/home/jovyan/wrj/workspace/project/unilm/textdiffuser-2/ckpt/JingyeChen22/textdiffuser2_layout_planner" \
  --input_format='prompt' \
  --input_prompt='the log for "ABC"'
Is this working correctly?
And the log is as follows:
(textdiffuser2) jovyan@nb-big-dz-mxfw-1-0:~/wrj/workspace/project/unilm/textdiffuser-2$ bash inference_textdiffuser2_t2i_full.sh
/opt/conda/envs/textdiffuser2/lib/python3.8/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
GPU name: NVIDIA A100 80GB PCIe
Number of GPUs: 1
Namespace(cache_dir=None, cfg=7.5, checkpointing_steps=500, checkpoints_total_limit=5, coord_mode='ltrb', dataloader_num_workers=0, drop_caption=False, enable_xformers_memory_efficient_attention=True, granularity=128, hub_model_id=None, hub_token=None, input_file=None, input_format='prompt', input_prompt='the log for "ABC"', local_rank=-1, logging_dir='logs', m1_model_path='/home/jovyan/wrj/workspace/project/unilm/textdiffuser-2/ckpt/JingyeChen22/textdiffuser2_layout_planner', max_length=77, mixed_precision='fp16', output_dir='inference_results', pretrained_model_name_or_path='/home/jovyan/wrj/workspace/project/tools/stable-diffusion-v1-5', prompts_txt_file=None, push_to_hub=False, report_to='tensorboard', resolution=512, resume_from_checkpoint='/home/jovyan/wrj/workspace/project/unilm/textdiffuser-2/ckpt/JingyeChen22/textdiffuser2-full-ft', revision=None, sample_steps=20, seed=43555, vis_num=16)
/opt/conda/envs/textdiffuser2/lib/python3.8/site-packages/accelerate/accelerator.py:401: UserWarning: log_with=tensorboard was passed but no supported trackers are currently installed.
warnings.warn(f"log_with={log_with} was passed but no supported trackers are currently installed.")
Detected kernel version 5.4.160, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
05/31/2024 09:18:32 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: fp16
49408 51583
{'scaling_factor', 'force_upcast'} was not found in config. Values will be initialized to default values.
{'addition_time_embed_dim', 'reverse_transformer_layers_per_block', 'transformer_layers_per_block', 'dropout', 'attention_type'} was not found in config. Values will be initialized to default values.
Resuming from checkpoint textdiffuser2-full-ft
05/31/2024 09:18:46 - INFO - accelerate.accelerator - Loading states from /home/jovyan/wrj/workspace/project/unilm/textdiffuser-2/ckpt/JingyeChen22/textdiffuser2-full-ft
05/31/2024 09:18:46 - INFO - accelerate.checkpointing - All model weights loaded successfully
05/31/2024 09:18:46 - INFO - accelerate.checkpointing - All optimizer states loaded successfully
05/31/2024 09:18:46 - INFO - accelerate.checkpointing - All scheduler states loaded successfully
05/31/2024 09:18:46 - INFO - accelerate.checkpointing - All dataloader sampler states loaded successfully
05/31/2024 09:18:46 - INFO - accelerate.checkpointing - GradScaler state loaded successfully
05/31/2024 09:18:46 - INFO - accelerate.checkpointing - Could not load random states
05/31/2024 09:18:46 - INFO - accelerate.accelerator - Loading in 0 custom states
detect existing output_dir, removing the contained jpg/txt files ...
rm: cannot remove 'inference_results/*.jpg': No such file or directory
rm: cannot remove 'inference_results/*.txt': No such file or directory
Loading checkpoint shards: 100%|██████████████████| 3/3 [00:25<00:00, 8.50s/it]
there are 1 samples for generation
[Human] Given a prompt that will be used to generate an image, plan the layout of visual text for the image. The size of the image is 128x128. Therefore, all properties of the positions should not exceed 128, including the coordinates of top, left, right, and bottom. All keywords are included in the caption. You dont need to specify the details of font styles. At each line, the format should be keyword left, top, right, bottom. So let us begin. 
Prompt: the log for "ABC"
[Assistant] ABC 22,24,114,79
the number of samples: 1
user_prompt the log for "ABC"
current_ocr ['ABC 22,24,114,79', '']
/opt/conda/envs/textdiffuser2/lib/python3.8/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
{'clip_sample_range', 'prediction_type', 'timestep_spacing', 'thresholding', 'variance_type', 'dynamic_thresholding_ratio', 'sample_max_value'} was not found in config. Values will be initialized to default values.
100%|███████████████████████████████████████████| 20/20 [00:07<00:00, 2.76it/s]
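The layout planner's answer `ABC 22,24,114,79` is given on the 128x128 grid described in its prompt (`--granularity=128`, `--coord_mode="ltrb"`, i.e. left, top, right, bottom), while the logged Namespace shows `resolution=512`. To check where the text should land in the generated image, the box can be converted to pixels. This sketch assumes a plain linear rescale (a factor of 4 here). That is an assumption for illustration, not a description of how the repository maps coordinates internally. `scale_box` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: convert one planner line "keyword l,t,r,b" from the granularity grid
# to pixel coordinates at the image resolution.
# ASSUMPTION: plain linear rescale; the repository's own mapping may differ.
scale_box() {
  local line="$1" granularity="${2:-128}" resolution="${3:-512}"
  local keyword="${line% *}"   # everything before the last space (keyword may contain spaces)
  local coords="${line##* }"   # everything after the last space
  local l t r b
  IFS=',' read -r l t r b <<< "$coords"
  # Multiply before dividing; exact when resolution is a multiple of granularity.
  echo "$keyword $((l * resolution / granularity)),$((t * resolution / granularity)),$((r * resolution / granularity)),$((b * resolution / granularity))"
}

scale_box 'ABC 22,24,114,79'   # prints: ABC 88,96,456,316
```

Under this assumption, the box for "ABC" covers roughly x 88 to 456 and y 96 to 316 in the 512x512 output.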
Model I am using (TextDiffuser-2): I am running inference with TextDiffuser-2 using the following command:
CUDA_VISIBLE_DEVICES=6 python inference_textdiffuser2_t2i_full.py \
  --pretrained_model_name_or_path="/path/to/stable-diffusion-v1-5" \
  --mixed_precision="fp16" \
  --enable_xformers_memory_efficient_attention \
  --resume_from_checkpoint="/path/to/textdiffuser-2" \
  --granularity=128 \
  --max_length=77 \
  --coord_mode="ltrb" \
  --cfg=7.5 \
  --sample_steps=20 \
  --seed=43555 \
  --vis_num 16 \
  --m1_model_path="/path/to/layout_planner" \
  --input_format='prompt' \
  --input_prompt 'A picture of a bruised apple with the text apples are good for you' \
  --output_dir="."
The generated results look like this: something seems to be going wrong. Further tests on some data from MarioEval:
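To repeat the test over several prompts (for example, a handful taken from MarioEval), one option is a small loop around the same command. This is a sketch: `run_prompts` and the `RUNNER` variable are hypothetical names, the flags are copied from the command above, and the paths are placeholders. Each prompt gets its own `--output_dir`, because the log earlier in this thread shows the script clearing jpg/txt files from an existing output directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the inference command once per line of a prompts file.
#   $1 = text file with one prompt per line, $2 = parent dir for per-prompt outputs
run_prompts() {
  local prompts_file="$1" out_root="$2"
  local runner="${RUNNER:-python}"   # set RUNNER=echo for a dry run
  local i=0 prompt
  # "|| [ -n ... ]" also handles a last line without a trailing newline
  while IFS= read -r prompt || [ -n "$prompt" ]; do
    [ -z "$prompt" ] && continue     # skip blank lines
    i=$((i + 1))
    # One output_dir per prompt so a run does not clear another run's images;
    # </dev/null keeps the child process from reading the prompts file.
    "$runner" inference_textdiffuser2_t2i_full.py \
      --pretrained_model_name_or_path="/path/to/stable-diffusion-v1-5" \
      --mixed_precision="fp16" \
      --enable_xformers_memory_efficient_attention \
      --resume_from_checkpoint="/path/to/textdiffuser-2" \
      --granularity=128 \
      --max_length=77 \
      --coord_mode="ltrb" \
      --cfg=7.5 \
      --sample_steps=20 \
      --seed=43555 \
      --vis_num 16 \
      --m1_model_path="/path/to/layout_planner" \
      --input_format='prompt' \
      --input_prompt="$prompt" \
      --output_dir="$out_root/prompt_$i" < /dev/null
  done < "$prompts_file"
}

# Usage: run_prompts prompts.txt mario_eval_results
```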