J-1nn opened this issue 1 week ago (status: Open)
The conditioning image outputs are just your prompts, and they seem fine. It's been a few years so I don't remember what you'd expect the bird output to look like at 500 steps, but your inputs and reconstructions seem fine.
You should start seeing better outputs from the samples_scaled_gs branch as you get closer to the full 5000 steps. The non-scaled outputs are generally meaningless.
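For context, the "scaled" samples are the ones produced with classifier-free guidance, which is why they are the meaningful ones to watch. A minimal sketch of that scaling (the standard formulation, not code taken from this repo):

```python
def apply_guidance(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: push the conditional noise prediction
    away from the unconditional one by `scale`.
    Inputs are per-element noise predictions (plain lists here)."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

# With scale=1.0 you recover the plain conditional prediction;
# the --scale 10.0 used at inference amplifies the prompt's influence.
print(apply_guidance([0.1, 0.2], [0.3, 0.1], 10.0))
```

At scale 1.0 the function returns `eps_cond` unchanged; the unscaled training samples correspond to this weakly guided regime, which is why they look meaningless early on.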
@rinongal
Thank you for your reply!
I have another question: why do the images I generate not retain the features of the input images?
My training script is as follows (with max_steps set to 5100):
python ~/textual_inversion-main/main.py \
--base ~/textual_inversion-main/configs/latent-diffusion/txt2img-1p4B-finetune.yaml \
-t \
--actual_resume ~/textual_inversion-main/models/ldm/text2img-large/model.ckpt \
-n 1010_1_bird_3000 \
--gpus 1, \
--data_root ~/textual_inversion-main/data/thin_bird \
--init_word bird \
--no-test False
My inference script is as follows:
python ~/textual_inversion-main/scripts/txt2img.py \
--ddim_eta 0.0 \
--n_samples 8 \
--n_iter 2 \
--scale 10.0 \
--ddim_steps 50 \
--embedding_path ~/textual_inversion-main/scripts/logs/cat_statue2024-10-10T14-53-37_1010_2_cat_5100/checkpoints/embeddings_gs-2499.pt \
--ckpt_path ~/textual_inversion-main/models/ldm/text2img-large/model.ckpt \
--outdir outputs/cat/txt2img_p4_2499_5100 \
--prompt "Painting of a * riding a dragon"
inputs:
outputs: embedding_2499 (the best, in my opinion)
embedding_5099 (the final step)
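When comparing checkpoints like embedding_2499 and embedding_5099, it can help to check what the saved .pt file actually contains. To my recollection, the checkpoints in this repo are dicts with "string_to_token" and "string_to_param" entries (load the real file with torch.load); a hedged sketch of inspecting that structure, using a dummy dict in place of a loaded file:

```python
def summarize_embedding(ckpt):
    """Return {placeholder: number_of_learned_vectors} for a loaded
    embedding checkpoint (structure assumed, not verified here)."""
    params = ckpt.get("string_to_param", {})
    return {name: len(vectors) for name, vectors in params.items()}

# Dummy stand-in for torch.load("embeddings_gs-2499.pt"); the
# embedding dimension (1280) is an assumption for text2img-large.
dummy = {
    "string_to_token": {"*": 265},
    "string_to_param": {"*": [[0.0] * 1280]},  # one vector for "*"
}
print(summarize_embedding(dummy))
```

If the placeholder string in the checkpoint does not match the "*" used in the prompt, the prompt will silently fall back to the literal token, which can also explain outputs that ignore the learned concept.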
Looking forward to your reply!!! :) :) :)
The strength of the embedding unfortunately depends a lot on some RNG choices.
I uploaded our cat toy embedding in response to one of the issues, so you can play with that one. Alternatively you can try to train a few more with different seeds and you'll probably get one or two that works better for this prompt.
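The retraining suggestion above can be sketched as a shell loop. The --seed flag name is my assumption from memory of this repo's main.py, so verify it with `python main.py --help` before running:

```shell
# Hedged sketch: train a few embeddings with different seeds and
# keep the one that works best for your prompt.
for SEED in 17 23 42; do
  python ~/textual_inversion-main/main.py \
    --base ~/textual_inversion-main/configs/latent-diffusion/txt2img-1p4B-finetune.yaml \
    -t \
    --actual_resume ~/textual_inversion-main/models/ldm/text2img-large/model.ckpt \
    -n "bird_seed_${SEED}" \
    --gpus 1, \
    --data_root ~/textual_inversion-main/data/thin_bird \
    --init_word bird \
    --seed "${SEED}"
done
```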
You can also do things like increase the learning rate or the number of tokens used to represent the concept.
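For the learning-rate and token-count suggestions, the relevant knobs live in the finetune config. The key paths below are from memory and may not match your copy of txt2img-1p4B-finetune.yaml exactly, so treat this as a map of where to look rather than a drop-in config:

```yaml
# Hedged sketch of the knobs mentioned above (verify key paths
# against your own txt2img-1p4B-finetune.yaml).
model:
  base_learning_rate: 5.0e-03   # try raising this if the concept is weak
  params:
    personalization_config:
      params:
        num_vectors_per_token: 2   # more vectors per token = stronger concept
```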
@rinongal
Did you train this embedding directly using the configuration in the repo? Did you change any parameters? Your results are excellent! But mine are so bad... QAQ
Dear author: @rinongal
Thank you for your outstanding work!
I have a question in the training stage:
My training script:
After running main.py, I got several images in ~/logs/thin_bird2024-10-10T10-20-04_1010_1_bird_3000/images/train. Taking thin_bird as an example, the conditioning_gs-xxx.jpg files are all empty, like this:
The other images are as follows:
inputs_gs-000500_e-000005_b-000000.jpg
reconstruction_gs-000500_e-000005_b-000000.jpg
samples_gs-000500_e-000005_b-000000.jpg
samples_scaled_gs-000500_e-000005_b-000000.jpg
Are the results normal?