rinongal / textual_inversion


conditioning_gs-xxx.jpg are empty when running main.py #169

Open J-1nn opened 1 week ago

J-1nn commented 1 week ago

Dear author @rinongal,

Thank you for your outstanding work!

I have a question in the training stage:

My training script:

python ~/textual_inversion-main/main.py \
    --base ~/textual_inversion-main/configs/latent-diffusion/txt2img-1p4B-finetune.yaml \
    -t \
    --actual_resume ~/textual_inversion-main/models/ldm/text2img-large/model.ckpt \
    -n 1010_1_bird_3000 \
    --gpus 1, \
    --data_root ~/textual_inversion-main/data/thin_bird \
    --init_word bird \
    --no-test False
**(max_steps = 3000)**

After running main.py, I got several images in ~/logs/thin_bird2024-10-10T10-20-04_1010_1_bird_3000/images/train. Taking thin_bird as an example, the conditioning_gs-xxx.jpg files are all empty, like this:

[image]

The other images are as follows:

inputs_gs-000500_e-000005_b-000000.jpg

[image]

reconstruction_gs-000500_e-000005_b-000000.jpg

[image]

samples_gs-000500_e-000005_b-000000.jpg

[image]

samples_scaled_gs-000500_e-000005_b-000000.jpg

[image]

Are the results normal?

rinongal commented 1 week ago

The conditioning image outputs are just your prompts, and they seem fine. It's been a few years, so I don't remember what you'd expect the bird output to look like at 500 steps, but your inputs and reconstructions seem fine.

You should start seeing better outputs in the samples_scaled_gs files as you get closer to the full 5000 steps. The non-scaled outputs are generally meaningless.
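
For reference, the conditioning panel is produced by drawing the prompt string onto a white canvas. A simplified sketch of what `log_txt_as_img` in `ldm/util.py` does (paraphrased from the latent-diffusion codebase, not verbatim):

```python
# Simplified paraphrase of ldm.util.log_txt_as_img: each conditioning
# "image" is just the caption drawn as black text on a white canvas.
import numpy as np
import torch
from PIL import Image, ImageDraw, ImageFont

def log_txt_as_img(wh, captions):
    images = []
    for caption in captions:
        canvas = Image.new("RGB", wh, color="white")
        draw = ImageDraw.Draw(canvas)
        # Font file bundled with the repo.
        font = ImageFont.truetype("data/DejaVuSans.ttf", size=10)
        # Wrap long prompts so they fit on the canvas.
        nc = int(40 * (wh[0] / 256))
        lines = "\n".join(caption[i:i + nc] for i in range(0, len(caption), nc))
        draw.text((0, 0), lines, fill="black", font=font)
        # Convert to a CHW array in [-1, 1], matching the other logged images.
        images.append(np.array(canvas).transpose(2, 0, 1) / 127.5 - 1.0)
    return torch.tensor(np.stack(images))
```

So a mostly white panel with one short line of text (your training prompt around `*`) is the expected appearance, not a failure.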

J-1nn commented 1 week ago

> The conditioning image outputs are just your prompts, and they seem fine. It's been a few years, so I don't remember what you'd expect the bird output to look like at 500 steps, but your inputs and reconstructions seem fine.
>
> You should start seeing better outputs in the samples_scaled_gs files as you get closer to the full 5000 steps. The non-scaled outputs are generally meaningless.

@rinongal

Thank you for your reply!

I have another question: why don't the images I generate retain the features of the input?

My training script is as follows (with max_steps set to 5100):

python ~/textual_inversion-main/main.py \
    --base ~/textual_inversion-main/configs/latent-diffusion/txt2img-1p4B-finetune.yaml \
    -t \
    --actual_resume ~/textual_inversion-main/models/ldm/text2img-large/model.ckpt \
    -n 1010_1_bird_3000 \
    --gpus 1, \
    --data_root ~/textual_inversion-main/data/thin_bird \
    --init_word bird \
    --no-test False

My inference script is as follows:

python ~/textual_inversion-main/scripts/txt2img.py \
    --ddim_eta 0.0 \
    --n_samples 8 \
    --n_iter 2 \
    --scale 10.0 \
    --ddim_steps 50 \
    --embedding_path ~/textual_inversion-main/scripts/logs/cat_statue2024-10-10T14-53-37_1010_2_cat_5100/checkpoints/embeddings_gs-2499.pt \
    --ckpt_path ~/textual_inversion-main/models/ldm/text2img-large/model.ckpt \
    --outdir outputs/cat/txt2img_p4_2499_5100 \
    --prompt "Painting of a * riding a dragon"

inputs:

[image]

outputs from embedding_2499 (the best, in my opinion):

[image]

outputs from embedding_5099 (the final step):

[image]

Looking forward to your reply! :)

rinongal commented 1 week ago

The strength of the embedding unfortunately depends a lot on some RNG choices.

I uploaded our cat toy embedding in response to one of the issues, so you can play with that one. Alternatively you can try to train a few more with different seeds and you'll probably get one or two that works better for this prompt.
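
For example, a hypothetical seed sweep (all other arguments mirror the training command posted above; `--seed` is assumed to be the seed flag main.py inherits from the latent-diffusion training script, so confirm the name in your copy):

```sh
# Hypothetical sweep over a few training seeds; everything else
# mirrors the training command posted earlier in this thread.
for SEED in 1 2 3; do
  python ~/textual_inversion-main/main.py \
      --base ~/textual_inversion-main/configs/latent-diffusion/txt2img-1p4B-finetune.yaml \
      -t \
      --actual_resume ~/textual_inversion-main/models/ldm/text2img-large/model.ckpt \
      -n "bird_seed_${SEED}" \
      --gpus 1, \
      --data_root ~/textual_inversion-main/data/thin_bird \
      --init_word bird \
      --seed "${SEED}"
done
```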

You can also do things like increase the learning rate or the number of tokens used to represent the concept.
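
Both knobs live in the finetune config. A hedged excerpt (structure and field names as in the repo's txt2img-1p4B-finetune.yaml; the values shown are illustrative, so confirm against your copy):

```yaml
model:
  base_learning_rate: 5.0e-03        # raise this if the concept stays too weak
  params:
    personalization_config:
      target: ldm.modules.embedding_manager.EmbeddingManager
      params:
        placeholder_strings: ["*"]
        initializer_words: ["bird"]
        num_vectors_per_token: 1     # increase (e.g. 2 or 3) for more capacity
```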

J-1nn commented 1 week ago

@rinongal

Did you train this embedding directly using the configuration in the repo? Did you change any parameters? Your results are excellent! But mine are so bad... QAQ