Closed Any-Winter-4079 closed 2 years ago
I'd love to help. I was able to train a model on Hugging Face with a .bin file as a result, but I never got it to load into dream.py. I also trained a few models using this repository, but those results didn't load into dream.py either. I'd love to beta-test any code you throw at me. :)
You can load it like this:

```shell
python3 ./scripts/dream.py --embedding_path model.bin
```

with `model.bin` saved inside the `stable_diffusion` folder. For example, here is the Ugly Sonic bin file: https://huggingface.co/sd-concepts-library/ugly-sonic/tree/main
Not sure why it's not working for you. Does it work with the bin file from Ugly Sonic?
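If the Ugly Sonic file also fails, it may help to check what the .bin actually contains before blaming the loader. Here is a rough sketch, assuming (as with most sd-concepts-library files) that the .bin is a torch-saved dict mapping the placeholder token to its learned embedding tensor; `inspect_embedding` is a hypothetical helper, not part of this repo:

```python
import torch

def inspect_embedding(path):
    """Report each token in a textual-inversion .bin and its embedding shape."""
    data = torch.load(path, map_location="cpu")  # assumed: {token: tensor}
    return {token: tuple(tensor.shape) for token, tensor in data.items()}
```

If loading raises, or the keys are not plain token strings, the file is likely in a different format than what `--embedding_path` expects.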
I'd be interested to see what results you get. For example, if you run:

```
a photo of * in New York
a photo of * on the beach
```

etc.
Also, if it's capable of reproducing what it learned (*) plus context (e.g. a specific background), it'd be interesting to see the number of epochs you trained for, the `val/loss_simple_ema` values you got as you trained (if you kept them), etc.
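For context, the `_ema` suffix in the latent-diffusion codebase refers, as I understand it (worth double-checking), to the validation loss computed with an exponential-moving-average copy of the model weights. A minimal sketch of that weight-EMA update, with an illustrative decay value rather than the repo's actual setting:

```python
def ema_update(ema_params, params, decay=0.9999):
    """Blend current parameters into an EMA copy.

    decay=0.9999 is illustrative, not the repo's actual value.
    Both arguments are plain {name: value} dicts for simplicity.
    """
    for name, value in params.items():
        ema_params[name] = decay * ema_params[name] + (1 - decay) * value
    return ema_params
```

Because the EMA copy moves slowly, `val/loss_simple_ema` can lag the raw `val/loss_simple`, which is worth keeping in mind when comparing values across epochs.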
About code: these are our changes vs. the development branch: https://github.com/lstein/stable-diffusion/compare/development...tmm1:dev-train-m1
With that code we are able to run on M1 and load the resulting file (.pt) into dream.py.
Maybe you could re-train using those changes to see if it loads now?
I'm using the original Hugging Face Colab to do the number crunching. Import into dream.py works for the .bin files created. I've focused on new styles, trying out some that I've trained myself and some trained by others in the Hugging Face library.

What I've found so far: the new styles are very strong and work well on the subjects they were trained on. If I train on portraits, the generated portraits show fairly good style transfer. When I try to generate other subjects, I get a mixed bag of results. My styles trained on portraits will try to include a person whenever possible: I ask for an image of a hamburger, I get a person eating a hamburger. Negative prompts do not seem to have an effect. And if the training images are even slightly NSFW, pretty much none of the generated images will pass the NSFW filter when it is enabled, even with a totally safe prompt.
So styles seem to work best on the subjects they were trained on, more so than the general model. The general model has fewer problems generating a hamburger in any style.
This is a request to add more documentation / create a guide with useful tips for the Textual Inversion process, by people who have successfully got it to work. For context, this is the current guide for Textual Inversion.
We have finally been able to train on M1, and so far we have mixed (initial) results.
Among other things, we seem to be losing context. E.g., after training on 3 burgers:

```
"a close-up photo of * in the style of Van Gogh" -s15 -W512 -H512 -C7.5 -Ak_lms -S1380320324
```

but if we request something different, like a cat:

```
"a cat in the style of Van Gogh" -s15 -W512 -H512 -C7.5 -Ak_lms -S1380320324
```

and if we try to mix both:

```
"a close-up photo of a * being eaten by a cat in the style of Van Gogh" -s15 -W512 -H512 -C7.5 -Ak_lms -S1380320324
```
Some observations:
I have tried some pretrained models, e.g. Ugly Sonic, and it does seem to keep context better. At low steps, it seems to struggle with bodies/faces:

```
"<ugly-sonic> in front of the Statue of Liberty in New York, artstation, 8k, high quality" -s10 -W512 -H512 -C7.5 -Ak_lms -S667265387
```

But using the same seed with 10x more steps:

```
"<ugly-sonic> in front of the Statue of Liberty in New York, artstation, 8k, high quality" -s100 -W512 -H512 -C7.5 -Ak_lms -S667265387
```

Also, if I request something unrelated to what it was trained on, it does produce those images:

```
"a cat in the style of Van Gogh" -s100 -W512 -H512 -C7.5 -Ak_lms -S48976874
```
The original repo says:
which is not a lot of information.
I think it would be great if we could see metrics from successful attempts, and see `val/loss_simple_ema` values (e.g. what values it reaches, how they progress per epoch, etc.). Sharing the images folder may also be useful, as it includes train and val images (some of which, for me, were reconstructions of my training images, others were new burgers, while some were unrelated images and others were black).

Some more information could also be added. For example, could we change the training prompt templates to just

```
'a photo of a {}'
```

only? And if so, would it be quicker to learn? Would it learn at all? Could we reduce the DDIM sampling from 200 steps to, say, 100 steps? Would it still learn? Could we change the sampler? Could we swap `val/loss_simple_ema` for another metric?

As a summary of some of the questions:

- What metric (e.g. `val/loss_simple_ema`) did you use?
- What `val/loss_simple_ema` (or chosen metric) values did you get per epoch?
- Did you change `personalized.py` in any way?

Thanks a lot!
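On the template question: the mechanism being asked about can be sketched like this (the template list below is illustrative only; the real list lives in `personalized.py`, and `build_prompts` is a hypothetical helper):

```python
# Illustrative subset; the actual templates are defined in personalized.py.
templates = [
    "a photo of a {}",
    "a rendering of a {}",
]

def build_prompts(placeholder):
    """Fill each training template with the learned placeholder token."""
    return [t.format(placeholder) for t in templates]
```

Cutting that list down to a single `'a photo of a {}'` entry would be a one-line change there, which is what makes the "would it still learn?" experiment cheap to try.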