rinongal / textual_inversion


improving fine-tuning quality #126

Closed harshaUwm163 closed 1 year ago

harshaUwm163 commented 1 year ago

Hello, thank you for this great project and the code! I have a small set of bird images that I want to fine-tune on; they look like this:

I am using '*' as my placeholder string. After 6100 training iterations, the images generated look something like this:

These are pretty good, but is there a way to improve the quality of the final images? I am using the large LDM model with all the default parameters.

Thanks!

rinongal commented 1 year ago

That depends on what kind of quality improvement you're looking for. If you mean better resolution and shapes, you might get better results with Stable Diffusion, though editability suffers there.

If you want it to capture finer details and you don't care about editing, you can just bump up the number of learned vectors (num_vectors_per_word) or increase the learning rate.
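For concreteness, here is a minimal sketch of how those two knobs could be changed, assuming the layout of the repo's `configs/latent-diffusion/txt2img-1p4B-finetune.yaml` (where the vector count appears as `num_vectors_per_token`); the values here are illustrative, not tuned recommendations:

```python
from omegaconf import OmegaConf

# Load the default fine-tuning config (path assumed from the repo layout).
config = OmegaConf.load("configs/latent-diffusion/txt2img-1p4B-finetune.yaml")

# More learned vectors per placeholder token gives the embedding more capacity
# to capture fine detail, at the cost of editability.
config.model.params.personalization_config.params.num_vectors_per_token = 4

# Raising the base learning rate trains the concept more aggressively; note
# that the trainer may further scale this by GPU count and batch size when
# scale_lr is enabled.
config.model.base_learning_rate = 1.0e-02

# Write the modified config out and pass it to main.py via --base.
OmegaConf.save(config, "configs/latent-diffusion/txt2img-1p4B-finetune-custom.yaml")
```

The same edits can of course be made directly in the YAML file instead.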

harshaUwm163 commented 1 year ago

Thank you @rinongal for the detailed explanation.

What do you mean by "editability" with Stable Diffusion here? (Sorry if it is obvious, but I am new to diffusion models.)

rinongal commented 1 year ago

By editability I mean how well you can change the object with your prompts. If you train your concept too strongly, then prompts like "a photo of *" will give your birds, but prompts like "a painting of * in the style of Van Gogh" will still give photo-realistic images, not paintings in the style of Van Gogh.

If all you care about is generating more of your subjects, then you don't care about their editability.
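One practical way to check editability is to sample the same embedding under both a reconstruction prompt and a style prompt and compare the outputs. A minimal sketch, using the sampling flags from the repo's README (the embedding and checkpoint paths are placeholders):

```python
import subprocess

# Probe editability: if the style prompt still yields photo-realistic birds,
# the embedding has likely been trained too strongly.
for prompt in ["a photo of *", "a painting of * in the style of Van Gogh"]:
    subprocess.run(
        [
            "python", "scripts/txt2img.py",
            "--ddim_eta", "0.0",
            "--n_samples", "4",
            "--scale", "10.0",
            "--ddim_steps", "50",
            "--embedding_path", "/path/to/embeddings.pt",  # placeholder
            "--ckpt_path", "/path/to/model.ckpt",          # placeholder
            "--prompt", prompt,
        ],
        check=True,
    )
```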

harshaUwm163 commented 1 year ago

Gotcha! Thank you for the reply! I am closing this issue now!