rinongal / textual_inversion

Input -> Reconstruction fine detail loss shows up in all generated output #70

Closed SeverianVoid closed 1 year ago

SeverianVoid commented 2 years ago

The training works great and the embedding works amazingly as well, getting perfect results on the overall structure of the image. However, at some stage in the process the ultra-fine detail in the image is lost, and all images generated from the process share the same lost detail.

From the training folder, zoomed in on the input vs. the reconstruction: [image: input_recon]

An example of the scaled sample output: [image: sample scaled]

So even though the overall image has the correct shape, fine details, like things that should be circles, are lost. I have been going through finetuning.yaml piece by piece, changing values one at a time and training for a few hours each, to see which parameters, if any, I can change to solve this, but so far nothing has worked. Does anyone have ideas for what I could change to get better fine-detail reproduction?

hopibel commented 2 years ago

I don't think there's much you can do about it. SD struggles with intricate fine details in general and the blurry input images aren't helping. Maybe consider upscaling/sharpening them and training on that?
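If you go the upscaling/sharpening route, here is a minimal preprocessing sketch with Pillow; the folder names and filter settings are placeholders, not something this repo ships:

```python
# Minimal sketch: upscale and sharpen training images before running
# textual inversion on them. Folder names and filter settings are
# placeholders; tune them for your own data.
from pathlib import Path
from PIL import Image, ImageFilter

src_dir = Path("training_images")        # hypothetical input folder
dst_dir = Path("training_images_sharp")  # hypothetical output folder
dst_dir.mkdir(exist_ok=True)

for path in src_dir.glob("*.png"):
    img = Image.open(path).convert("RGB")
    # Upscale 2x with a high-quality resampling filter, then unsharp-mask.
    img = img.resize((img.width * 2, img.height * 2), Image.LANCZOS)
    img = img.filter(ImageFilter.UnsharpMask(radius=2, percent=120, threshold=3))
    img.save(dst_dir / path.name)
```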

SeverianVoid commented 2 years ago

The input images are not blurry; it's blurry in that example because I scaled it up and zoomed in to give a better view of what's going on. This is what the full-size, not-zoomed-in input and reconstruction look like: [image: sharp 1]

And here is a full-size example output, so it is doing a truly fantastic job for the most part on matching the style and structure. But the fine details match the poor reconstructions, and I would really like to try to fix that somehow. My understanding is that the training images get downscaled and compressed as part of the training process before being upscaled again later, which would result in the loss of fine detail. [image: samples_scaled_gs-124000_e-000011_b-005200_00001241]

ThereforeGames commented 2 years ago

> My understanding is that the training images get downscaled and compressed as part of the training process before being upscaled again later, which would result in the loss of fine detail.

One thing you can try, which is possibly what hopibel was referring to, is including some cropped, zoomed-in images in your training data to hopefully give SD a better idea of what the fine details are supposed to look like.

I think the jury's still out on whether or not this is an effective approach.
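If anyone wants to try it, a minimal sketch for generating the extra crops (folder names, crop fraction, and counts are made up for illustration):

```python
# Minimal sketch: add zoomed-in crops of each training image to the set so
# fine details occupy more of the frame at training resolution. Folder names
# and crop settings below are illustrative only.
import random
from pathlib import Path
from PIL import Image

src_dir = Path("training_images")
dst_dir = Path("training_images_with_crops")
dst_dir.mkdir(exist_ok=True)

CROP_FRAC = 0.5        # crop regions half the image's width/height
CROPS_PER_IMAGE = 3

for path in src_dir.glob("*.png"):
    img = Image.open(path).convert("RGB")
    img.save(dst_dir / path.name)  # keep the original image as well
    cw, ch = int(img.width * CROP_FRAC), int(img.height * CROP_FRAC)
    for i in range(CROPS_PER_IMAGE):
        left = random.randint(0, img.width - cw)
        top = random.randint(0, img.height - ch)
        crop = img.crop((left, top, left + cw, top + ch))
        # Resize back up so the detail is seen at full training resolution.
        crop = crop.resize((512, 512), Image.LANCZOS)
        crop.save(dst_dir / f"{path.stem}_crop{i}.png")
```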

Other than that, you might have to rely on complex prompt engineering. Try the usual stuff like "ultra hd, very detailed, complex, intricate", etc., but also look into negative prompts; negating certain terms can be almost more powerful than the prompt itself.
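This repo's sampling script may not expose a negative prompt flag directly; as an illustration of the idea, here is a sketch using the diffusers pipeline instead, with placeholder prompts and model id:

```python
# Minimal sketch (diffusers, not this repo's txt2img script): detail-oriented
# positive keywords combined with a negative prompt. Model id and prompt text
# are placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="<your subject>, ultra hd, very detailed, complex, intricate",
    negative_prompt="blurry, low detail, smudged, deformed",
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
image.save("sample.png")
```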

rinongal commented 2 years ago

The model does compress and later un-compress the images, but any information lost in that process would be missing from the reconstruction outputs in your log directory as well. If the information exists in the reconstruction image, it should theoretically be possible to recreate it.
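If you want to check how much of the fine detail survives that compression step on its own, here is a minimal sketch of the encode/decode roundtrip; it uses diffusers' AutoencoderKL rather than this repo's loader, and the model id and file names are placeholders:

```python
# Minimal sketch: run an input image through the first-stage autoencoder's
# encode/decode roundtrip to see what detail the latent space can preserve.
# Uses diffusers' AutoencoderKL; model id and file names are placeholders.
import torch
from diffusers import AutoencoderKL
from diffusers.utils import load_image
from torchvision import transforms

vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae"
).to("cuda").eval()

img = load_image("input.png").resize((512, 512))
x = transforms.ToTensor()(img).unsqueeze(0).to("cuda") * 2.0 - 1.0  # scale to [-1, 1]

with torch.no_grad():
    latents = vae.encode(x).latent_dist.sample()
    recon = vae.decode(latents).sample

recon = ((recon.clamp(-1, 1) + 1) / 2).squeeze(0).cpu()
transforms.ToPILImage()(recon).save("vae_roundtrip.png")
```

Anything already missing from vae_roundtrip.png cannot be recovered by the embedding; anything still present is detail the sampling stage is failing to reproduce.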

With that said, it's going to be very difficult to capture that level of detail in a single embedding vector. You might have more luck if you really increase the number of embedding vectors assigned to the concept, or if you try to fine-tune the model on top of the inversion process (e.g. something like DreamBooth).
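For the first option, the relevant setting lives in the personalization_config block of the finetune YAML you are already editing; an illustrative excerpt (double-check the exact keys against your own config):

```yaml
# Illustrative excerpt of a finetune config; values are examples only.
model:
  params:
    personalization_config:
      target: ldm.modules.embedding_manager.EmbeddingManager
      params:
        placeholder_strings: ["*"]
        initializer_words: ["sculpture"]
        per_image_tokens: false
        num_vectors_per_token: 4   # raise from the default of 1
```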

Another alternative may be to combine the learned embedding with img2img, or to use DDIM inversion (from the guided diffusion paper) to get an initial noise sample that can recreate your exact image and then modify it a bit from there. These could essentially give you a way to better preserve the fine local details, compared to the word embedding, which aims to capture more global semantics.
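As a sketch of the img2img option, something like the following could work with the diffusers pipelines (not this repo's scripts); it assumes the learned embedding has been converted to a diffusers-loadable format, and the paths, model id, and prompt are placeholders:

```python
# Minimal sketch: img2img on top of the original photo with the learned token,
# so low-level structure comes from the input image and the embedding mainly
# steers style/semantics. Paths, model id, and prompt are placeholders, and
# the embedding file is assumed to be in a diffusers-loadable format.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_textual_inversion("learned_embeds.bin", token="*")

init_image = load_image("input.png").resize((512, 512))
image = pipe(
    prompt="a photo in the style of *",
    image=init_image,
    strength=0.4,        # lower strength preserves more of the original detail
    guidance_scale=7.5,
).images[0]
image.save("img2img_sample.png")
```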

rinongal commented 1 year ago

Closing due to lack of activity. Feel free to reopen if you need more help.