open-mmlab / StyleShot

StyleShot: A SnapShot on Any Style. A model that can transfer any style to any content, generating high-quality, personalized stylized images without per-image fine-tuning!
https://styleshot.github.io/
MIT License
270 stars 16 forks

About style transferring #27

Open nldhuyen0047 opened 2 months ago

nldhuyen0047 commented 2 months ago

Hi, thanks for your excellent work.

I have a question about style transfer. I would like to transfer the style of the target image onto the content image. At inference, I tested both Contour and LineArt, but the results are not good.

For inference, I use a cube with no details on it as the content image (I would like to add details to it) and a house with details and colors as the target image. With LineArt, the colors and some of the details that should be added to the content image are not filled in; with Contour, the image changes too much and becomes quite messy.

Could you please suggest some ways to improve the results?

Thank you so much.

Jeoyal commented 2 months ago

Hi @nldhuyen0047, thank you for your interest in our work. I noticed that you previously mentioned an issue related to model training. Are the inference results based on a model you trained yourself? Also, could you post the content image, style image, and the generated results?

zhihongz commented 2 months ago

I ran into the same problem. For example, when the content image and the style image are the same, the output is supposed to be the same image as well (some works introduce an identity loss to guarantee this). But with StyleShot, the output differs greatly from the input image.

Jeoyal commented 2 months ago

Hi @zhihongz, thank you for your interest in our work. I randomly tested some cases where the content and style images are the same, and the outputs were the same image as well. Here are the samples: (three sample images attached)

I think the difference you see comes from the resolution of the inputs. The style image is center-cropped to 512*512, while the content image is not processed, which might lead to differences.
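To make the mismatch concrete, here is a minimal sketch of the geometry of a 512*512 center crop. It assumes the shorter side is first resized to 512 before cropping, which is a common convention but is not confirmed in this thread; the helper name `center_crop_plan` is hypothetical and is not part of StyleShot.

```python
def center_crop_plan(width, height, size=512):
    """Plan a resize-then-center-crop to a size x size square.

    Assumption: the shorter side is scaled to `size` first, then the
    longer side is cropped symmetrically. StyleShot's actual
    preprocessing may differ.
    Returns ((new_width, new_height), (left, top, right, bottom)).
    """
    scale = size / min(width, height)
    # max() guards against rounding the shorter side down below `size`.
    new_w = max(size, round(width * scale))
    new_h = max(size, round(height * scale))
    left = (new_w - size) // 2
    top = (new_h - size) // 2
    return (new_w, new_h), (left, top, left + size, top + size)


# A 1024x768 style image loses its left and right margins,
# while a content image passed through unprocessed keeps them.
resized, box = center_crop_plan(1024, 768)
print(resized, box)
```

With Pillow, the plan could be applied as `img.resize(resized).crop(box)`. Applying the same plan to the content image before inference would at least make the two inputs cover the same region, although it would not by itself guarantee identical outputs.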

zhihongz commented 2 months ago

Hmm, the output looks similar to the input, but there are obvious differences. For example, the color of the cat in your last screenshot changes to blue.

zhihongz commented 2 months ago

When you test with natural images, the differences become even larger. You can examine this picture:

1725623619071_d

Jeoyal commented 2 months ago

In StyleShot, styles are integrated into style embeddings by a specially designed style-aware encoder and then incorporated into the diffusion model through a cross-attention module. This highly aggregated style information is not suitable for pixel-level reconstruction.
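As an illustration of why this kind of conditioning is aggregated, here is a minimal single-head cross-attention sketch in NumPy. It is not StyleShot's implementation: the token counts, dimensions, and random projection matrices are all assumptions chosen for the example. The point is that every latent token receives a softmax-weighted average of a few style tokens, so no per-pixel copy of the style image is available to reconstruct from.

```python
import numpy as np


def cross_attention(latent_tokens, style_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: latents attend to style embeddings."""
    q = latent_tokens @ w_q                      # (n_latent, d)
    k = style_tokens @ w_k                       # (n_style, d)
    v = style_tokens @ w_v                       # (n_style, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])      # (n_latent, n_style)
    # Numerically stable softmax over the style tokens.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row is a convex combination of the style value vectors.
    return weights @ v, weights


# Hypothetical sizes: 16 latent tokens, only 4 style tokens.
rng = np.random.default_rng(0)
latents = rng.normal(size=(16, 8))
style = rng.normal(size=(4, 6))
w_q = rng.normal(size=(8, 8))
w_k = rng.normal(size=(6, 8))
w_v = rng.normal(size=(6, 8))

out, weights = cross_attention(latents, style, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

Because the style enters only as a handful of embedding vectors mixed by these weights, feeding the same image as both content and style would not be expected to reproduce it pixel for pixel.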

zhihongz commented 2 months ago

Got it, thanks for your patience and quick reply.

nldhuyen0047 commented 2 months ago

> In StyleShot, styles are integrated into style embeddings by a specially designed style-aware encoder and then incorporated into the diffusion model through a cross-attention module. This highly aggregated style information is not suitable for pixel-level reconstruction.

Sorry for my late reply.

I have encountered some problems with training the model.

I got it. Thank you very much.