miccunifi / ladi-vton

[ACM MM 2023] - LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On

Does not work well on text or letters #6

Closed: husonchen closed this issue 1 year ago

husonchen commented 1 year ago

I ran the following command:

python src/inference.py --dataset vitonhd --vitonhd_dataroot zalando-hd-resized --output_dir output --test_order paired --batch_size 1 --mixed_precision fp16

and found that the model does not work well on garments with text or letters, as in these bad cases:

[Images: four bad-case try-on results]

This phenomenon is not mentioned in your paper. Is there any way to fix it?

phongnhhn92 commented 1 year ago

Hi, I observe the same issue as well. This might be a limitation of the current model.

ABaldrati commented 1 year ago

Hi @husonchen, thanks for your interest in our work!

Thank you for bringing up this issue. We have taken note of this limitation of our approach. We argue that the behavior you highlighted results from our model's reliance on Stable Diffusion, which has known limitations, particularly in reproducing readable text and in accurately capturing complex high-frequency details.
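One way to see why fine details such as lettering are hard for a latent model: Stable Diffusion's autoencoder compresses each image side by a factor of 8 and represents the result with 4 latent channels, so the denoiser never sees individual pixels. The sketch below is an illustration under stated assumptions, not code from LaDI-VTON: it assumes a 512x384 try-on resolution and the standard Stable Diffusion VAE settings, and the helper names `latent_shape` and `latent_cells_per_glyph` are hypothetical.

```python
# Minimal sketch: how much spatial resolution survives Stable Diffusion's VAE.
# Assumptions (not taken from the LaDI-VTON code): 512x384 input images and the
# standard Stable Diffusion autoencoder (downsampling factor 8, 4 latent channels).
# Helper names are hypothetical, chosen for illustration only.

VAE_DOWNSAMPLE = 8   # spatial reduction per image side
LATENT_CHANNELS = 4  # channels of the latent tensor


def latent_shape(height: int, width: int) -> tuple[int, int, int]:
    """Return (channels, height, width) of the latent for an image of the given size."""
    if height % VAE_DOWNSAMPLE or width % VAE_DOWNSAMPLE:
        raise ValueError("image sides must be multiples of the downsampling factor")
    return (LATENT_CHANNELS, height // VAE_DOWNSAMPLE, width // VAE_DOWNSAMPLE)


def latent_cells_per_glyph(glyph_height_px: int) -> float:
    """Number of latent rows spanned by a character of the given pixel height."""
    return glyph_height_px / VAE_DOWNSAMPLE


if __name__ == "__main__":
    print(latent_shape(512, 384))      # (4, 64, 48)
    print(latent_cells_per_glyph(16))  # 2.0
```

Under these assumptions, a letter 16 pixels tall is covered by only about two latent rows, so its strokes must be reconstructed by the decoder from a very coarse representation, which is consistent with the blurry or garbled text in the reported cases.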

We believe this limitation could be addressed by using a non-latent diffusion approach, i.e. one that runs the diffusion process directly in pixel space. Of course, such "standard" diffusion methods have a higher computational load and require more resources.

Alberto
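The cost trade-off mentioned in the reply above can be roughly quantified by counting how many values the denoising network receives per step in pixel space versus latent space. This is only a proxy (actual compute also depends on the network architecture), and it assumes the same illustrative setup as before: 512x384 RGB images and a Stable Diffusion-style VAE with downsampling factor 8 and 4 latent channels. The function names are hypothetical.

```python
# Rough proxy for the cost gap between pixel-space and latent-space diffusion.
# It counts the values the denoiser receives per step; real compute also
# depends on the architecture. Assumed setup: 512x384 RGB images, VAE with
# downsampling factor 8 and 4 latent channels. Names are hypothetical.


def num_values(channels: int, height: int, width: int) -> int:
    """Total number of scalar values in a (channels, height, width) tensor."""
    return channels * height * width


def pixel_vs_latent_ratio(height: int, width: int,
                          downsample: int = 8, latent_channels: int = 4) -> float:
    """How many times more values a pixel-space denoiser processes per step."""
    pixel = num_values(3, height, width)
    latent = num_values(latent_channels, height // downsample, width // downsample)
    return pixel / latent


if __name__ == "__main__":
    print(num_values(3, 512, 384))          # 589824
    print(num_values(4, 64, 48))            # 12288
    print(pixel_vs_latent_ratio(512, 384))  # 48.0
```

With these settings the ratio is 3 * 8 * 8 / 4 = 48 regardless of resolution, so a pixel-space model would handle roughly 48 times more input values at every denoising step.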