Hi, thank you for your nice work. I would like to ask how to obtain the text prompts for training, since the VITON-HD dataset does not seem to provide them.
Hi @Taited, thanks for your interest in our work!
As stated in #15:
In Figure 2 of the paper, you can see that the textual prompt $q$ is a simple, predefined prompt like "a photo of a model wearing a dress," "a photo of a model wearing a lower body garment," or "a photo of a model wearing an upper body garment." This prompt serves as a starting point for the diffusion process. It is not tailored to each specific image in the dataset; rather, it gives the model a general direction to follow during the virtual try-on task. We then use the textual inversion adapter $F_{\theta}$ to predict the pseudo-word embeddings associated with that specific garment. Finally, we condition the denoising network on the features extracted from the concatenation of the generic prompt and the predicted pseudo-word embeddings.
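To make this concrete, here is a minimal PyTorch sketch of the conditioning step. This is a simplified illustration, not the repository code: the adapter architecture, the number of pseudo-tokens, and names like `TextualInversionAdapter` and `garment_features` are placeholder assumptions.

```python
import torch
import torch.nn as nn
from transformers import CLIPTokenizer, CLIPTextModel


class TextualInversionAdapter(nn.Module):
    """Placeholder for F_theta: maps garment image features to n pseudo-word embeddings."""

    def __init__(self, feat_dim=768, token_dim=768, num_tokens=16):
        super().__init__()
        self.num_tokens = num_tokens
        self.token_dim = token_dim
        self.proj = nn.Linear(feat_dim, num_tokens * token_dim)

    def forward(self, garment_features):              # (B, feat_dim)
        out = self.proj(garment_features)             # (B, num_tokens * token_dim)
        return out.view(-1, self.num_tokens, self.token_dim)


tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# 1) Embed the generic, predefined prompt q.
prompt = "a photo of a model wearing an upper body garment"
ids = tokenizer(prompt, return_tensors="pt").input_ids      # (1, T)
word_embeds = text_encoder.get_input_embeddings()(ids)      # (1, T, 768)

# 2) Predict pseudo-word embeddings for the specific garment.
adapter = TextualInversionAdapter()
garment_features = torch.randn(1, 768)  # stand-in for CLIP image features of the garment
pseudo_embeds = adapter(garment_features)                   # (1, 16, 768)

# 3) Concatenate the generic-prompt embeddings with the predicted
#    pseudo-word embeddings.
cond_embeds = torch.cat([word_embeds, pseudo_embeds], dim=1)  # (1, T + 16, 768)

# In the full pipeline, this concatenated sequence is passed through the CLIP
# text transformer (which also has to handle position embeddings and padding)
# and the result conditions the denoising network via cross-attention.
```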
However, the 2nd row of Table 4 also reports an experiment without the textual inversion technique, using instead a textual description of the in-shop garment. You can find the textual description of each garment in the data/noun_chunks folder.
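If you want to build prompts from those files, something along these lines should work; note that the filename and the JSON schema in this snippet are illustrative, so check the actual files in data/noun_chunks for the exact layout.

```python
# Illustrative loader for the noun-chunk files; the filename and the schema
# (a JSON dict: garment image name -> list of noun chunks) are assumed here.
import json

with open("data/noun_chunks/vitonhd_train.json") as f:  # hypothetical filename
    noun_chunks = json.load(f)

garment = "00001_00.jpg"  # hypothetical key following VITON-HD naming
prompt = "a photo of a model wearing " + ", ".join(noun_chunks[garment])
print(prompt)
```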
To extract these textual descriptions, we follow the approach described in https://arxiv.org/abs/2304.02051.
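As a rough sketch of what noun-chunk extraction looks like, an off-the-shelf parser such as spaCy can be used; whether this matches the exact pipeline is an assumption on the snippet's part, so please refer to the linked paper for details.

```python
# Sketch of noun-chunk extraction with spaCy; treating spaCy as the parser
# is an assumption here -- see the linked paper for the actual pipeline.
import spacy

nlp = spacy.load("en_core_web_sm")
caption = "a sleeveless red floral mini dress with a v-neck"
chunks = [chunk.text for chunk in nlp(caption).noun_chunks]
print(chunks)  # e.g. ['a sleeveless red floral mini dress', 'a v-neck']
```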
Alberto