xthan / VITON

Code and dataset for paper "VITON: An Image-based Virtual Try-on Network"

Questions concerning the paper. #36

Closed ChenDRAG closed 5 years ago

ChenDRAG commented 5 years ago

1. According to your paper, your coarse network takes no RGB information other than the RGB segmentation of the woman's hair and head. I'm wondering how the network manages to "guess" the color of the pants and the shape and color of the woman's arms, when no input tells the network that the color of the pants is gray. Is this a sign of overfitting?

2. If you can already get the segmentation of a person using LIP, why bother to train a network and predict the mask yourself?

3. When I ran LIP, I found that its segmentation results are not very satisfying, far from being able to produce an accurate mask of the clothes or of the hair and head. Did you have similar problems?

Thanks a lot.

xthan commented 5 years ago
  1. The network learns to color the skin regions (arms) based on the face color. The shape of the arms is guided by the body shape representation and the keypoints. As for the pants, as I mentioned in the paper, the network learns the co-occurrence of top and bottom garments and tries to inpaint the bottom regions with the color that is most likely to occur.
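To make the answer concrete: the coarse network's clothing-agnostic input stacks pose keypoint heatmaps, a body-shape mask, and the RGB face/hair region into one tensor, so color can only be inferred from the face/hair channels. This is a minimal NumPy sketch of that stacking, not the repo's actual code; the patch radius and channel layout are assumptions for illustration.

```python
import numpy as np

def person_representation(pose_keypoints, body_mask, face_hair_rgb, size=(256, 192)):
    """Stack the clothing-agnostic inputs into one tensor (a sketch).

    pose_keypoints: list of (x, y) pixel coords, or None for undetected joints
    body_mask:      (H, W) binary body-shape mask
    face_hair_rgb:  (H, W, 3) RGB image containing only the face and hair
    """
    h, w = size
    # One heatmap per keypoint: a small white square around each joint.
    heatmaps = np.zeros((h, w, len(pose_keypoints)), dtype=np.float32)
    r = 5  # patch radius (assumed value)
    for i, kp in enumerate(pose_keypoints):
        if kp is None:
            continue
        x, y = kp
        heatmaps[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1, i] = 1.0

    # Body shape as a single coarse channel; face/hair keep full RGB,
    # which is the only color information the network sees.
    channels = [heatmaps,
                body_mask[..., None].astype(np.float32),
                face_hair_rgb.astype(np.float32)]
    return np.concatenate(channels, axis=-1)  # (H, W, K + 1 + 3)
```

With 18 pose keypoints this yields a 22-channel input, matching the representation described in the paper.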

  2. LIP gives a mask of the original clothing, but I am predicting a mask that is consistent with the desired clothing. For example, suppose the person is wearing a long-sleeve shirt and wants to try on a short-sleeve one. LIP will generate a long-sleeve mask, while the network will give a short-sleeve one.

  3. Implementing LIP is not easy. I recommend running the original pre-trained LIP models instead of implementing it yourself. Also, for human parsing, the results sometimes depend strongly on the input image size, so try altering the input image size and see what yields the best results.
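The size-dependence advice above can be automated by running the parser at a few scales and keeping the most confident output. The sketch below assumes a hypothetical `parser` callable that returns per-pixel class probabilities; it is not part of the LIP or VITON code, just one way to implement the suggestion.

```python
import numpy as np

def simple_resize(img, size):
    # Nearest-neighbour resize; crude, but keeps the sketch dependency-free.
    h, w = size
    ys = (np.arange(h) * img.shape[0] / h).astype(int)
    xs = (np.arange(w) * img.shape[1] / w).astype(int)
    return img[ys][:, xs]

def best_scale_parse(image, parser, scales=(0.75, 1.0, 1.25)):
    """Run a parsing model at several input scales and keep the most
    confident result. `parser(img)` must return (H, W, C) probabilities."""
    best_labels, best_conf = None, -1.0
    h, w = image.shape[:2]
    for s in scales:
        resized = simple_resize(image, (int(h * s), int(w * s)))
        probs = parser(resized)
        conf = probs.max(axis=-1).mean()  # mean top-class probability
        if conf > best_conf:
            best_labels, best_conf = probs.argmax(axis=-1), conf
    return best_labels, best_conf
```

In practice you would resize the winning label map back to the original resolution before using it as a mask; averaging the probability maps across scales (multi-scale testing) is another common variant.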

ChenDRAG commented 5 years ago

I really appreciate your reply; that helps a lot. (For question 2 I was just being silly.) Still, I don't understand one thing: rather than letting the network guess the color of the pants and the arms, wouldn't it be better to keep the RGB information of the pants and arms together with the face and hair?

xthan commented 5 years ago

Please refer to the "Keep the original pants regions" section in the paper's supplementary material. You can keep the regions of the arms and pants, but it causes some inconsistency issues.

ChenDRAG commented 5 years ago

OK, many thanks.