Open dopu2k16 opened 4 years ago
Hey, did you find a way to measure LPIPS? I learned that IS cannot be used as a proper evaluation metric for this task. For now we can only do SSIM and IoU.
Quantitative evaluation code is not included in this repository. We used the official codebases for these metrics; please refer to them.
Hello! I encountered many problems with metrics in this field too, and I can't reproduce the same scores. Did you use the skimage library for the metrics? Can you share your code so I can check whether I am using it correctly?
+1
We tried to use the official source code for the quantitative evaluation as much as possible. The repositories used for evaluation are listed below:
1) IoU: implemented in MATLAB following the FCN paper
2) SSIM: https://www.mathworks.com/help/images/ref/ssim.html
3) LPIPS (AlexNet): https://github.com/richzhang/PerceptualSimilarity
4) IS: https://github.com/sundyCoder/IS_MS_SS/blob/master/is/IS.py
I will try those, thank you so much for the help!
Hi @Gogo693, where is the target human image for calculating SSIM and LPIPS? And where is the ground-truth parsed segmentation map for calculating mIoU? Thanks.
Hi, I only tried to calculate SSIM, and I used the original person image from the test dataset as the target human image, even though it has a different garment. I can't be sure it is the best solution, but I don't know how to do it otherwise; maybe Minar can help us. As for LPIPS and mIoU, I don't know yet whether I will use them (I haven't read how they work), but I will update here if I find out something.
> I only tried to calculate SSIM and I used the original person image in the test dataset as target human image, even if it has a different garment. I can't be sure it is the best solution but I don't know how to do it otherwise
@Gogo693 I think this is not the correct way to calculate SSIM and LPIPS, @minar09 can you help us?
Hi, except for the Inception Score, all of the metrics require generating test results in the paired setting, i.e., input pairs with the same clothes, so that you can use the person image as the ground truth. Please see our CP-VTON+ or CloTH-VTON papers for details. The differently-clothed pairs are for visual comparison only.
@minar09 thanks, where can I find the details for the paired setting?
The paired setting means the same setting as in training: the same ID for both the cloth and the person in an input pair.
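Concretely, a paired list can be derived from the unpaired test list by replacing each target cloth with the one matching the person's own ID. A minimal sketch, assuming VITON-style filenames where 000001_0.jpg is a person and 000001_1.jpg is the matching cloth:

```python
# Sketch: turn an unpaired test list into a paired one, assuming
# VITON-style names: <id>_0.jpg = person image, <id>_1.jpg = cloth image.
def make_paired_list(unpaired_lines):
    paired = []
    for line in unpaired_lines:
        person, _unpaired_cloth = line.split()
        cloth = person.replace("_0.jpg", "_1.jpg")  # same id as the person
        paired.append(f"{person} {cloth}")
    return paired

unpaired = ["000001_0.jpg 001744_1.jpg", "000010_0.jpg 004325_1.jpg"]
print(make_paired_list(unpaired))
# -> ['000001_0.jpg 000001_1.jpg', '000010_0.jpg 000010_1.jpg']
```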
So it is paired for SSIM (or whenever we need a comparison) and unpaired for IS, as in the shown results. Thank you, I did not know this was the standard for evaluating try-on, but now it is clear!
I know the difference between the two, but I still don't know how to run the evaluation. If someone could provide more detailed instructions, that would be great.
This is what I understood. When testing, you run two experiments:
1) You use as input person P_i matched with cloth C_i (a paired input, as in training) to generate G_i_i, then you evaluate SSIM by comparing G_i_i with the original image P_i.
2) You use as input person P_i matched with cloth C_j (an unpaired match, i != j) to generate G_i_j, then you evaluate IS on the generated images alone, since the IS score needs no reference.
For the unpaired matches there should be a text file in the dataset.
I don't know if this answers your doubt, but I hope it is useful.
Hi @minar09 @Gogo693, sorry for bothering you with this much-discussed problem. About the evaluation of SSIM and LPIPS, is the following understanding right?
Step 1: Reset the test_pairs.txt file from
[000001_0.jpg 001744_1.jpg
000010_0.jpg 004325_1.jpg
...]
to
[000001_0.jpg 000001_1.jpg
000010_0.jpg 000010_1.jpg
...]
Step 2: Run the GMM test to get the warped clothing.
Step 3: Run the TOM test to get the try-on results for the input person (now we have the try-on result).
Step 4: Compute SSIM and LPIPS between the original person image and the new try-on result (here we treat the original person image as the ground truth).
Also, about the calculation of IoU, I am a little confused. In the CP-VTON+ paper there is only one sentence: "the parsed segmentation area for the current upper clothing is used as the IoU reference". I guess maybe I should calculate IoU between the parsed segmentation clothing area (warped clothing) and the original target clothing?
It would be great if you could shed some light on these problems, and even better if you could provide the overall evaluation tools and instructions.
I think that is correct for SSIM and LPIPS. I tried this setting for SSIM in another work and I could replicate the results. For IoU I can't help you yet as I'm not using it, but I will update in case I try it.
Hi @Gogo693, following this approach I can now nearly replicate the results in the paper. Many thanks for the useful feedback!
Hi, thank you everyone for the great discussion. I made a minor update to the repo to help you with the evaluation.
test_pairs_same.txt is added to the data folder. So, for the quantitative evaluation, you can simply uncomment the line https://github.com/minar09/cp-vton-plus/blob/master/test.py#L36 and comment out the previous one.
For the IoU evaluation, the warped clothing mask/silhouette and the target-clothing-on-person mask from the ground-truth segmentation are used; you can then calculate the metric by taking the intersection and union between them.
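The intersection-and-union step above can be sketched in a few lines of numpy (the paper's numbers come from MATLAB code; this is just the same formula), assuming both masks are binary arrays of the same shape:

```python
import numpy as np

def iou(warped_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """IoU between the warped-cloth mask and the ground-truth
    clothing-on-person mask (both binary arrays of the same shape)."""
    warped_mask = warped_mask.astype(bool)
    gt_mask = gt_mask.astype(bool)
    inter = np.logical_and(warped_mask, gt_mask).sum()
    union = np.logical_or(warped_mask, gt_mask).sum()
    return float(inter / union) if union > 0 else 1.0  # both empty -> perfect
```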
Hope these help. Thank you.
Thank you Minar for being very clear and available. If I may ask the last confirm on one subject to be sure: can you confirm IS score is computed on the 'unpaired' generated images?
Yes, in CP-VTON+, IS is evaluated on the unpaired test cases, meaning the target cloth is different from the source human's outfit. The evaluation was run on the test_pairs.txt list.
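For reference, the actual numbers come from the IS_MS_SS repository linked earlier, which runs an Inception network over the unpaired outputs. The score itself is exp of the mean KL divergence between the conditional p(y|x) and the marginal p(y); a numpy sketch of just that formula, assuming you already have the Inception softmax outputs:

```python
import numpy as np

def inception_score(probs: np.ndarray, eps: float = 1e-16) -> float:
    """IS = exp( E_x[ KL(p(y|x) || p(y)) ] ).
    `probs` is an (N, C) array of Inception softmax outputs,
    one row per generated (unpaired) image."""
    p_y = probs.mean(axis=0, keepdims=True)                  # marginal p(y)
    kl = probs * (np.log(probs + eps) - np.log(p_y + eps))   # per-class KL terms
    return float(np.exp(kl.sum(axis=1).mean()))
```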
@minar09 can you share your IoU evaluation source code?
Hi @minar09, since the MATLAB code for the IoU metric is hard for me to understand, is it okay to use jaccard_similarity_score in place of the IoU metric from your paper for evaluating the performance of GMM?
@Ha0Tang , sorry for my late reply, it's hard for me to find time to maintain the repositories nowadays, so I am adding a snapshot of my IoU evaluation code here. Hope this helps. Thank you.
@Amazingren, sure, the Jaccard index should work as well. Sorry for my late reply, and thank you for your understanding. Have a nice day.
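For binary masks, the Jaccard index is by definition the same quantity as IoU (|A∩B| / |A∪B|), so scikit-learn's jaccard_score (the successor of the deprecated jaccard_similarity_score) on flattened masks should reproduce it. A small sketch of the equivalence, using hypothetical 2x2 masks:

```python
# Sketch: for binary masks, sklearn's jaccard_score equals IoU by definition.
import numpy as np
from sklearn.metrics import jaccard_score

def iou(a: np.ndarray, b: np.ndarray) -> float:
    a, b = a.astype(bool), b.astype(bool)
    return float(np.logical_and(a, b).sum() / np.logical_or(a, b).sum())

warped = np.array([[1, 1], [0, 1]])  # hypothetical warped-cloth mask
gt     = np.array([[1, 0], [0, 1]])  # hypothetical ground-truth mask
print(iou(warped, gt))                            # 2/3
print(jaccard_score(gt.ravel(), warped.ravel()))  # same value
```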
I wanted to know how you calculated the Warped/Blended (IoU), SSIM, LPIPS, and IS metrics for the quantitative evaluation, and where exactly in the code the quantitative evaluation is implemented?