minar09 / cp-vton-plus

Official implementation for "CP-VTON+: Clothing Shape and Texture Preserving Image-Based Virtual Try-On", CVPRW 2020
https://minar09.github.io/cpvtonplus/
MIT License
347 stars 120 forks

How to do quantitative evaluation? #28

Open dopu2k16 opened 4 years ago

dopu2k16 commented 4 years ago

I wanted to know how you calculated the warped/blended IoU, SSIM, LPIPS, and IS metrics for the quantitative evaluation, and where exactly in the code this evaluation is implemented.

Bouncer51 commented 3 years ago

Hey, did you find a way to measure LPIPS? I learned that IS cannot be used as a proper evaluation metric for this task. For now, we can only do SSIM and IoU.

minar09 commented 3 years ago

The quantitative evaluation code is not included in this repository. We used the official codebases for these metrics; please refer to them.

Gogo693 commented 3 years ago

Hello! I have also run into many problems with metrics in this field, and I can't reproduce the same scores. Did you use the skimage library for the metrics? Can you provide the code so that I can check whether I am using it correctly?

Ha0Tang commented 3 years ago

+1

minar09 commented 3 years ago

We tried to use the official source code for the quantitative evaluation as much as possible. The repositories used for evaluation are listed below:

1) IoU: implemented in MATLAB following the FCN paper
2) SSIM: https://www.mathworks.com/help/images/ref/ssim.html
3) LPIPS (AlexNet): https://github.com/richzhang/PerceptualSimilarity
4) IS: https://github.com/sundyCoder/IS_MS_SS/blob/master/is/IS.py
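
In case it helps, here is a minimal Python sketch (not the code behind the paper's numbers, which came from the codebases above) of how SSIM and LPIPS could be computed with scikit-image and the lpips package; the results/ and gt/ folder names are placeholders for your try-on outputs and the matching ground-truth person images:

```python
# Sketch only: paired-setting SSIM/LPIPS, assuming results/ holds the
# try-on outputs and gt/ the original person images, with matching names.
import os
import numpy as np
import torch
import lpips  # pip install lpips (richzhang/PerceptualSimilarity)
from PIL import Image
from skimage.metrics import structural_similarity as ssim

loss_fn = lpips.LPIPS(net='alex')  # AlexNet variant, as used in the paper

def to_lpips_tensor(img):
    # HWC uint8 in [0, 255] -> NCHW float in [-1, 1], as lpips expects
    t = torch.from_numpy(np.asarray(img)).float() / 127.5 - 1.0
    return t.permute(2, 0, 1).unsqueeze(0)

ssim_scores, lpips_scores = [], []
for name in sorted(os.listdir('results')):
    out = Image.open(os.path.join('results', name)).convert('RGB')
    gt = Image.open(os.path.join('gt', name)).convert('RGB')
    ssim_scores.append(ssim(np.asarray(out), np.asarray(gt),
                            channel_axis=-1, data_range=255))
    with torch.no_grad():
        lpips_scores.append(loss_fn(to_lpips_tensor(out),
                                    to_lpips_tensor(gt)).item())

print('SSIM :', np.mean(ssim_scores))
print('LPIPS:', np.mean(lpips_scores))
```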

Gogo693 commented 3 years ago

I will try those, thank you so much for the help!

Ha0Tang commented 3 years ago

Hi @Gogo693, which image serves as the target human image for calculating SSIM and LPIPS? And which real parsed segmentation map is used for calculating mIoU? Thanks.

Gogo693 commented 3 years ago

Hi, I have only tried to calculate SSIM, and I used the original person image from the test dataset as the target human image, even though it wears a different garment. I can't be sure this is the best solution, but I don't know how else to do it; maybe Minar can help us. As for LPIPS and mIoU, I don't know yet whether I will use them (I haven't read up on how they work), but I will post an update if I find something.

Ha0Tang commented 3 years ago

> I have only tried to calculate SSIM, and I used the original person image from the test dataset as the target human image, even though it wears a different garment. I can't be sure this is the best solution, but I don't know how else to do it

@Gogo693 I don't think this is the correct way to calculate SSIM and LPIPS. @minar09, can you help us?

minar09 commented 3 years ago

Hi, except for the Inception Score, all of the metrics require you to generate test results in the paired setting, i.e., input pairs with the same clothes, so that you can use the person image as the ground truth. Please see our CP-VTON+ or CloTH-VTON paper for these details. Pairs with different clothes are for visual comparison only.

Ha0Tang commented 3 years ago

@minar09 thanks, where can I find the details for the paired setting?

minar09 commented 3 years ago

The paired setting means the same setting as in training: the cloth and the person of an input pair share the same ID.
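
For example, a quick sketch (assuming the two-column `<person> <cloth>` list format and the `_0`/`_1` filename suffixes for person/cloth images) of how such a paired list could be built from test_pairs.txt:

```python
# Sketch only: build a paired list (each person matched with its own
# cloth) from the unpaired test_pairs.txt. The output name is a placeholder.
with open('data/test_pairs.txt') as f:
    persons = [line.split()[0] for line in f if line.strip()]

with open('data/test_pairs_paired.txt', 'w') as f:
    for person in persons:
        cloth = person.replace('_0.jpg', '_1.jpg')  # 000001_0.jpg -> 000001_1.jpg
        f.write(f'{person} {cloth}\n')
```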

Gogo693 commented 3 years ago

So it is paired for SSIM (or whenever a ground-truth comparison is needed) and unpaired for IS, as in the shown results. Thank you, I didn't realize this was the standard for evaluating try-on, but now it is clear!

Ha0Tang commented 3 years ago

I know the difference between the two, but I still don't know how to run the evaluation; if someone could provide more detailed instructions, that would be great.

Gogo693 commented 3 years ago

This is what I understood. When running the test, you run two experiments:

1) You use as input person P_i matched with cloth C_i (a paired input, as in training) to generate G_i_i, then you evaluate SSIM by comparing G_i_i with the original image P_i.
2) You use as input person P_i matched with cloth C_j (an unpaired match, i != j) to generate G_i_j, then you evaluate IS on the generated images alone, since IS needs no reference for comparison. For the unpaired matches there should be a text file in the dataset.

I don't know if this answers your question, but I hope it is useful.

Amazingren commented 3 years ago

Hi @minar09 @Gogo693, sorry for bothering you with this oft-asked problem. Regarding the evaluation of SSIM and LPIPS, is the following understanding correct?

Step 1: Reset the test_pairs.txt file from
[000001_0.jpg 001744_1.jpg
000010_0.jpg 004325_1.jpg
...]
to
[000001_0.jpg 000001_1.jpg
000010_0.jpg 000010_1.jpg
...]

Step 2: Run the GMM test to get the warped clothing.

Step 3: Run the TOM test to get the try-on result for the input person (now we have the try-on result).

Step 4: Compute SSIM and LPIPS between the original person image and the new try-on result (here the original person image serves as the ground truth).

Furthermore, I am also a little confused about the calculation of IoU. The CP-VTON+ paper contains only one sentence about it: "the parsed segmentation area for the current upper clothing is used as the IoU reference". I guess I should calculate IoU between the warped clothing mask and the parsed segmentation area of the target clothing on the person?

Many thanks if you can shed some light on these problems; it would be even better if you could provide the full evaluation tools and instruction documents.

Gogo693 commented 3 years ago

I think that is correct for SSIM and LPIPS. I tried this setting for SSIM in another work and was able to replicate the results. For IoU I can't help you yet, as I'm not using it, but I will post an update if I try it.

Amazingren commented 3 years ago

> I think that is correct for SSIM and LPIPS. I tried this setting for SSIM in another work and was able to replicate the results. For IoU I can't help you yet, as I'm not using it, but I will post an update if I try it.

Hi @Gogo693, following this approach I can now nearly replicate the results in the paper. Many thanks for the useful feedback!

minar09 commented 3 years ago

Hi, thank you everyone for the great discussion. I made a minor update to the repo to help you with the evaluation.

test_pairs_same.txt has been added to the data folder. So, for the quantitative evaluation, you can simply uncomment the line https://github.com/minar09/cp-vton-plus/blob/master/test.py#L36 and comment out the previous one.

For the IoU evaluation, the warped clothing mask/silhouette and the target-clothing-on-person mask from the ground-truth segmentation are used; the metric is then the intersection over the union of the two masks.
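
(A minimal Python sketch of that computation, assuming both masks are saved as binary images with matching filenames; note the paper's numbers came from a MATLAB implementation, so treat this only as a reference.)

```python
# Sketch only: IoU between a warped-cloth mask and the ground-truth
# cloth-on-person mask. The folder names below are placeholders.
import numpy as np
from PIL import Image

def iou(pred_mask_path, gt_mask_path, thresh=128):
    pred = np.asarray(Image.open(pred_mask_path).convert('L')) >= thresh
    gt = np.asarray(Image.open(gt_mask_path).convert('L')) >= thresh
    union = np.logical_or(pred, gt).sum()
    inter = np.logical_and(pred, gt).sum()
    return inter / union if union else 1.0

print(iou('warp-mask/000001_0.jpg', 'gt-cloth-mask/000001_0.jpg'))
```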

Hope these help. Thank you.

Gogo693 commented 3 years ago

Thank you Minar for being so clear and responsive. If I may ask for one last confirmation: can you confirm that the IS score is computed on the 'unpaired' generated images?

minar09 commented 3 years ago

Yes, in CP-VTON+, IS is evaluated on the unpaired test cases, meaning the target cloth differs from the source person's outfit. The evaluation was run on the test_pairs.txt list.
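
(For reference, a simplified single-split Python sketch of the Inception Score over a folder of unpaired results, using torchvision's pretrained Inception v3 rather than the IS_MS_SS script linked above; the results/ folder name is a placeholder.)

```python
# Sketch only: single-split Inception Score over a folder of generated images.
import os
import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import models, transforms

device = 'cuda' if torch.cuda.is_available() else 'cpu'
net = models.inception_v3(weights='IMAGENET1K_V1').eval().to(device)

prep = transforms.Compose([
    transforms.Resize((299, 299)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

probs = []
with torch.no_grad():
    for name in sorted(os.listdir('results')):
        img = prep(Image.open(os.path.join('results', name)).convert('RGB'))
        probs.append(F.softmax(net(img.unsqueeze(0).to(device)), dim=1).cpu())
probs = torch.cat(probs).numpy()

# IS = exp( E_x [ KL( p(y|x) || p(y) ) ] )
p_y = probs.mean(axis=0, keepdims=True)
kl = (probs * (np.log(probs + 1e-12) - np.log(p_y + 1e-12))).sum(axis=1)
print('Inception Score:', np.exp(kl.mean()))
```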

Ha0Tang commented 3 years ago

@minar09 can you share your IoU evaluation source code?

Amazingren commented 3 years ago

Hi @minar09, since the MATLAB IoU code is hard for me to understand, would it be okay to use jaccard_similarity_score in place of the IoU metric in your paper for evaluating the performance of the GMM?

minar09 commented 3 years ago

@Ha0Tang, sorry for my late reply; it's hard for me to find time to maintain the repositories nowadays, so I am adding a snapshot of my IoU evaluation code here. Hope this helps. Thank you.

[screenshot of the MATLAB IoU evaluation code]

@Amazingren, sure, the Jaccard index should work as well; on binary masks it is identical to IoU. Sorry for my late reply. Thank you for your understanding. Have a nice day.
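
(A quick sketch of that equivalence with placeholder random masks; note that recent scikit-learn versions replaced jaccard_similarity_score with jaccard_score.)

```python
# Sketch only: on binary masks the Jaccard index equals IoU, so
# sklearn.metrics.jaccard_score can stand in for the MATLAB IoU code.
# The masks below are random placeholders; in practice use the
# warped-cloth mask and the ground-truth cloth-on-person mask.
import numpy as np
from sklearn.metrics import jaccard_score

pred = np.random.rand(256, 192) > 0.5  # warped clothing mask (placeholder)
gt = np.random.rand(256, 192) > 0.5    # ground-truth clothing mask (placeholder)

# jaccard_score expects flattened 1-D label arrays
print(jaccard_score(gt.ravel(), pred.ravel()))
```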