nie-lang / DeepRectangling

CVPR2022 (Oral) - Deep Rectangling for Image Stitching: A Learning Baseline

Questions about Inference #21

Closed SoninVision closed 1 year ago

SoninVision commented 1 year ago

First of all, thank you for your fantastic work!

There are some questions I would like to ask you while executing your code.

How did you get the binary mask of the stitched image in the test process, which you marked as an 'inference' step in the code? I understood well how you constructed the training dataset in this study.

So, when performing the 'cross-dataset evaluation' in the paper, did you perform ELA-based stitching, convert the resulting image into a rectangular shape through Kaiming He et al.'s method, and then warp an all-one matrix with the inverse of the transformation matrix to obtain the binary mask used as inference input?

According to the Q&A in issues #9 and #12, and setting aside how the training dataset was constructed, did the inference process simply use a UDIS-stitched image and its binary mask, since the main contribution of this study is rectangling an image that has already been stitched? Am I right?

The pipeline of ELA-based stitching -> He's rectangling -> warping an all-one matrix to obtain a binary mask in the same form as the stitched result is very burdensome and impractical. So I assume you would naturally have chosen the latter rather than the former, but since this is not stated explicitly, I am asking you directly.

Thank you!

nie-lang commented 1 year ago

[Answer 1] The binary masks in the test process can be obtained from the stitching algorithm (replace the reference/target images with all-one matrices) or by segmenting the stitched image with a threshold.

[Answer 2] In the cross-dataset evaluation, I obtained the binary masks as described in [Answer 1].

[Answer 3] The inference process does not include the stitching step. Rectangling an image that has already been stitched is what we do.

[Answer 4] We use "ELA-based stitching -> He's rectangling -> inverse warp -> warp rectangular images/all-one matrices to irregular shapes" to generate the dataset. It is burdensome, but it provides perfect GT without distortion. In inference, rectangling is the only thing to do, and we no longer care about the stitching algorithm.
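As a rough sketch of the two options in [Answer 1] (this is not the repo's code; the homography H, the canvas layout, and the assumption that the empty region is pure black are purely illustrative):

```python
import cv2
import numpy as np

# Option 1: obtain the mask from the stitching step itself by warping
# all-one matrices instead of the reference/target images.
def mask_from_warp(ref_shape, tar_shape, H, canvas_hw):
    ones_ref = np.full(ref_shape[:2], 255, dtype=np.uint8)
    ones_tar = np.full(tar_shape[:2], 255, dtype=np.uint8)
    mask = np.zeros(canvas_hw, dtype=np.uint8)
    # Assume the reference keeps its original placement at the top-left of
    # the canvas, while the target is warped onto the canvas by H.
    mask[:ref_shape[0], :ref_shape[1]] = ones_ref
    warped_tar = cv2.warpPerspective(ones_tar, H, (canvas_hw[1], canvas_hw[0]))
    return cv2.bitwise_or(mask, warped_tar)

# Option 2: segment the stitched image with a threshold, assuming the empty
# region outside the stitched content is pure black.
def mask_from_threshold(stitched_bgr):
    gray = cv2.cvtColor(stitched_bgr, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY)
    return mask
```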

SoninVision commented 1 year ago

I get it. So the binary mask is assumed to be available somehow, whether by using the transformation matrix obtained in the stitching step or by thresholding.

And in the 'cross-dataset evaluation' stage, you acquired the binary masks through the stitching methods themselves, such as SPW, LCP, and UDIS. Is my understanding correct?

Thank you for your kind reply!

nie-lang commented 1 year ago

You are right!

By the way, the binary mask and the boundary loss are not strictly necessary, because the rectangular labels already provide effective boundary supervision.

SoninVision commented 1 year ago

That's what I was wondering! Thank you for sharing your insight.

It was an honor to hear your thoughts. Thank you again!