yanconglin / Deep-Hough-Transform-Line-Priors

Official implementation for Deep-Hough-Transform-Line-Priors (ECCV 2020)

About feature normalization #4

Closed fuy34 closed 3 years ago

fuy34 commented 3 years ago

Hi,

Thank you for sharing the code. I have 2 questions about the code.

(1) In https://github.com/yanconglin/Deep-Hough-Transform-Line-Priors/blob/812d8c21e98e7b11b2e81b88faad8c8709cbd364/ht-lcnn/lcnn/models/HT.py#L93 you normalize the feature by the number of columns. May I ask what the motivation behind this is?

(2) In https://github.com/yanconglin/Deep-Hough-Transform-Line-Priors/blob/812d8c21e98e7b11b2e81b88faad8c8709cbd364/ht-lcnn/lcnn/models/HT.py#L36 you initialize rho so that its total length equals the image diagonal, and nrho is double that. However, later in this function you shift the coordinates to the center of the image. In that case, isn't the range of rho only half the diagonal? I know there is nothing wrong with having a larger range of rho, but I am a little confused about whether it is necessary.

Thanks in advance.

yanconglin commented 3 years ago

Hi, fuy34,

(1) HT is an accumulation process, where an HT bin obtains votes from a number of pixels. Say, given a binary image, the output intensity of an HT bin varies from 0 to ht_max, where ht_max is the maximal number of votes (in the ideal case, equal to the diagonal). However, CNN filters/values are often normalized between -1 and 1, so I need this normalization to rescale the HT bins. In an image, a diagonal line is composed of roughly max(rows, cols) pixels, due to the quantization effect. That is why I picked this normalization. You can also try normalizing by the maximal vote, which would scale the HT bins to 0-1. Another option is batch normalization, but that seems to hamper performance (so far I still do not know why ...).
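
A minimal sketch of what I mean, in PyTorch (the shapes of `feat` and `vote_index` here are assumptions for illustration, not necessarily the exact ones in `HT.py`):

```python
import torch

def ht_normalized(feat, vote_index, rows, cols):
    # feat:       (B, C, rows*cols)   flattened spatial features
    # vote_index: (rows*cols, n_bins) 0/1 matrix mapping pixels to HT bins
    ht = feat @ vote_index            # accumulate votes per (theta, rho) bin
    # A digital line contains at most ~max(rows, cols) pixels due to
    # quantization, so this keeps bin values on a CNN-friendly scale.
    return ht / max(rows, cols)
```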

(2) You are right. This is the initial version I used in my experiments. You can simply change this line to `D = np.sqrt((rows / 2) ** 2 + (cols / 2) ** 2)`, which resolves your concern: https://github.com/yanconglin/Deep-Hough-Transform-Line-Priors/blob/812d8c21e98e7b11b2e81b88faad8c8709cbd364/ht-lcnn/lcnn/models/HT.py#L33
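
For instance (a small illustration; the image size and the 1-pixel rho spacing are assumptions):

```python
import numpy as np

rows, cols = 128, 128
# With coordinates shifted to the image center, |rho| is bounded by half
# the diagonal, so the rho axis only needs to cover [-D, D].
D = np.sqrt((rows / 2) ** 2 + (cols / 2) ** 2)
nrho = int(2 * np.ceil(D) + 1)  # rho bins at 1-pixel spacing, including rho = 0
```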

Let me know if you have other questions.

fuy34 commented 3 years ago

Thank you for the explanation.

I have one more question.

By reading the HT, IHT, and HTIHT functions, if I understand correctly, it seems you do feat @ vote_index to get HT_feat, and then HT_feat @ vote_index.t() to get IHT_feat, which is then combined with feat.
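
In code, I imagine the round trip roughly like this (the shapes and toy sizes are my guesses, not the repo's exact ones):

```python
import torch

B, C, rows, cols, n_bins = 1, 8, 32, 32, 900                   # toy sizes
feat = torch.randn(B, C, rows * cols)                          # flattened features
vote_index = (torch.rand(rows * cols, n_bins) > 0.99).float()  # stand-in 0/1 votes

ht_feat = feat @ vote_index          # HT: pixels vote into (theta, rho) bins
iht_feat = ht_feat @ vote_index.t()  # IHT: bins project back onto pixels
out = feat + iht_feat                # combined with the original features
```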

However, given one line in the image, for example, I would imagine multiple bins being activated (e.g. Fig. 3b in the paper). If so, once we do HT_feat @ vote_index.t(), all pixels along those lines will get values from those bins. How can you get a clean single line (Fig. 3c), similar to the original line (Fig. 3a), from them?

Thanks!

yanconglin commented 3 years ago

If I understand correctly, you are asking why Fig. 3c contains only a single line instead of multiple blurry lines, given that multiple bins have been activated.

Without convolutions in the HT domain, you are correct: it is impossible to get a clean line by simply applying HT-IHT without filtering. That is why 1D convolutions are applied in HT space, to find the locally maximal bins which represent lines. This is similar to the Radon and inverse Radon transform, where ramp filters are applied to better reconstruct the original input (to deblur). Then you may wonder: why those 1D filters, specifically? You can find the answer in the Radon literature; there are examples showing the effect of ramp filters (with and without), as in the sketch below.
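
For a quick before/after illustration using scikit-image's Radon tools (not this repo's code; note that older scikit-image versions use the keyword `filter` instead of `filter_name`):

```python
import numpy as np
from skimage.transform import radon, iradon

img = np.zeros((128, 128))
img[64, 30:98] = 1.0                    # a single horizontal line segment
theta = np.linspace(0., 180., 180, endpoint=False)
sino = radon(img, theta=theta)          # forward Radon transform

# Unfiltered back-projection: every partially-activated bin smears back,
# so the line comes out blurry.
recon_blur = iradon(sino, theta=theta, filter_name=None)
# Ramp-filtered back-projection: high frequencies along rho are boosted,
# suppressing the blur and recovering a sharp line.
recon_sharp = iradon(sino, theta=theta, filter_name='ramp')
```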

Hope this answers your question.

fuy34 commented 3 years ago

I see. Thank you! Looking forward to your CUDA implementation of the HT and IHT modules. I will close the issue. Merry Christmas and Happy New Year! :)