side improvement isn't side-refinement ? - Githubissues

yizt / keras-ctpn

Keras reimplementation of the scene text detection network CTPN: "Detecting Text in Natural Image with Connectionist Text Proposal Network". You are welcome to try it, follow it, and report issues...

Apache License 2.0

107 stars 38 forks

side improvement isn't side-refinement ? #3

Open hcnhatnam opened 5 years ago

hcnhatnam commented 5 years ago

I think "side improvement" isn't the "side-refinement" described in the paper.

yizt commented 5 years ago

@hcnhatnam Thanks for the correction; it has been fixed. It is now translated as "侧边细化" (side-refinement).

hcnhatnam commented 5 years ago

I asked but have not been answered yet: did you implement side-refinement like this? [Screenshot from 2019-04-08 11-23-57]

yizt commented 5 years ago

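@hcnhatnam My implementation logic is not like that. The ground truth itself is also split. If a GT has x-axis coordinates [x1, x2] = [5.3, 68.7], it is split into the following 5 GTs: [5.3, 16.], [16., 32.], [32., 48.], [48., 64.], [64., 68.7]. For anchors matched to the middle 3 GTs, the side-refinement regression target is 0; only anchors matched to the leftmost and rightmost GTs have a side-refinement regression target: dx = ((5.3+16)-(0+16))/16 for anchors matched to [5.3, 16.], and dx = ((64.+68.7)-(64+72))/16 for anchors matched to [64., 68.7].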
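The splitting and target computation described in this comment can be sketched roughly as follows. This is only an illustration, not the repository's code: `split_gt` and `side_target` are hypothetical names, the anchor matched to a GT piece is assumed to be the 16-pixel column that contains it, and the right-hand anchor is taken as [64, 80], as corrected later in the thread.

```python
import math

STRIDE = 16.0  # anchor width in pixels, taken from the numbers in the thread


def split_gt(x1, x2, stride=STRIDE):
    """Split the x-range [x1, x2] at every multiple of `stride` strictly inside it."""
    first = math.floor(x1 / stride) + 1   # first grid line to the right of x1
    last = math.ceil(x2 / stride)         # first grid line at or beyond x2
    cuts = [k * stride for k in range(first, last)]
    edges = [x1] + cuts + [x2]
    return list(zip(edges[:-1], edges[1:]))


def side_target(gt_x1, gt_x2, stride=STRIDE):
    """dx for the anchor column containing this GT piece (assumed matching rule)."""
    anchor_x1 = math.floor(gt_x1 / stride) * stride
    anchor_x2 = anchor_x1 + stride
    return ((gt_x1 + gt_x2) - (anchor_x1 + anchor_x2)) / stride


pieces = split_gt(5.3, 68.7)
targets = [side_target(a, b) for a, b in pieces]
for piece, dx in zip(pieces, targets):
    print(piece, round(dx, 5))
```

Under these assumptions the three middle pieces coincide with their anchor columns and get a target of 0, while only the two end pieces get a non-zero dx.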

hcnhatnam commented 5 years ago

Sorry, but the ground truth is [x1, y1, x2, y2]; why is the ground truth [5.3, 68.7]? I don't understand.

yizt commented 5 years ago

@hcnhatnam Right, the ground truth is a quadrilateral with coordinates [lt_x, lt_y, rt_x, rt_y, rb_x, rb_y, lb_x, lb_y]; side-refinement only depends on the x-axis coordinates, so I omitted the y-axis coordinates.

hcnhatnam commented 5 years ago

I understand now. I really appreciate it. But I think dx = ((64.+68.7)-(64+70))/16, not 72.

yizt commented 5 years ago

@hcnhatnam It should be dx = ((64.+68.7)-(64+80))/16 ; (*￣︶￣)

hcnhatnam commented 5 years ago

ohh... ok ok

hcnhatnam commented 5 years ago

@yizt I think you were a bit confused. [Screenshot from 2019-04-08 21-07-38] In the paper, we are considering O*:

dx (of first anchor) = O* = (5.3-8)/16 ; (8 = (0+16)/2 = Cax = center of anchor on the x-axis)
dx (of last anchor) = O* = (68.7-72)/16 ; (72 = (64+80)/2 = Cax = center of anchor on the x-axis)
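For comparison, the paper-style offset discussed in this comment, O* = (x_side - Cax) / w, can be evaluated with a few lines of Python. This is a minimal sketch; `paper_side_offset` is a hypothetical helper name, and the last anchor is assumed to be [64, 80] with center 72, as stated in the parenthetical note.

```python
def paper_side_offset(x_side, anchor_x1, anchor_w=16.0):
    """O* = (x_side - c_x) / w, with c_x the anchor's center on the x-axis."""
    c_x = anchor_x1 + anchor_w / 2
    return (x_side - c_x) / anchor_w


first = paper_side_offset(5.3, anchor_x1=0.0)    # anchor [0, 16], center 8
last = paper_side_offset(68.7, anchor_x1=64.0)   # anchor [64, 80], center 72
print(round(first, 5), round(last, 5))
```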

yizt commented 5 years ago

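@hcnhatnam You are right; the implementation here does not exactly follow the logic in the paper. My understanding: the paper says x_side is the predicted x-coordinate of the horizontal side nearest to the current anchor. That is itself rather ambiguous, since the nearest side could be the left one or the right one, and the logic is fairly complicated and convoluted. So I followed the idea of center-point regression and directly shift the anchor's center toward the GT's center; the logic is simpler and more consistent. The shift distance is (anchor_cx - gt_cx) * 2: anchor_cx - gt_cx is the offset between the two centers, and (anchor_cx - gt_cx) * 2 is the distance the anchor has to move to coincide with the GT. The final scale-invariant regression target is dx = (anchor_cx - gt_cx) * 2 / w, which is identical to ((anchor_x1 + anchor_x2) - (gt_x1 + gt_x2)) / w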
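The identity stated at the end of this comment is easy to check numerically. The sketch below uses hypothetical function names and the first anchor/GT pair from the thread, anchor [0, 16] and GT piece [5.3, 16]. Note that it follows the subtraction order written in this comment (anchor minus GT), while the worked numbers earlier in the thread subtract in the opposite order (GT minus anchor), so the sign comes out flipped relative to those.

```python
def dx_center_form(anchor_x1, anchor_x2, gt_x1, gt_x2):
    """dx = (anchor_cx - gt_cx) * 2 / w, written with the two center points."""
    w = anchor_x2 - anchor_x1
    anchor_cx = (anchor_x1 + anchor_x2) / 2
    gt_cx = (gt_x1 + gt_x2) / 2
    return (anchor_cx - gt_cx) * 2 / w


def dx_sum_form(anchor_x1, anchor_x2, gt_x1, gt_x2):
    """dx = ((anchor_x1 + anchor_x2) - (gt_x1 + gt_x2)) / w, the expanded form."""
    w = anchor_x2 - anchor_x1
    return ((anchor_x1 + anchor_x2) - (gt_x1 + gt_x2)) / w


a = dx_center_form(0.0, 16.0, 5.3, 16.0)
b = dx_sum_form(0.0, 16.0, 5.3, 16.0)
print(round(a, 5), round(b, 5))
```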

NamNguyenThanh commented 5 years ago

Hi @Yizt, I understand what you implemented for side-refinement. But in your results on ICDAR 2015, I think it affects not only the head and tail anchors of the text-line ground truth (refinement < 16 pixels) but also shifts boxes by more than 16 pixels (e.g. the picture below). [image: 56456128_1988716384756644_1062434923161321472_n]

yizt commented 5 years ago

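@NamNguyenThanh Thank you for the feedback! There are two reasons:
a) The true offset of the x-coordinate should lie in (-16, 16), and the regression targets of all training samples do, so in theory the probability of exceeding 16 pixels should be small. However, the network has no explicit constraint that limits the offset to 16 pixels, so a prediction may exceed 16 pixels.
b) The network input is 720*720, while the image visualized here was saved with pyplot at 1600*1600. The width of 16 is relative to 720*720, so the offset in the example image probably does not exceed 16 either.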
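Point b) is a simple rescaling, sketched below under the assumption that the saved figure is a uniform resize of the network input. `to_display_pixels` is a hypothetical helper, and the sizes 720 and 1600 are the ones quoted in the comment.

```python
def to_display_pixels(offset_px, network_size=720, figure_size=1600):
    """Convert an offset measured at network resolution to saved-figure pixels."""
    return offset_px * figure_size / network_size


limit_on_figure = to_display_pixels(16)
print(round(limit_on_figure, 2))
```

So a shift that looks like roughly 35 pixels in the saved figure would still be within 16 pixels at the network's input resolution.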