yizt / keras-ctpn

Keras reimplementation of the scene text detection network CTPN: "Detecting Text in Natural Image with Connectionist Text Proposal Network"; feel free to try it, follow it, and report issues...
Apache License 2.0
107 stars 38 forks

side improvement isn't side-refinement ? #3

Open hcnhatnam opened 5 years ago

hcnhatnam commented 5 years ago

I think "side improvement" is not the right term; the paper calls it "side-refinement".

yizt commented 5 years ago

@hcnhatnam Thanks for the correction, it has been fixed; the term is now translated as "侧边细化" (side-refinement).

hcnhatnam commented 5 years ago

I asked earlier but have not had an answer yet: did you implement side-refinement like this? [Screenshot from 2019-04-08 11-23-57]

yizt commented 5 years ago

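@hcnhatnam My implementation does not work that way. The ground truth itself is also split. If a GT's x-axis coordinates [x1, x2] are [5.3, 68.7], it is split into the following 5 GTs: [5.3, 16.], [16., 32.], [32., 48.], [48., 64.], [64., 68.7]. For anchors matched to the middle 3 GTs, the side-refinement regression target is 0; only anchors matched to the leftmost and rightmost GTs have a side-refinement regression target, namely:
dx = ((5.3+16)-(0+16))/16 for anchors matched to [5.3, 16.]
dx = ((64.+68.7)-(64+72))/16 for anchors matched to [64., 68.7]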

hcnhatnam commented 5 years ago

Sorry, but the ground truth is [x1, y1, x2, y2]; why is the ground truth [5.3, 68.7] here? I don't understand.

yizt commented 5 years ago

@hcnhatnam Right, the ground truth is a quadrilateral with coordinates [lt_x, lt_y, rt_x, rt_y, rb_x, rb_y, lb_x, lb_y]; side-refinement depends only on the x-axis coordinates, so I omitted the y-axis coordinates.

hcnhatnam commented 5 years ago

I understand now, and I really appreciate your help. But I think it should be dx = ((64.+68.7)-(64+70))/16, not 72.

yizt commented 5 years ago

@hcnhatnam It should be dx = ((64.+68.7)-(64+80))/16 ;(* ̄︶ ̄)
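
The split described earlier in this thread and the corrected right-edge expression can be sketched as follows. This is only an illustration, not code from the repository: the function names `split_gt_x` and `side_target` are hypothetical, and the sketch assumes every anchor is a 16-pixel-wide cell aligned to multiples of 16.

```python
import math

STRIDE = 16  # anchor width in pixels, as used in the thread


def split_gt_x(x1, x2, stride=STRIDE):
    """Split the x-interval [x1, x2] at every multiple of `stride` inside it.

    Hypothetical helper for illustration; not taken from keras-ctpn.
    """
    first = math.floor(x1 / stride) + 1  # first boundary strictly right of x1
    last = math.ceil(x2 / stride) - 1    # last boundary strictly left of x2
    cuts = [x1] + [k * stride for k in range(first, last + 1)] + [x2]
    return list(zip(cuts[:-1], cuts[1:]))


def side_target(segment, stride=STRIDE):
    """Side-refinement target for the anchor cell that contains `segment`.

    Assumes the matched anchor is the stride-aligned cell [a1, a1 + stride].
    """
    g1, g2 = segment
    a1 = math.floor(g1 / stride) * stride
    a2 = a1 + stride
    return ((g1 + g2) - (a1 + a2)) / stride


segments = split_gt_x(5.3, 68.7)
targets = [side_target(s) for s in segments]
for seg, dx in zip(segments, targets):
    print(seg, round(dx, 5))
```

Under these assumptions the three middle segments coincide with their anchor cells, so their targets are 0, and only the two end segments get a non-zero target.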

hcnhatnam commented 5 years ago

ohh... ok ok

hcnhatnam commented 5 years ago

@yizt I think you were a bit confused. [Screenshot from 2019-04-08 21-07-38] In the paper, the quantity being considered is O*.

yizt commented 5 years ago

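@hcnhatnam You are right; the implementation here does not follow the paper's logic exactly. My understanding: the paper says x_side is the predicted x-coordinate of the horizontal side nearest to the current anchor. That is rather ambiguous in itself, since the nearest side could be the left one or the right one; the logic is fairly complicated and convoluted. So I followed the idea of center-point regression and simply shift the anchor's center toward the GT's center, which is simpler and more consistent. The shift distance is (anchor_cx - gt_cx) * 2: anchor_cx - gt_cx is the offset between the two centers, and (anchor_cx - gt_cx) * 2 is the distance the anchor has to move to coincide with the GT. The final scale-invariant regression target is dx = (anchor_cx - gt_cx) * 2 / w, which is identical to ((anchor_x1 + anchor_x2) - (gt_x1 + gt_x2)) / w.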
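
The identity between the center form and the corner form can be checked numerically. The sketch below is illustrative only: both function names are hypothetical, and the sign follows this comment (anchor minus GT), whereas the worked example earlier in the thread is written as GT minus anchor, so the sign actually used in the repository is not confirmed here.

```python
import math


def dx_from_centers(anchor, gt):
    """dx = (anchor_cx - gt_cx) * 2 / w, with w the anchor width."""
    a1, a2 = anchor
    g1, g2 = gt
    anchor_cx = (a1 + a2) / 2
    gt_cx = (g1 + g2) / 2
    return (anchor_cx - gt_cx) * 2 / (a2 - a1)


def dx_from_corners(anchor, gt):
    """dx = ((anchor_x1 + anchor_x2) - (gt_x1 + gt_x2)) / w."""
    a1, a2 = anchor
    g1, g2 = gt
    return ((a1 + a2) - (g1 + g2)) / (a2 - a1)


anchor = (64.0, 80.0)  # 16-pixel anchor cell from the thread's example
gt = (64.0, 68.7)      # rightmost GT segment from the thread's example
print(dx_from_centers(anchor, gt), dx_from_corners(anchor, gt))
```

The two forms agree because anchor_cx * 2 = anchor_x1 + anchor_x2 and gt_cx * 2 = gt_x1 + gt_x2.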

NamNguyenThanh commented 5 years ago

Hi @Yizt, I understand what you implemented for side-refinement. But in your results on ICDAR 2015, I think the refinement does not only affect the head and tail anchors of a text-line ground truth (refinement < 16 pixels); some boxes are shifted by more than 16 pixels (e.g. the picture below). [Image: 56456128_1988716384756644_1062434923161321472_n]

yizt commented 5 years ago

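@NamNguyenThanh Thanks for your feedback! There are two reasons:
a) The true x-coordinate offset should lie in (-16, 16), and the regression targets of all training samples do, so in theory the probability of exceeding 16 pixels should be small. However, the network has no explicit constraint that limits the offset to 16 pixels, so a prediction can exceed 16 pixels.
b) The network input is 720*720, while the image saved with pyplot for this visualization is 1600*1600. The width of 16 is relative to 720*720, so the shift in the example image probably does not exceed 16 pixels either.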
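
Point b) comes down to a change of scale, which can be sketched as below. The function names are hypothetical, and the sketch assumes the saved 1600*1600 figure covers exactly the 720*720 input with no margins, which may not hold for a pyplot figure.

```python
INPUT_SIZE = 720    # network input side length, from the comment above
FIGURE_SIZE = 1600  # side length of the image saved with pyplot


def figure_to_input_px(shift_px, figure_size=FIGURE_SIZE, input_size=INPUT_SIZE):
    """Convert a shift measured on the saved figure to network-input pixels."""
    return shift_px * input_size / figure_size


def input_to_figure_px(shift_px, figure_size=FIGURE_SIZE, input_size=INPUT_SIZE):
    """Convert a shift in network-input pixels to pixels on the saved figure."""
    return shift_px * figure_size / input_size


# A 16-pixel shift at network scale appears as roughly 35.6 pixels on the figure.
print(input_to_figure_px(16))
# A 30-pixel shift measured on the figure is 13.5 pixels at network scale.
print(figure_to_input_px(30))
```

Under that assumption, a shift that looks larger than 16 pixels on the saved figure can still be under 16 pixels at the scale the network works in.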