tonghe90 / textspotter


issues about gen_gts_layer #42

Open chunhui999 opened 5 years ago

chunhui999 commented 5 years ago

Q1: In train.pt, "gt_bbox" is annotated as "N * 8 ### grounding truth boxes for text (for computing loss)", but in the class gen_gts_layer in tool_layers.py the input is annotated as "bottom[0]: gt_label [N,1,sz,sz]". What does gt_bbox mean?

Q2: Could you please give an intuitive explanation of the following variables: 'sample_gt_cont', 'sample_gt_label_input', 'sample_gt_label_output'?

chunhui999 commented 5 years ago

@tonghe90 I have another question. If I use ICDAR2015, how do I generate the data for "mask_gt" and "mask_iou_angle"? Looking forward to your reply.

chunhui999 commented 5 years ago

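@tonghe90 After reading the code, I found that the text labels for the recognition branch are also stored in gt_bbox. For a given gt_bbox, the first 8 elements are the bbox coordinates, the 9th element is the length of the text label, and the label_len elements starting from the 10th are the text label itself. The complete label text is split into individual elements here: what type are they, and how is the conversion done? layer { name: 'iou_maps_angles' type: 'Python' bottom: 'gt_bbox' top: 'rois' top: 'sample_gt_cont' top: 'sample_gt_label_input' top: "sample_gt_label_output" ...... }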

crazysal commented 5 years ago

mask_gt is generated only for datasets that have character-level annotations, i.e. SynthText. See section 2.3 of the paper for the training strategy.

mask_iou_angle is generated from the EAST proposal output in the RBOX (rotated rectangle bounding box) case. The EAST output has 5 channels: the distances from each pixel to the four sides of the box, plus the rotation angle.
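To make the 5-channel RBOX encoding concrete, here is a minimal sketch of the geometry for a single pixel. It only illustrates the EAST-style encoding mentioned above; the function names, the corner ordering (top-left, top-right, bottom-right, bottom-left) and the channel order are assumptions, and this is not the layer code from this repository.

```python
import numpy as np

def point_to_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    p, a, b = (np.asarray(v, dtype=np.float64) for v in (p, a, b))
    ab = b - a
    ap = p - a
    # Magnitude of the 2D cross product divided by the edge length.
    cross = ab[0] * ap[1] - ab[1] * ap[0]
    return abs(cross) / np.linalg.norm(ab)

def rbox_geometry(point, rect):
    """Return [d_top, d_right, d_bottom, d_left, angle] for one pixel.

    rect: 4 corners of a (possibly rotated) rectangle, assumed to be ordered
    top-left, top-right, bottom-right, bottom-left (hypothetical convention).
    """
    rect = np.asarray(rect, dtype=np.float64).reshape(4, 2)
    d_top = point_to_line_distance(point, rect[0], rect[1])
    d_right = point_to_line_distance(point, rect[1], rect[2])
    d_bottom = point_to_line_distance(point, rect[2], rect[3])
    d_left = point_to_line_distance(point, rect[3], rect[0])
    # Rotation angle of the top edge relative to the x axis, in radians.
    angle = np.arctan2(rect[1][1] - rect[0][1], rect[1][0] - rect[0][0])
    return np.array([d_top, d_right, d_bottom, d_left, angle])

geometry = rbox_geometry((3, 1), [(0, 0), (10, 0), (10, 4), (0, 4)])
```

A full geometry map would repeat this for every pixel inside the text region; whether this repository normalizes or reorders the channels is not stated in the thread.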

sample_gt_cont is a vector of zeros and ones with the same shape as the gt labels. It controls the continuity of the LSTM hidden state: the hidden state is multiplied by 0 at the start of each new box, and by 1 at every other step.

sample_gt_label_input: a one-hot encoding or character embedding of each ground-truth label. Its shape is also used to pad the sequence to the maximum length when the label is shorter than 25.

sample_gt_label_output: similar to the above, but used at inference time. It keeps track of how many decoder samples to predict as they are fed back in as the previous input.
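As a rough illustration of how these three vectors could relate, here is a minimal sketch that follows the usual convention for Caffe recurrent layers (a continuation indicator, an input sequence shifted by one step, and a target sequence). The function name, the start/end token ids, the ignore id of -1 and the use of integer class indices are all assumptions; none of them is confirmed against this repository.

```python
import numpy as np

def build_lstm_targets(label_ids, max_len=25, start_id=0, end_id=0, ignore_id=-1):
    """Build (cont, label_input, label_output) for one text instance.

    label_ids: integer class indices of the characters (assumed encoding).
    """
    n = len(label_ids)
    if n + 1 > max_len:
        raise ValueError("label is too long for max_len")

    # cont: 0 at the first step resets the hidden state for a new box,
    # 1 at every other step keeps it.
    cont = np.ones(max_len, dtype=np.float32)
    cont[0] = 0.0

    # Decoder input: start token followed by the characters, zero-padded.
    label_input = np.zeros(max_len, dtype=np.int64)
    label_input[0] = start_id
    label_input[1:n + 1] = label_ids

    # Decoder target: the characters followed by an end token.
    # Padded positions are marked with ignore_id so a loss can skip them.
    label_output = np.full(max_len, ignore_id, dtype=np.int64)
    label_output[:n] = label_ids
    label_output[n] = end_id

    return cont, label_input, label_output

cont, label_input, label_output = build_lstm_targets([5, 3, 7], max_len=6)
```

With this layout the target at step t is the character that the input at step t+1 feeds back in, which is one plausible reading of the input/output pairing described above.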

Please correct me if I'm wrong.

chunhui999 commented 5 years ago

@crazysal Thanks for your reply. I think you are right, and it helps me a lot.

chunhui999 commented 5 years ago

@crazysal Could you tell me how to handle the text labels, and what the format of the text label in gt_bbox is?

wenston2006 commented 5 years ago

@crazysal Have you managed to reproduce the training code? I tried to reproduce the training part based on @tonghe's code, but ran into a segmentation fault.

wenston2006 commented 5 years ago

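@chunhui999 @crazysal Looking at the code more closely, the first 8 elements are the coordinates, the tenth is the label length, and the ninth is not used. I'm not sure whether I got this wrong; note that element indices in Python start from 0.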
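The layout discussed in this thread can be sketched as follows, using 0-based indices: 0 to 7 hold the coordinates, index 8 is unused, index 9 holds the label length, and the label starts at index 10. The character set, the 1-based class indices, the row width and both function names are hypothetical; the thread does not say how the text is actually converted to numbers.

```python
import numpy as np

# Hypothetical character set; the real mapping used for training is not given here.
CHARSET = "abcdefghijklmnopqrstuvwxyz0123456789"

def encode_row(quad, text, row_width=40):
    """Pack one text instance into a flat gt_bbox-style row (assumed layout)."""
    if 10 + len(text) > row_width:
        raise ValueError("text does not fit into row_width")
    row = np.zeros(row_width, dtype=np.float32)
    row[0:8] = quad                   # x1, y1, x2, y2, x3, y3, x4, y4
    # row[8] is left at 0: reported as unused in this thread
    row[9] = len(text)                # label length
    row[10:10 + len(text)] = [CHARSET.index(ch) + 1 for ch in text]
    return row

def decode_row(row):
    """Recover the quadrilateral and the text from a row built by encode_row."""
    quad = row[0:8].reshape(4, 2)
    length = int(row[9])
    ids = row[10:10 + length].astype(int)
    text = "".join(CHARSET[i - 1] for i in ids)
    return quad, text

row = encode_row([0, 0, 10, 0, 10, 4, 0, 4], "text")
quad, text = decode_row(row)
```

If the ninth slot really is dropped, as suggested later in the thread, the only change would be to shift the length and label offsets down by one.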

wenston2006 commented 5 years ago

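@crazysal For the data layer, I modified @argman's EAST Python data layer. I commented out both loss_4s and iou_loss and trained only the softmax loss for text recognition, but for some reason I get a memory overflow. What code did you use to write your data layer, and how did you write it? Starting from the code @tonghe provided, is adding my own data layer and an iou_loss layer enough to train successfully?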

chunhui999 commented 5 years ago

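@wenston2006 You are right about the indexing; I overlooked that earlier. So, assuming the 9th element is ignored and the others are shifted forward, does your gt_label look like this? (x1, y1, x2, y2, x3, y3, x4, y4, len, 't', 'e', 'x', 't')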

wenston2006 commented 5 years ago

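@chunhui999 That is my understanding, but I am currently hitting a memory overflow (segmentation fault) during training. It is not yet clear whether the problem is in the data layer or in some other layer.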

chunhui999 commented 5 years ago

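@wenston2006 I ran into the memory overflow as well. It is most likely caused by the input image size: I halved the resize size (based on the memory overflow I hit earlier during testing), and after that training worked.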
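One thing to keep in mind when shrinking the input is that the ground-truth coordinates have to be scaled by the same factor as the image. A minimal sketch of that bookkeeping, assuming each box is stored as 8 values (x1, y1, ..., x4, y4); the function name and the default scale of 0.5 are illustrative and not taken from the repository:

```python
import numpy as np

def shrink_sample(height, width, quads, scale=0.5):
    """Return the reduced image size and the quads scaled to match it."""
    new_h = max(1, int(round(height * scale)))
    new_w = max(1, int(round(width * scale)))
    quads = np.asarray(quads, dtype=np.float32).reshape(-1, 4, 2)
    # x coordinates scale with the width, y coordinates with the height.
    ratio = np.array([new_w / width, new_h / height], dtype=np.float32)
    return (new_h, new_w), (quads * ratio).reshape(-1, 8)

size, scaled = shrink_sample(720, 1280, [[0, 0, 100, 0, 100, 40, 0, 40]])
```

The image itself would then be resized to `size` with whatever image library the data layer already uses.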

ustczhouyu commented 5 years ago

@wenston2006 Did you manage to train successfully? How were the results?

ZDDEAN commented 5 years ago

Could someone please share a script for converting the SynthText format to the ICDAR format? Thanks!